2309.15857
Cited By
A Survey on Image-text Multimodal Models
23 September 2023
Ruifeng Guo
Jingxuan Wei
Linzhuang Sun
Khai-Nguyen Nguyen
Guiyong Chang
Dawei Liu
Sibo Zhang
Zhengbing Yao
Mingjun Xu
Liping Bu
VLM
Papers citing "A Survey on Image-text Multimodal Models"
8 / 108 papers shown
Title
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Ke Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
DiffM
334
10,067
0
10 Feb 2015
Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)
Junhua Mao
Wenyuan Xu
Yi Yang
Jiang Wang
Zhiheng Huang
Alan Yuille
VLM
168
1,240
0
20 Dec 2014
Translating Videos to Natural Language Using Deep Recurrent Neural Networks
Subhashini Venugopalan
Huijuan Xu
Jeff Donahue
Marcus Rohrbach
Raymond J. Mooney
Kate Saenko
132
952
0
15 Dec 2014
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
Junyoung Chung
Çağlar Gülçehre
Kyunghyun Cho
Yoshua Bengio
577
12,702
0
11 Dec 2014
Deep Visual-Semantic Alignments for Generating Image Descriptions
A. Karpathy
Li Fei-Fei
122
5,584
0
07 Dec 2014
Show and Tell: A Neural Image Caption Generator
Oriol Vinyals
Alexander Toshev
Samy Bengio
D. Erhan
3DV
235
6,026
0
17 Nov 2014
Explain Images with Multimodal Recurrent Neural Networks
Junhua Mao
Wenyuan Xu
Yi Yang
Jiang Wang
Alan Yuille
VLM
GAN
98
385
0
04 Oct 2014
Going Deeper with Convolutions
Christian Szegedy
Wei Liu
Yangqing Jia
P. Sermanet
Scott E. Reed
Dragomir Anguelov
D. Erhan
Vincent Vanhoucke
Andrew Rabinovich
452
43,649
0
17 Sep 2014