A Survey on Image-text Multimodal Models

A Survey on Image-text Multimodal Models

23 September 2023

Ruifeng Guo

Linzhuang Sun

Khai-Nguyen Nguyen

Guiyong Chang

Dawei Liu

Sibo Zhang

Zhengbing Yao

Mingjun Xu

Liping Bu

Papers citing "A Survey on Image-text Multimodal Models"

8 / 108 papers shown

Title
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention Ke Xu Jimmy Ba Ryan Kiros Kyunghyun Cho Aaron Courville Ruslan Salakhutdinov R. Zemel Yoshua Bengio DiffM 334 10,067 0 10 Feb 2015
Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN) Junhua Mao Wenyuan Xu Yi Yang Jiang Wang Zhiheng Huang Alan Yuille VLM 168 1,240 0 20 Dec 2014
Translating Videos to Natural Language Using Deep Recurrent Neural Networks Subhashini Venugopalan Huijuan Xu Jeff Donahue Marcus Rohrbach Raymond J. Mooney Kate Saenko 132 952 0 15 Dec 2014
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling Junyoung Chung Çağlar Gülçehre Kyunghyun Cho Yoshua Bengio 577 12,702 0 11 Dec 2014
Deep Visual-Semantic Alignments for Generating Image Descriptions A. Karpathy Li Fei-Fei 122 5,584 0 07 Dec 2014
Show and Tell: A Neural Image Caption Generator Oriol Vinyals Alexander Toshev Samy Bengio D. Erhan 3DV 235 6,026 0 17 Nov 2014
Explain Images with Multimodal Recurrent Neural Networks Junhua Mao Wenyuan Xu Yi Yang Jiang Wang Alan Yuille VLM GAN 98 385 0 04 Oct 2014
Going Deeper with Convolutions Christian Szegedy Wei Liu Yangqing Jia P. Sermanet Scott E. Reed Dragomir Anguelov D. Erhan Vincent Vanhoucke Andrew Rabinovich 452 43,649 0 17 Sep 2014