Instance-aware Image and Sentence Matching with Selective Multimodal LSTM

17 November 2016

Liang Wang

Papers citing "Instance-aware Image and Sentence Matching with Selective Multimodal LSTM"

Title
ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval Guanqi Zhan Yuanpei Liu Kai Han Weidi Xie Andrew Zisserman VLM 518 0 0 21 Feb 2025
Contextual LSTM (CLSTM) models for Large scale NLP tasks Shalini Ghosh Oriol Vinyals B. Strope Scott Roy Tom Dean Larry Heck 70 213 0 19 Feb 2016
RNN Fisher Vectors for Action Recognition and Image Annotation Guy Lev Gil Sadeh Benjamin Klein Lior Wolf 55 164 0 12 Dec 2015
Order-Embeddings of Images and Language Ivan Vendrov Ryan Kiros Sanja Fidler R. Urtasun 120 548 0 19 Nov 2015
Learning Deep Structure-Preserving Image-Text Embeddings Liwei Wang Yin Li Svetlana Lazebnik 86 783 0 19 Nov 2015
Skip-Thought Vectors Ryan Kiros Yukun Zhu Ruslan Salakhutdinov R. Zemel Antonio Torralba R. Urtasun Sanja Fidler SSL 228 2,412 0 22 Jun 2015
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models Bryan A. Plummer Liwei Wang Christopher M. Cervantes Juan C. Caicedo Julia Hockenmaier Svetlana Lazebnik 222 2,074 0 19 May 2015
Multimodal Convolutional Neural Networks for Matching Image and Sentence Lin Ma Zhengdong Lu Lifeng Shang Hang Li 121 337 0 23 Apr 2015
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention Ke Xu Jimmy Ba Ryan Kiros Kyunghyun Cho Aaron Courville Ruslan Salakhutdinov R. Zemel Yoshua Bengio DiffM 352 10,091 0 10 Feb 2015
Deep Visual-Semantic Alignments for Generating Image Descriptions A. Karpathy Li Fei-Fei 154 5,599 0 07 Dec 2014
From Captions to Visual Concepts and Back Hao Fang Saurabh Gupta F. Iandola R. Srivastava Li Deng ... Xiaodong He Margaret Mitchell John C. Platt C. L. Zitnick Geoffrey Zweig VLM 134 1,312 0 18 Nov 2014
Show and Tell: A Neural Image Caption Generator Oriol Vinyals Alexander Toshev Samy Bengio D. Erhan 3DV 270 6,042 0 17 Nov 2014
Long-term Recurrent Convolutional Networks for Visual Recognition and Description Jeff Donahue Lisa Anne Hendricks Marcus Rohrbach Subhashini Venugopalan S. Guadarrama Kate Saenko Trevor Darrell VLM 173 6,060 0 17 Nov 2014
Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models Ryan Kiros Ruslan Salakhutdinov R. Zemel VLM 135 1,401 0 10 Nov 2014
Explain Images with Multimodal Recurrent Neural Networks Junhua Mao Wenyuan Xu Yi Yang Jiang Wang Alan Yuille VLM GAN 118 385 0 04 Oct 2014
Very Deep Convolutional Networks for Large-Scale Image Recognition Karen Simonyan Andrew Zisserman FAtt MDE 1.7K 100,575 0 04 Sep 2014
Neural Machine Translation by Jointly Learning to Align and Translate Dzmitry Bahdanau Kyunghyun Cho Yoshua Bengio AIMat 589 27,345 0 01 Sep 2014
Deep Fragment Embeddings for Bidirectional Image Sentence Mapping A. Karpathy Armand Joulin Li Fei-Fei VLM 116 937 0 22 Jun 2014
Microsoft COCO: Common Objects in Context Tsung-Yi Lin Michael Maire Serge J. Belongie Lubomir Bourdev Ross B. Girshick James Hays Pietro Perona Deva Ramanan C. L. Zitnick Piotr Dollár ObjD 444 43,875 0 01 May 2014
Rich feature hierarchies for accurate object detection and semantic segmentation Ross B. Girshick Jeff Donahue Trevor Darrell Jitendra Malik ObjD 311 26,247 0 11 Nov 2013
Efficient Estimation of Word Representations in Vector Space Tomas Mikolov Kai Chen G. Corrado J. Dean 3DV 712 31,571 0 16 Jan 2013