v1v2v3 (latest)

Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods

22 July 2019

Papers citing "Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods"

44 / 294 papers shown

Title
Grounding of Textual Phrases in Images by Reconstruction Anna Rohrbach Marcus Rohrbach Ronghang Hu Trevor Darrell Bernt Schiele 80 497 0 12 Nov 2015
Visual7W: Grounded Question Answering in Images Yuke Zhu Oliver Groth Michael S. Bernstein Li Fei-Fei 104 887 0 11 Nov 2015
Neural Module Networks Jacob Andreas Marcus Rohrbach Trevor Darrell Dan Klein CoGe 139 1,076 0 09 Nov 2015
Generating Images from Captions with Attention Elman Mansimov Emilio Parisotto Jimmy Lei Ba Ruslan Salakhutdinov VLM 84 457 0 09 Nov 2015
Generation and Comprehension of Unambiguous Object Descriptions Junhua Mao Jonathan Huang Alexander Toshev Oana-Maria Camburu Alan Yuille Kevin Patrick Murphy ObjD 131 1,357 0 07 Nov 2015
Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks Haonan Yu Jiang Wang Zhiheng Huang Yi Yang Wenyuan Xu 90 560 0 26 Oct 2015
SentiCap: Generating Image Descriptions with Sentiments A. Mathews Lexing Xie Xuming He 87 222 0 06 Oct 2015
A large annotated corpus for learning natural language inference Samuel R. Bowman Gabor Angeli Christopher Potts Christopher D. Manning 324 4,294 0 21 Aug 2015
A Survey of Current Datasets for Vision and Language Research Francis Ferraro N. Mostafazadeh Ting-Hao 'Kenneth' Huang Huang Lucy Vanderwende Jacob Devlin Michel Galley Margaret Mitchell VLM 67 75 0 23 Jun 2015
Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books Yukun Zhu Ryan Kiros R. Zemel Ruslan Salakhutdinov R. Urtasun Antonio Torralba Sanja Fidler 133 2,554 0 22 Jun 2015
Aligning where to see and what to tell: image caption with region-based attention and scene factorization Junqi Jin Kun Fu Runpeng Cui Fei Sha Changshui Zhang 72 117 0 20 Jun 2015
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren Kaiming He Ross B. Girshick Jian Sun AIMat ObjD 528 62,377 0 04 Jun 2015
Visual Madlibs: Fill in the blank Image Generation and Question Answering Licheng Yu Eunbyung Park Alexander C. Berg Tamara L. Berg VLM MLLM 102 98 0 31 May 2015
Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering Haoyuan Gao Junhua Mao Jie Zhou Zhiheng Huang Lei Wang Wenyuan Xu 78 501 0 21 May 2015
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models Bryan A. Plummer Liwei Wang Christopher M. Cervantes Juan C. Caicedo Julia Hockenmaier Svetlana Lazebnik 208 2,072 0 19 May 2015
Exploring Models and Data for Image Question Answering Mengye Ren Ryan Kiros R. Zemel 80 718 0 08 May 2015
Ask Your Neurons: A Neural-based Approach to Answering Questions about Images Mateusz Malinowski Marcus Rohrbach Mario Fritz 108 600 0 05 May 2015
Sequence to Sequence -- Video to Text Subhashini Venugopalan Marcus Rohrbach Jeff Donahue Raymond J. Mooney Trevor Darrell Kate Saenko 142 1,419 0 03 May 2015
VQA: Visual Question Answering Aishwarya Agrawal Jiasen Lu Stanislaw Antol Margaret Mitchell C. L. Zitnick Dhruv Batra Devi Parikh CoGe 226 5,503 0 03 May 2015
Using Descriptive Video Services to Create a Large Data Source for Video Annotation Research Atousa Torabi C. Pal Hugo Larochelle Aaron Courville VGen 101 205 0 03 Mar 2015
Unsupervised Learning of Video Representations using LSTMs Nitish Srivastava Elman Mansimov Ruslan Salakhutdinov SSL 142 2,593 0 16 Feb 2015
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention Ke Xu Jimmy Ba Ryan Kiros Kyunghyun Cho Aaron Courville Ruslan Salakhutdinov R. Zemel Yoshua Bengio DiffM 348 10,079 0 10 Feb 2015
A Dataset for Movie Description Anna Rohrbach Marcus Rohrbach Niket Tandon Bernt Schiele VGen 122 502 0 12 Jan 2015
Adam: A Method for Stochastic Optimization Diederik P. Kingma Jimmy Ba ODL 2.0K 150,312 0 22 Dec 2014
Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN) Junhua Mao Wenyuan Xu Yi Yang Jiang Wang Zhiheng Huang Alan Yuille VLM 178 1,240 0 20 Dec 2014
Translating Videos to Natural Language Using Deep Recurrent Neural Networks Subhashini Venugopalan Huijuan Xu Jeff Donahue Marcus Rohrbach Raymond J. Mooney Kate Saenko 140 952 0 15 Dec 2014
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling Junyoung Chung Çağlar Gülçehre Kyunghyun Cho Yoshua Bengio 601 12,741 0 11 Dec 2014
Deep Visual-Semantic Alignments for Generating Image Descriptions A. Karpathy Li Fei-Fei 144 5,591 0 07 Dec 2014
Learning Spatiotemporal Features with 3D Convolutional Networks Du Tran Lubomir D. Bourdev Rob Fergus Lorenzo Torresani Manohar Paluri 3DPC 77 411 0 02 Dec 2014
CIDEr: Consensus-based Image Description Evaluation Ramakrishna Vedantam C. L. Zitnick Devi Parikh 297 4,508 0 20 Nov 2014
From Captions to Visual Concepts and Back Hao Fang Saurabh Gupta F. Iandola R. Srivastava Li Deng ... Xiaodong He Margaret Mitchell John C. Platt C. L. Zitnick Geoffrey Zweig VLM 119 1,312 0 18 Nov 2014
Show and Tell: A Neural Image Caption Generator Oriol Vinyals Alexander Toshev Samy Bengio D. Erhan 3DV 260 6,035 0 17 Nov 2014
Long-term Recurrent Convolutional Networks for Visual Recognition and Description Jeff Donahue Lisa Anne Hendricks Marcus Rohrbach Subhashini Venugopalan S. Guadarrama Kate Saenko Trevor Darrell VLM 165 6,056 0 17 Nov 2014
Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models Ryan Kiros Ruslan Salakhutdinov R. Zemel VLM 127 1,401 0 10 Nov 2014
A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input Mateusz Malinowski Mario Fritz 220 698 0 01 Oct 2014
Going Deeper with Convolutions Christian Szegedy Wei Liu Yangqing Jia P. Sermanet Scott E. Reed Dragomir Anguelov D. Erhan Vincent Vanhoucke Andrew Rabinovich 485 43,694 0 17 Sep 2014
Very Deep Convolutional Networks for Large-Scale Image Recognition Karen Simonyan Andrew Zisserman FAtt MDE 1.7K 100,508 0 04 Sep 2014
Neural Machine Translation by Jointly Learning to Align and Translate Dzmitry Bahdanau Kyunghyun Cho Yoshua Bengio AIMat 578 27,327 0 01 Sep 2014
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation Kyunghyun Cho B. V. Merrienboer Çağlar Gülçehre Dzmitry Bahdanau Fethi Bougares Holger Schwenk Yoshua Bengio AIMat 1.1K 23,388 0 03 Jun 2014
Microsoft COCO: Common Objects in Context Nayeon Lee Michael Maire Serge J. Belongie Lubomir Bourdev Ross B. Girshick James Hays Pietro Perona Deva Ramanan C. L. Zitnick Piotr Dollár ObjD 424 43,814 0 01 May 2014
Coherent Multi-Sentence Video Description with Variable Level of Detail Anna Rohrbach Marcus Rohrbach Weijian Qiu Annemarie Friedrich Sikandar Amin Mykhaylo Andriluka Manfred Pinkal Bernt Schiele 91 218 0 24 Mar 2014
Auto-Encoding Variational Bayes Diederik P. Kingma Max Welling BDL 455 16,923 0 20 Dec 2013
Distributed Representations of Words and Phrases and their Compositionality Tomas Mikolov Ilya Sutskever Kai Chen G. Corrado J. Dean NAI OCL 402 33,560 0 16 Oct 2013
Joint Video and Text Parsing for Understanding Events and Answering Queries Kewei Tu Meng Meng M. Lee Tae Eun Choe Song-Chun Zhu VGen 93 120 0 29 Aug 2013