ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1907.09358
  4. Cited By
Trends in Integration of Vision and Language Research: A Survey of
  Tasks, Datasets, and Methods
v1v2v3 (latest)

Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods

22 July 2019
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
    VLM
ArXiv (abs)PDFHTML

Papers citing "Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods"

44 / 294 papers shown
Title
Grounding of Textual Phrases in Images by Reconstruction
Grounding of Textual Phrases in Images by Reconstruction
Anna Rohrbach
Marcus Rohrbach
Ronghang Hu
Trevor Darrell
Bernt Schiele
80
497
0
12 Nov 2015
Visual7W: Grounded Question Answering in Images
Visual7W: Grounded Question Answering in Images
Yuke Zhu
Oliver Groth
Michael S. Bernstein
Li Fei-Fei
104
887
0
11 Nov 2015
Neural Module Networks
Neural Module Networks
Jacob Andreas
Marcus Rohrbach
Trevor Darrell
Dan Klein
CoGe
139
1,076
0
09 Nov 2015
Generating Images from Captions with Attention
Generating Images from Captions with Attention
Elman Mansimov
Emilio Parisotto
Jimmy Lei Ba
Ruslan Salakhutdinov
VLM
84
457
0
09 Nov 2015
Generation and Comprehension of Unambiguous Object Descriptions
Generation and Comprehension of Unambiguous Object Descriptions
Junhua Mao
Jonathan Huang
Alexander Toshev
Oana-Maria Camburu
Alan Yuille
Kevin Patrick Murphy
ObjD
131
1,357
0
07 Nov 2015
Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks
Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks
Haonan Yu
Jiang Wang
Zhiheng Huang
Yi Yang
Wenyuan Xu
90
560
0
26 Oct 2015
SentiCap: Generating Image Descriptions with Sentiments
SentiCap: Generating Image Descriptions with Sentiments
A. Mathews
Lexing Xie
Xuming He
87
222
0
06 Oct 2015
A large annotated corpus for learning natural language inference
A large annotated corpus for learning natural language inference
Samuel R. Bowman
Gabor Angeli
Christopher Potts
Christopher D. Manning
324
4,294
0
21 Aug 2015
A Survey of Current Datasets for Vision and Language Research
A Survey of Current Datasets for Vision and Language Research
Francis Ferraro
N. Mostafazadeh
Ting-Hao 'Kenneth' Huang
Huang
Lucy Vanderwende
Jacob Devlin
Michel Galley
Margaret Mitchell
VLM
67
75
0
23 Jun 2015
Aligning Books and Movies: Towards Story-like Visual Explanations by
  Watching Movies and Reading Books
Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books
Yukun Zhu
Ryan Kiros
R. Zemel
Ruslan Salakhutdinov
R. Urtasun
Antonio Torralba
Sanja Fidler
133
2,554
0
22 Jun 2015
Aligning where to see and what to tell: image caption with region-based
  attention and scene factorization
Aligning where to see and what to tell: image caption with region-based attention and scene factorization
Junqi Jin
Kun Fu
Runpeng Cui
Fei Sha
Changshui Zhang
72
117
0
20 Jun 2015
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal
  Networks
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Shaoqing Ren
Kaiming He
Ross B. Girshick
Jian Sun
AIMatObjD
528
62,377
0
04 Jun 2015
Visual Madlibs: Fill in the blank Image Generation and Question
  Answering
Visual Madlibs: Fill in the blank Image Generation and Question Answering
Licheng Yu
Eunbyung Park
Alexander C. Berg
Tamara L. Berg
VLMMLLM
102
98
0
31 May 2015
Are You Talking to a Machine? Dataset and Methods for Multilingual Image
  Question Answering
Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering
Haoyuan Gao
Junhua Mao
Jie Zhou
Zhiheng Huang
Lei Wang
Wenyuan Xu
78
501
0
21 May 2015
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for
  Richer Image-to-Sentence Models
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models
Bryan A. Plummer
Liwei Wang
Christopher M. Cervantes
Juan C. Caicedo
Julia Hockenmaier
Svetlana Lazebnik
208
2,072
0
19 May 2015
Exploring Models and Data for Image Question Answering
Exploring Models and Data for Image Question Answering
Mengye Ren
Ryan Kiros
R. Zemel
80
718
0
08 May 2015
Ask Your Neurons: A Neural-based Approach to Answering Questions about
  Images
Ask Your Neurons: A Neural-based Approach to Answering Questions about Images
Mateusz Malinowski
Marcus Rohrbach
Mario Fritz
108
600
0
05 May 2015
Sequence to Sequence -- Video to Text
Sequence to Sequence -- Video to Text
Subhashini Venugopalan
Marcus Rohrbach
Jeff Donahue
Raymond J. Mooney
Trevor Darrell
Kate Saenko
142
1,419
0
03 May 2015
VQA: Visual Question Answering
VQA: Visual Question Answering
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
226
5,503
0
03 May 2015
Using Descriptive Video Services to Create a Large Data Source for Video
  Annotation Research
Using Descriptive Video Services to Create a Large Data Source for Video Annotation Research
Atousa Torabi
C. Pal
Hugo Larochelle
Aaron Courville
VGen
101
205
0
03 Mar 2015
Unsupervised Learning of Video Representations using LSTMs
Unsupervised Learning of Video Representations using LSTMs
Nitish Srivastava
Elman Mansimov
Ruslan Salakhutdinov
SSL
142
2,593
0
16 Feb 2015
Show, Attend and Tell: Neural Image Caption Generation with Visual
  Attention
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Ke Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
DiffM
348
10,079
0
10 Feb 2015
A Dataset for Movie Description
A Dataset for Movie Description
Anna Rohrbach
Marcus Rohrbach
Niket Tandon
Bernt Schiele
VGen
122
502
0
12 Jan 2015
Adam: A Method for Stochastic Optimization
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
2.0K
150,312
0
22 Dec 2014
Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)
Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)
Junhua Mao
Wenyuan Xu
Yi Yang
Jiang Wang
Zhiheng Huang
Alan Yuille
VLM
178
1,240
0
20 Dec 2014
Translating Videos to Natural Language Using Deep Recurrent Neural
  Networks
Translating Videos to Natural Language Using Deep Recurrent Neural Networks
Subhashini Venugopalan
Huijuan Xu
Jeff Donahue
Marcus Rohrbach
Raymond J. Mooney
Kate Saenko
140
952
0
15 Dec 2014
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence
  Modeling
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
Junyoung Chung
Çağlar Gülçehre
Kyunghyun Cho
Yoshua Bengio
601
12,741
0
11 Dec 2014
Deep Visual-Semantic Alignments for Generating Image Descriptions
Deep Visual-Semantic Alignments for Generating Image Descriptions
A. Karpathy
Li Fei-Fei
144
5,591
0
07 Dec 2014
Learning Spatiotemporal Features with 3D Convolutional Networks
Learning Spatiotemporal Features with 3D Convolutional Networks
Du Tran
Lubomir D. Bourdev
Rob Fergus
Lorenzo Torresani
Manohar Paluri
3DPC
77
411
0
02 Dec 2014
CIDEr: Consensus-based Image Description Evaluation
CIDEr: Consensus-based Image Description Evaluation
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
297
4,508
0
20 Nov 2014
From Captions to Visual Concepts and Back
From Captions to Visual Concepts and Back
Hao Fang
Saurabh Gupta
F. Iandola
R. Srivastava
Li Deng
...
Xiaodong He
Margaret Mitchell
John C. Platt
C. L. Zitnick
Geoffrey Zweig
VLM
119
1,312
0
18 Nov 2014
Show and Tell: A Neural Image Caption Generator
Show and Tell: A Neural Image Caption Generator
Oriol Vinyals
Alexander Toshev
Samy Bengio
D. Erhan
3DV
260
6,035
0
17 Nov 2014
Long-term Recurrent Convolutional Networks for Visual Recognition and
  Description
Long-term Recurrent Convolutional Networks for Visual Recognition and Description
Jeff Donahue
Lisa Anne Hendricks
Marcus Rohrbach
Subhashini Venugopalan
S. Guadarrama
Kate Saenko
Trevor Darrell
VLM
165
6,056
0
17 Nov 2014
Unifying Visual-Semantic Embeddings with Multimodal Neural Language
  Models
Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models
Ryan Kiros
Ruslan Salakhutdinov
R. Zemel
VLM
127
1,401
0
10 Nov 2014
A Multi-World Approach to Question Answering about Real-World Scenes
  based on Uncertain Input
A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input
Mateusz Malinowski
Mario Fritz
220
698
0
01 Oct 2014
Going Deeper with Convolutions
Going Deeper with Convolutions
Christian Szegedy
Wei Liu
Yangqing Jia
P. Sermanet
Scott E. Reed
Dragomir Anguelov
D. Erhan
Vincent Vanhoucke
Andrew Rabinovich
485
43,694
0
17 Sep 2014
Very Deep Convolutional Networks for Large-Scale Image Recognition
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan
Andrew Zisserman
FAttMDE
1.7K
100,508
0
04 Sep 2014
Neural Machine Translation by Jointly Learning to Align and Translate
Neural Machine Translation by Jointly Learning to Align and Translate
Dzmitry Bahdanau
Kyunghyun Cho
Yoshua Bengio
AIMat
578
27,327
0
01 Sep 2014
Learning Phrase Representations using RNN Encoder-Decoder for
  Statistical Machine Translation
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
Kyunghyun Cho
B. V. Merrienboer
Çağlar Gülçehre
Dzmitry Bahdanau
Fethi Bougares
Holger Schwenk
Yoshua Bengio
AIMat
1.1K
23,388
0
03 Jun 2014
Microsoft COCO: Common Objects in Context
Microsoft COCO: Common Objects in Context
Nayeon Lee
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
424
43,814
0
01 May 2014
Coherent Multi-Sentence Video Description with Variable Level of Detail
Coherent Multi-Sentence Video Description with Variable Level of Detail
Anna Rohrbach
Marcus Rohrbach
Weijian Qiu
Annemarie Friedrich
Sikandar Amin
Mykhaylo Andriluka
Manfred Pinkal
Bernt Schiele
91
218
0
24 Mar 2014
Auto-Encoding Variational Bayes
Auto-Encoding Variational Bayes
Diederik P. Kingma
Max Welling
BDL
455
16,923
0
20 Dec 2013
Distributed Representations of Words and Phrases and their
  Compositionality
Distributed Representations of Words and Phrases and their Compositionality
Tomas Mikolov
Ilya Sutskever
Kai Chen
G. Corrado
J. Dean
NAIOCL
402
33,560
0
16 Oct 2013
Joint Video and Text Parsing for Understanding Events and Answering
  Queries
Joint Video and Text Parsing for Understanding Events and Answering Queries
Kewei Tu
Meng Meng
M. Lee
Tae Eun Choe
Song-Chun Zhu
VGen
93
120
0
29 Aug 2013
Previous
123456