Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1907.09358
Cited By
v1
v2
v3 (latest)
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
22 July 2019
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods"
50 / 294 papers shown
Title
Visual Reference Resolution using Attention Memory for Visual Dialog
Paul Hongsuck Seo
Andreas M. Lehrmann
Bohyung Han
Leonid Sigal
77
123
0
23 Sep 2017
FiLM: Visual Reasoning with a General Conditioning Layer
Ethan Perez
Florian Strub
H. D. Vries
Vincent Dumoulin
Aaron Courville
FAtt
AIMat
OffRL
AI4CE
363
2,233
0
22 Sep 2017
Stack-Captioning: Coarse-to-Fine Learning for Image Captioning
Jiuxiang Gu
Jianfei Cai
G. Wang
Tsuhan Chen
75
180
0
11 Sep 2017
Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects
Ting Yao
Yingwei Pan
Yehao Li
Tao Mei
VLM
60
147
0
17 Aug 2017
Hierarchically-Attentive RNN for Album Summarization and Storytelling
Licheng Yu
Joey Tianyi Zhou
Tamara L. Berg
64
66
0
09 Aug 2017
Reinforced Video Captioning with Entailment Rewards
Ramakanth Pasunuru
Joey Tianyi Zhou
63
115
0
07 Aug 2017
Localizing Moments in Video with Natural Language
Lisa Anne Hendricks
Oliver Wang
Eli Shechtman
Josef Sivic
Trevor Darrell
Bryan C. Russell
123
949
0
04 Aug 2017
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
123
4,221
0
25 Jul 2017
OBJ2TEXT: Generating Visually Descriptive Language from Object Layouts
Xuwang Yin
Vicente Ordonez
VLM
83
55
0
22 Jul 2017
An empirical study on the effectiveness of images in Multimodal Neural Machine Translation
Jean-Benoit Delbrouck
Stéphane Dupont
68
39
0
04 Jul 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
781
132,363
0
12 Jun 2017
Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model
Jiasen Lu
A. Kannan
Jianwei Yang
Devi Parikh
Dhruv Batra
BDL
74
137
0
05 Jun 2017
A simple neural network module for relational reasoning
Adam Santoro
David Raposo
David Barrett
Mateusz Malinowski
Razvan Pascanu
Peter W. Battaglia
Timothy Lillicrap
GNN
NAI
189
1,615
0
05 Jun 2017
Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning
Jingkuan Song
Zhao Guo
Lianli Gao
Wu Liu
Dongxiang Zhang
Heng Tao Shen
80
166
0
05 Jun 2017
Multimodal Machine Learning: A Survey and Taxonomy
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency
111
2,937
0
26 May 2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
João Carreira
Andrew Zisserman
240
8,038
0
22 May 2017
Imagination improves Multimodal Translation
Desmond Elliott
Ákos Kádár
155
137
0
11 May 2017
Inferring and Executing Programs for Visual Reasoning
Justin Johnson
B. Hariharan
Laurens van der Maaten
Judy Hoffman
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
NAI
89
545
0
10 May 2017
Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner
Tseng-Hung Chen
Yuan-Hong Liao
Ching-Yao Chuang
W. Hsu
Jianlong Fu
Min Sun
93
142
0
02 May 2017
STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset
Yuya Yoshikawa
Yutaro Shigeto
A. Takeuchi
3DV
60
118
0
02 May 2017
Dense-Captioning Events in Videos
Ranjay Krishna
Kenji Hata
F. Ren
Li Fei-Fei
Juan Carlos Niebles
144
1,249
0
02 May 2017
Multi-Task Video Captioning with Video and Entailment Generation
Ramakanth Pasunuru
Joey Tianyi Zhou
57
117
0
24 Apr 2017
An Analysis of Action Recognition Datasets for Language and Vision Tasks
Spandana Gella
Frank Keller
ObjD
38
11
0
24 Apr 2017
Attention Strategies for Multi-Source Sequence-to-Sequence Learning
Jindrich Libovický
Jindřich Helcl
AIMat
76
183
0
21 Apr 2017
Learning to Reason: End-to-End Module Networks for Visual Question Answering
Ronghang Hu
Jacob Andreas
Marcus Rohrbach
Trevor Darrell
Kate Saenko
KELM
GNN
ReLM
LRM
129
579
0
18 Apr 2017
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering
Y. Jang
Yale Song
Youngjae Yu
Youngjin Kim
Gunhee Kim
87
561
0
14 Apr 2017
Learning Two-Branch Neural Networks for Image-Text Matching Tasks
Liwei Wang
Yin Li
Jing-ling Huang
Svetlana Lazebnik
VLM
77
498
0
11 Apr 2017
Towards a Visual Privacy Advisor: Understanding and Predicting Privacy Risks in Images
Rakshith Shetty
Bernt Schiele
Mario Fritz
102
228
0
30 Mar 2017
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
Albert Gatt
E. Krahmer
LM&MA
ELM
124
820
0
29 Mar 2017
Towards Automatic Learning of Procedures from Web Instructional Videos
Luowei Zhou
Chenliang Xu
Jason J. Corso
EgoV
75
830
0
28 Mar 2017
Multimodal Compact Bilinear Pooling for Multimodal Neural Machine Translation
Jean-Benoit Delbrouck
Stéphane Dupont
77
30
0
23 Mar 2017
Recurrent Multimodal Interaction for Referring Image Segmentation
Chenxi Liu
Zhe Lin
Xiaohui Shen
Jimei Yang
Xin Lu
Alan Yuille
EgoV
73
240
0
23 Mar 2017
Recurrent Topic-Transition GAN for Visual Paragraph Generation
Xiaodan Liang
Zhiting Hu
Huatian Zhang
Chuang Gan
Eric Xing
GAN
69
202
0
21 Mar 2017
Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning
Abhishek Das
Satwik Kottur
J. M. F. Moura
Stefan Lee
Dhruv Batra
OffRL
124
425
0
20 Mar 2017
TAC-GAN - Text Conditioned Auxiliary Classifier Generative Adversarial Network
Ayushman Dash
J. Gamboa
Sheraz Ahmed
Marcus Liwicki
Muhammad Zeshan Afzal
GAN
75
143
0
19 Mar 2017
Towards Diverse and Natural Image Descriptions via a Conditional GAN
Bo Dai
Sanja Fidler
R. Urtasun
Dahua Lin
GAN
82
454
0
17 Mar 2017
Learning Robust Visual-Semantic Embeddings
Yao-Hung Hubert Tsai
Liang-Kang Huang
Ruslan Salakhutdinov
SSL
AI4TS
67
166
0
17 Mar 2017
End-to-end optimization of goal-driven and visually grounded dialogue systems
Florian Strub
H. D. Vries
Jérémie Mary
Bilal Piot
Aaron Courville
Olivier Pietquin
OffRL
59
138
0
15 Mar 2017
Bilateral Multi-Perspective Matching for Natural Language Sentences
Zhiguo Wang
Wael Hamza
Radu Florian
90
804
0
13 Feb 2017
Representations of language in a model of visually grounded speech signal
Grzegorz Chrupała
Lieke Gelderloos
Afra Alishahi
78
131
0
07 Feb 2017
Doubly-Attentive Decoder for Multi-modal Neural Machine Translation
Iacer Calixto
Qun Liu
N. Campbell
133
183
0
04 Feb 2017
Image-Grounded Conversations: Multimodal Context for Natural Question and Response Generation
N. Mostafazadeh
Chris Brockett
W. Dolan
Michel Galley
Jianfeng Gao
Georgios P. Spithourakis
Lucy Vanderwende
73
183
0
28 Jan 2017
Incorporating Global Visual Features into Attention-Based Neural Machine Translation
Iacer Calixto
Qun Liu
Nick Campbell
115
156
0
23 Jan 2017
Comprehension-guided referring expressions
Ruotian Luo
Gregory Shakhnarovich
ObjD
97
171
0
12 Jan 2017
Attention-Based Multimodal Fusion for Video Description
Chiori Hori
Takaaki Hori
Teng-Yok Lee
Kazuhiro Sumi
J. Hershey
Tim K. Marks
78
360
0
11 Jan 2017
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions
Licheng Yu
Hao Tan
Joey Tianyi Zhou
Tamara L. Berg
ObjD
94
275
0
30 Dec 2016
YOLO9000: Better, Faster, Stronger
Joseph Redmon
Ali Farhadi
VLM
ObjD
183
15,633
0
25 Dec 2016
An Empirical Study of Language CNN for Image Captioning
Jiuxiang Gu
G. Wang
Jianfei Cai
Tsuhan Chen
74
133
0
21 Dec 2016
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
Justin Johnson
B. Hariharan
Laurens van der Maaten
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
CoGe
313
2,387
0
20 Dec 2016
StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks
Han Zhang
Tao Xu
Hongsheng Li
Shaoting Zhang
Xiaogang Wang
Xiaolei Huang
Dimitris N. Metaxas
GAN
122
2,728
0
10 Dec 2016
Previous
1
2
3
4
5
6
Next