Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1907.09358
Cited By
v1
v2
v3 (latest)
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
22 July 2019
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods"
50 / 294 papers shown
Title
What Is Missing in Multilingual Visual Reasoning and How to Fix It
Yueqi Song
Simran Khanuja
Graham Neubig
VLM
LRM
174
6
0
03 Mar 2024
Generative Adversarial Networks
Gilad Cohen
Raja Giryes
GAN
298
30,149
0
01 Mar 2022
Describe What to Change: A Text-guided Unsupervised Image-to-Image Translation Approach
Yahui Liu
Marco De Nadai
Deng Cai
Huayang Li
Xavier Alameda-Pineda
N. Sebe
Bruno Lepri
84
59
0
10 Aug 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
873
42,379
0
28 May 2020
Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models
Jize Cao
Zhe Gan
Yu Cheng
Licheng Yu
Yen-Chun Chen
Jingjing Liu
VLM
88
130
0
15 May 2020
VIOLIN: A Large-Scale Dataset for Video-and-Language Inference
J. Liu
Wenhu Chen
Yu Cheng
Zhe Gan
Licheng Yu
Yiming Yang
Jingjing Liu
MLLM
VGen
79
70
0
25 Mar 2020
Object Relational Graph with Teacher-Recommended Learning for Video Captioning
Ziqi Zhang
Yaya Shi
Chunfen Yuan
Bing Li
Peijin Wang
Weiming Hu
Zhengjun Zha
VLM
88
272
0
26 Feb 2020
ManiGAN: Text-Guided Image Manipulation
Bowen Li
Xiaojuan Qi
Thomas Lukasiewicz
Philip Torr
EGVM
103
286
0
12 Dec 2019
ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
Mohit Shridhar
Jesse Thomason
Daniel Gordon
Yonatan Bisk
Winson Han
Roozbeh Mottaghi
Luke Zettlemoyer
Dieter Fox
LM&Ro
117
779
0
03 Dec 2019
Just Ask:An Interactive Learning Framework for Vision and Language Navigation
Ta-Chung Chi
Mihail Eric
Seokhwan Kim
Minmin Shen
Dilek Z. Hakkani-Tür
54
72
0
02 Dec 2019
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
355
942
0
24 Sep 2019
Temporally Grounding Language Queries in Videos by Contextual Boundary-aware Prediction
Jingwen Wang
Lin Ma
Wenhao Jiang
76
182
0
11 Sep 2019
MULE: Multimodal Universal Language Embedding
Donghyun Kim
Kuniaki Saito
Kate Saenko
Stan Sclaroff
Bryan A. Plummer
VLM
70
40
0
08 Sep 2019
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM
MLLM
SSL
169
1,666
0
22 Aug 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Hao Tan
Joey Tianyi Zhou
VLM
MLLM
250
2,488
0
20 Aug 2019
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
Gen Li
Nan Duan
Yuejian Fang
Ming Gong
Daxin Jiang
Ming Zhou
SSL
VLM
MLLM
207
905
0
16 Aug 2019
Fusion of Detected Objects in Text for Visual Question Answering
Chris Alberti
Jeffrey Ling
Michael Collins
David Reitter
65
173
0
14 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
243
3,695
0
06 Aug 2019
Vision-and-Dialog Navigation
Jesse Thomason
Michael Murray
Maya Cakmak
Luke Zettlemoyer
LM&Ro
93
330
0
10 Jul 2019
Barack's Wife Hillary: Using Knowledge-Graphs for Fact-Aware Language Modeling
IV RobertL.Logan
Nelson F. Liu
Matthew E. Peters
Matt Gardner
Sameer Singh
RALM
73
188
0
17 Jun 2019
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
Antoine Miech
Dimitri Zhukov
Jean-Baptiste Alayrac
Makarand Tapaswi
Ivan Laptev
Josef Sivic
VGen
118
1,207
0
07 Jun 2019
Explain Yourself! Leveraging Language Models for Commonsense Reasoning
Nazneen Rajani
Bryan McCann
Caiming Xiong
R. Socher
ReLM
LRM
84
566
0
06 Jun 2019
Learning to Compose and Reason with Language Tree Structures for Visual Grounding
Richang Hong
Daqing Liu
Xiaoyu Mo
Xiangnan He
Hanwang Zhang
ReLM
LRM
87
163
0
05 Jun 2019
OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge
Kenneth Marino
Mohammad Rastegari
Ali Farhadi
Roozbeh Mottaghi
117
1,090
0
31 May 2019
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Mingxing Tan
Quoc V. Le
3DV
MedIm
153
18,179
0
28 May 2019
Language-Conditioned Graph Networks for Relational Reasoning
Ronghang Hu
Anna Rohrbach
Trevor Darrell
Kate Saenko
77
173
0
10 May 2019
The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision
Jiayuan Mao
Chuang Gan
Pushmeet Kohli
J. Tenenbaum
Jiajun Wu
NAI
138
702
0
26 Apr 2019
TVQA+: Spatio-Temporal Grounding for Video Question Answering
Jie Lei
Licheng Yu
Tamara L. Berg
Joey Tianyi Zhou
73
229
0
25 Apr 2019
Pointing Novel Objects in Image Captioning
Yehao Li
Ting Yao
Yingwei Pan
Hongyang Chao
Tao Mei
83
69
0
25 Apr 2019
Challenges and Prospects in Vision and Language Research
Kushal Kafle
Robik Shrestha
Christopher Kanan
48
41
0
19 Apr 2019
Towards VQA Models That Can Read
Amanpreet Singh
Vivek Natarajan
Meet Shah
Yu Jiang
Xinlei Chen
Dhruv Batra
Devi Parikh
Marcus Rohrbach
EgoV
111
1,253
0
18 Apr 2019
A Simple Baseline for Audio-Visual Scene-Aware Dialog
Idan Schwartz
Alex Schwing
Tamir Hazan
69
71
0
11 Apr 2019
Reasoning Visual Dialogs with Structural and Partial Observations
Zilong Zheng
Wenguan Wang
Siyuan Qi
Song-Chun Zhu
86
117
0
11 Apr 2019
Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout
Hao Tan
Licheng Yu
Joey Tianyi Zhou
SSL
88
321
0
08 Apr 2019
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
Xin Eric Wang
Jiawei Wu
Junkun Chen
Lei Li
Yuan-fang Wang
William Yang Wang
101
555
0
06 Apr 2019
VideoBERT: A Joint Model for Video and Language Representation Learning
Chen Sun
Austin Myers
Carl Vondrick
Kevin Patrick Murphy
Cordelia Schmid
VLM
SSL
82
1,249
0
03 Apr 2019
Recent Advances in Natural Language Inference: A Survey of Benchmarks, Resources, and Approaches
Shane Storks
Qiaozi Gao
J. Chai
71
132
0
02 Apr 2019
Knowledge-driven Encode, Retrieve, Paraphrase for Medical Image Report Generation
Yuan Li
Xiaodan Liang
Zhiting Hu
Eric Xing
MedIm
57
272
0
25 Mar 2019
Probing the Need for Visual Context in Multimodal Machine Translation
Ozan Caglayan
Pranava Madhyastha
Lucia Specia
Loïc Barrault
77
142
0
20 Mar 2019
Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning
Dong-Jin Kim
Jinsoo Choi
Tae-Hyun Oh
In So Kweon
54
84
0
14 Mar 2019
MirrorGAN: Learning Text-to-image Generation by Redescription
Tingting Qiao
Jing Zhang
Duanqing Xu
Dacheng Tao
VLM
GAN
61
542
0
14 Mar 2019
Visual Semantic Information Pursuit: A Survey
Daqi Liu
M. Bober
J. Kittler
58
31
0
13 Mar 2019
CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog
Satwik Kottur
José M. F. Moura
Devi Parikh
Dhruv Batra
Marcus Rohrbach
77
87
0
07 Mar 2019
COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis
Yansong Tang
Dajun Ding
Yongming Rao
Yu Zheng
Danyang Zhang
Lili Zhao
Jiwen Lu
Jie Zhou
132
317
0
07 Mar 2019
RAVEN: A Dataset for Relational and Analogical Visual rEasoNing
Chi Zhang
Feng Gao
Baoxiong Jia
Yixin Zhu
Song-Chun Zhu
AIMat
72
312
0
07 Mar 2019
Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation
Liyiming Ke
Xiujun Li
Yonatan Bisk
Ari Holtzman
Zhe Gan
Jingjing Liu
Jianfeng Gao
Yejin Choi
S. Srinivasa
93
168
0
06 Mar 2019
The Regretful Agent: Heuristic-Aided Navigation through Progress Estimation
Chih-Yao Ma
Zuxuan Wu
G. Al-Regib
Caiming Xiong
Z. Kira
LM&Ro
85
174
0
05 Mar 2019
Image-Question-Answer Synergistic Network for Visual Dialog
Dalu Guo
Chang Xu
Dacheng Tao
48
74
0
26 Feb 2019
GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering
Drew A. Hudson
Christopher D. Manning
CoGe
NAI
69
138
0
25 Feb 2019
MUREL: Multimodal Relational Reasoning for Visual Question Answering
Rémi Cadène
H. Ben-younes
Matthieu Cord
Nicolas Thome
LRM
68
277
0
25 Feb 2019
1
2
3
4
5
6
Next