Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1908.08530
Cited By
v1
v2
v3
v4 (latest)
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
22 August 2019
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM
MLLM
SSL
Re-assign community
ArXiv (abs)
PDF
HTML
Github (740★)
Papers citing
"VL-BERT: Pre-training of Generic Visual-Linguistic Representations"
20 / 1,020 papers shown
Title
Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline
Vishvak Murahari
Dhruv Batra
Devi Parikh
Abhishek Das
VLM
111
117
0
05 Dec 2019
12-in-1: Multi-Task Vision and Language Representation Learning
Jiasen Lu
Vedanuj Goswami
Marcus Rohrbach
Devi Parikh
Stefan Lee
VLM
ObjD
131
481
0
05 Dec 2019
Low Rank Factorization for Compact Multi-Head Self-Attention
Sneha Mehta
Huzefa Rangwala
Naren Ramakrishnan
42
5
0
26 Nov 2019
Learning to Learn Words from Visual Scenes
Dídac Surís
Dave Epstein
Heng Ji
Shih-Fu Chang
Carl Vondrick
VLM
CLIP
SSL
OffRL
68
4
0
25 Nov 2019
Temporal Reasoning via Audio Question Answering
Haytham M. Fayek
Justin Johnson
65
54
0
21 Nov 2019
Vision-Language Navigation with Self-Supervised Auxiliary Reasoning Tasks
Fengda Zhu
Yi Zhu
Xiaojun Chang
Xiaodan Liang
LRM
115
243
0
18 Nov 2019
Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA
Ronghang Hu
Amanpreet Singh
Trevor Darrell
Marcus Rohrbach
94
197
0
14 Nov 2019
Multimodal Intelligence: Representation Learning, Information Fusion, and Applications
Chao Zhang
Zichao Yang
Xiaodong He
Li Deng
HAI
AI4TS
122
337
0
10 Nov 2019
Probing Contextualized Sentence Representations with Visual Awareness
Zhuosheng Zhang
Rui Wang
Kehai Chen
Masao Utiyama
Eiichiro Sumita
Hai Zhao
77
2
0
07 Nov 2019
TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines
Jingxiang Lin
Unnat Jain
Alex Schwing
LRM
ReLM
102
9
0
31 Oct 2019
Meta Module Network for Compositional Visual Reasoning
Wenhu Chen
Zhe Gan
Linjie Li
Yu Cheng
Wenjie Wang
Jingjing Liu
LRM
93
71
0
08 Oct 2019
UNITER: UNiversal Image-TExt Representation Learning
Yen-Chun Chen
Linjie Li
Licheng Yu
Ahmed El Kholy
Faisal Ahmed
Zhe Gan
Yu Cheng
Jingjing Liu
VLM
OT
134
449
0
25 Sep 2019
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
363
947
0
24 Sep 2019
Supervised Multimodal Bitransformers for Classifying Images and Text
Douwe Kiela
Suvrat Bhooshan
Hamed Firooz
Ethan Perez
Davide Testuggine
154
248
0
06 Sep 2019
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
Gen Li
Nan Duan
Yuejian Fang
Ming Gong
Daxin Jiang
Ming Zhou
SSL
VLM
MLLM
221
907
0
16 Aug 2019
Fusion of Detected Objects in Text for Visual Question Answering
Chris Alberti
Jeffrey Ling
Michael Collins
David Reitter
95
173
0
14 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
317
3,714
0
06 Aug 2019
Finding Moments in Video Collections Using Natural Language
Victor Escorcia
Mattia Soldan
Josef Sivic
Guohao Li
Bryan C. Russell
55
7
0
30 Jul 2019
Bilinear Graph Networks for Visual Question Answering
Dalu Guo
Chang Xu
Dacheng Tao
GNN
86
53
0
23 Jul 2019
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
VLM
141
136
0
22 Jul 2019
Previous
1
2
3
...
19
20
21