Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1908.08530
Cited By
v1
v2
v3
v4 (latest)
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
22 August 2019
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM
MLLM
SSL
Re-assign community
ArXiv (abs)
PDF
HTML
Github (740★)
Papers citing
"VL-BERT: Pre-training of Generic Visual-Linguistic Representations"
50 / 1,020 papers shown
Title
One4all User Representation for Recommender Systems in E-commerce
Kyuyong Shin
Hanock Kwak
KyungHyun Kim
Minkyu Kim
Young-Jin Park
Jisu Jeong
Seungjae Jung
62
28
0
24 May 2021
Aligning Visual Prototypes with BERT Embeddings for Few-Shot Learning
Kun Yan
Zied Bouraoui
Ping Wang
Shoaib Jameel
Steven Schockaert
59
24
0
21 May 2021
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Prahal Arora
Masoumeh Aminzadeh
Christoph Feichtenhofer
Florian Metze
Luke Zettlemoyer
82
133
0
20 May 2021
Parallel Attention Network with Sequence Matching for Video Grounding
Hao Zhang
Aixin Sun
Wei Jing
Liangli Zhen
Qiufeng Wang
Rick Siow Mong Goh
109
41
0
18 May 2021
A Review on Explainability in Multimodal Deep Neural Nets
Gargi Joshi
Rahee Walambe
K. Kotecha
138
142
0
17 May 2021
Premise-based Multimodal Reasoning: Conditional Inference on Joint Textual and Visual Clues
Qingxiu Dong
Ziwei Qin
Heming Xia
Tian Feng
Shoujie Tong
...
Weidong Zhan
Sujian Li
Zhongyu Wei
Tianyu Liu
Zuifang Sui
LRM
64
11
0
15 May 2021
Connecting What to Say With Where to Look by Modeling Human Attention Traces
Zihang Meng
Licheng Yu
Ning Zhang
Tamara L. Berg
Babak Damavandi
Vikas Singh
Amy Bearman
157
25
0
12 May 2021
Cross-Modal Generative Augmentation for Visual Question Answering
Zixu Wang
Yishu Miao
Lucia Specia
78
11
0
11 May 2021
T-EMDE: Sketching-based global similarity for cross-modal retrieval
Barbara Rychalska
Mikolaj Wieczorek
Jacek Dąbrowski
59
0
0
10 May 2021
ISTR: End-to-End Instance Segmentation with Transformers
Jie Hu
Liujuan Cao
Yao Lu
Shengchuan Zhang
Yan Wang
Ke Li
Feiyue Huang
Ling Shao
Rongrong Ji
ISeg
87
96
0
03 May 2021
CAT: Cross-Attention Transformer for One-Shot Object Detection
Weidong Lin
Yuyang Deng
Yang Gao
Ning Wang
Jinghao Zhou
Lingqiao Liu
Lei Zhang
Peng Wang
ViT
127
9
0
30 Apr 2021
Chop Chop BERT: Visual Question Answering by Chopping VisualBERT's Heads
Chenyu Gao
Qi Zhu
Peng Wang
Qi Wu
25
2
0
30 Apr 2021
Multimodal Contrastive Training for Visual Representation Learning
Xin Yuan
Zhe Lin
Jason Kuen
Jianming Zhang
Yilin Wang
Michael Maire
Ajinkya Kale
Baldo Faieta
SSL
85
157
0
26 Apr 2021
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
249
897
0
26 Apr 2021
InfographicVQA
Minesh Mathew
Viraj Bagal
Rubèn Pérez Tito
Dimosthenis Karatzas
Ernest Valveny
C. V. Jawahar
112
242
0
26 Apr 2021
MusCaps: Generating Captions for Music Audio
Ilaria Manco
Emmanouil Benetos
Elio Quinton
Gyorgy Fazekas
116
37
0
24 Apr 2021
M3DeTR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers
Tianrui Guan
Jun Wang
Shiyi Lan
Rohan Chandra
Zuxuan Wu
Larry S. Davis
Tianyi Zhou
ViT
3DPC
91
123
0
24 Apr 2021
Playing Lottery Tickets with Vision and Language
Zhe Gan
Yen-Chun Chen
Linjie Li
Tianlong Chen
Yu Cheng
Shuohang Wang
Jingjing Liu
Lijuan Wang
Zicheng Liu
VLM
142
56
0
23 Apr 2021
Detector-Free Weakly Supervised Grounding by Separation
Assaf Arbelle
Sivan Doveh
Amit Alfassy
J. Shtok
Guy Lev
...
Kate Saenko
S. Ullman
Raja Giryes
Rogerio Feris
Leonid Karlinsky
92
24
0
20 Apr 2021
Understanding Chinese Video and Language via Contrastive Multimodal Pre-Training
Chenyi Lei
Shixian Luo
Yong Liu
Wanggui He
Jiamang Wang
Guoxin Wang
Haihong Tang
Chunyan Miao
Houqiang Li
60
42
0
19 Apr 2021
LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding
Yiheng Xu
Tengchao Lv
Lei Cui
Guoxin Wang
Yijuan Lu
D. Florêncio
Cha Zhang
Furu Wei
MLLM
VLM
109
130
0
18 Apr 2021
Worst of Both Worlds: Biases Compound in Pre-trained Vision-and-Language Models
Tejas Srinivasan
Yonatan Bisk
VLM
83
56
0
18 Apr 2021
TransVG: End-to-End Visual Grounding with Transformers
Jiajun Deng
Zhengyuan Yang
Tianlang Chen
Wen-gang Zhou
Houqiang Li
ViT
109
348
0
17 Apr 2021
LAMPRET: Layout-Aware Multimodal PreTraining for Document Understanding
Te-Lin Wu
Cheng-rong Li
Mingyang Zhang
Tao Chen
Spurthi Amba Hombaiah
Michael Bendersky
79
14
0
16 Apr 2021
AMMU : A Survey of Transformer-based Biomedical Pretrained Language Models
Katikapalli Subramanyam Kalyan
A. Rajasekharan
S. Sangeetha
LM&MA
MedIm
115
170
0
16 Apr 2021
Cross-Modal Retrieval Augmentation for Multi-Modal Classification
Shir Gur
Natalia Neverova
C. Stauffer
Ser-Nam Lim
Douwe Kiela
A. Reiter
147
30
0
16 Apr 2021
Effect of Visual Extensions on Natural Language Understanding in Vision-and-Language Models
Taichi Iki
Akiko Aizawa
VLM
60
20
0
16 Apr 2021
MultiModalQA: Complex Question Answering over Text, Tables and Images
Alon Talmor
Ori Yoran
Amnon Catav
Dan Lahav
Yizhong Wang
Akari Asai
Gabriel Ilharco
Hannaneh Hajishirzi
Jonathan Berant
LMTD
99
163
0
13 Apr 2021
Escaping the Big Data Paradigm with Compact Transformers
Ali Hassani
Steven Walton
Nikhil Shah
Abulikemu Abuduweili
Jiachen Li
Humphrey Shi
154
464
0
12 Apr 2021
The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation
Yuankai Qi
Zizheng Pan
Yicong Hong
Ming-Hsuan Yang
Anton Van Den Hengel
Qi Wu
LM&Ro
82
69
0
09 Apr 2021
How Transferable are Reasoning Patterns in VQA?
Corentin Kervadec
Theo Jaunet
G. Antipov
M. Baccouche
Romain Vuillemot
Christian Wolf
LRM
59
28
0
08 Apr 2021
Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning
Zhicheng Huang
Zhaoyang Zeng
Yupan Huang
Bei Liu
Dongmei Fu
Jianlong Fu
VLM
ViT
158
274
0
07 Apr 2021
Compressing Visual-linguistic Model via Knowledge Distillation
Zhiyuan Fang
Jianfeng Wang
Xiaowei Hu
Lijuan Wang
Yezhou Yang
Zicheng Liu
VLM
116
99
0
05 Apr 2021
VisQA: X-raying Vision and Language Reasoning in Transformers
Theo Jaunet
Corentin Kervadec
Romain Vuillemot
G. Antipov
M. Baccouche
Christian Wolf
64
26
0
02 Apr 2021
Towards General Purpose Vision Systems
Tanmay Gupta
Amita Kamath
Aniruddha Kembhavi
Derek Hoiem
100
53
0
01 Apr 2021
UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training
Mingyang Zhou
Luowei Zhou
Shuohang Wang
Yu Cheng
Linjie Li
Zhou Yu
Jingjing Liu
MLLM
VLM
97
92
0
01 Apr 2021
A Survey on Natural Language Video Localization
Xinfang Liu
Xiushan Nie
Zhifang Tan
Jie Guo
Yilong Yin
121
7
0
01 Apr 2021
Zero-Shot Language Transfer vs Iterative Back Translation for Unsupervised Machine Translation
Aviral Joshi
Chengzhi Huang
H. Singh
48
0
0
31 Mar 2021
StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
Or Patashnik
Zongze Wu
Eli Shechtman
Daniel Cohen-Or
Dani Lischinski
CLIP
VLM
196
1,213
0
31 Mar 2021
Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers
Antoine Miech
Jean-Baptiste Alayrac
Ivan Laptev
Josef Sivic
Andrew Zisserman
ViT
99
139
0
30 Mar 2021
Kaleido-BERT: Vision-Language Pre-training on Fashion Domain
Mingchen Zhuge
D. Gao
Deng-Ping Fan
Linbo Jin
Ben Chen
Hao Zhou
Minghui Qiu
Ling Shao
VLM
99
121
0
30 Mar 2021
Self-supervised Image-text Pre-training With Mixed Data In Chest X-rays
Xiaosong Wang
Ziyue Xu
Leo K. Tam
Dong Yang
Daguang Xu
ViT
MedIm
68
24
0
30 Mar 2021
Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding
Pengchuan Zhang
Xiyang Dai
Jianwei Yang
Bin Xiao
Lu Yuan
Lei Zhang
Jianfeng Gao
ViT
116
337
0
29 Mar 2021
HiT: Hierarchical Transformer with Momentum Contrast for Video-Text Retrieval
Song Liu
Haoqi Fan
Shengsheng Qian
Yiru Chen
Wenkui Ding
Zhongyuan Wang
106
147
0
28 Mar 2021
Visual Grounding Strategies for Text-Only Natural Language Processing
Damien Sileo
45
8
0
25 Mar 2021
VLGrammar: Grounded Grammar Induction of Vision and Language
Yining Hong
Qing Li
Song-Chun Zhu
Siyuan Huang
VLM
89
25
0
24 Mar 2021
Scene-Intuitive Agent for Remote Embodied Visual Grounding
Xiangru Lin
Guanbin Li
Yizhou Yu
LM&Ro
80
53
0
24 Mar 2021
Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval
Gregor Geigle
Jonas Pfeiffer
Nils Reimers
Ivan Vulić
Iryna Gurevych
104
60
0
22 Mar 2021
Local Interpretations for Explainable Natural Language Processing: A Survey
Siwen Luo
Hamish Ivison
S. Han
Josiah Poon
MILM
120
51
0
20 Mar 2021
Variational Knowledge Distillation for Disease Classification in Chest X-Rays
Tom van Sonsbeek
Xiantong Zhen
M. Worring
Ling Shao
30
13
0
19 Mar 2021
Previous
1
2
3
...
16
17
18
19
20
21
Next