Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1908.08530
Cited By
v1
v2
v3
v4 (latest)
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
22 August 2019
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM
MLLM
SSL
Re-assign community
ArXiv (abs)
PDF
HTML
Github (740★)
Papers citing
"VL-BERT: Pre-training of Generic Visual-Linguistic Representations"
50 / 1,020 papers shown
Title
A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models
Woojeong Jin
Yu Cheng
Yelong Shen
Weizhu Chen
Xiang Ren
VLM
VPVLM
MLLM
117
138
0
16 Oct 2021
MMIU: Dataset for Visual Intent Understanding in Multimodal Assistants
Alkesh Patel
Joel Ruben Antony Moniz
R. Nguyen
Nicholas Tzou
Hadas Kotek
Vincent Renkens
VGen
23
1
0
13 Oct 2021
Multi-Modal Interaction Graph Convolutional Network for Temporal Language Localization in Videos
Zongmeng Zhang
Xianjing Han
Xuemeng Song
Yan Yan
Liqiang Nie
118
37
0
12 Oct 2021
SignBERT: Pre-Training of Hand-Model-Aware Representation for Sign Language Recognition
Hezhen Hu
Weichao Zhao
Wen-gang Zhou
Yuechen Wang
Houqiang Li
ViT
79
71
0
11 Oct 2021
Pano-AVQA: Grounded Audio-Visual Question Answering on 360
∘
^\circ
∘
Videos
Heeseung Yun
Youngjae Yu
Wonsuk Yang
Kangil Lee
Gunhee Kim
100
86
0
11 Oct 2021
Efficient Multi-Modal Embeddings from Structured Data
A. Vero
Ann A. Copestake
33
4
0
06 Oct 2021
Trustworthy AI: From Principles to Practices
Yue Liu
Peng Qi
Bo Liu
Shuai Di
Jingen Liu
Jiquan Pei
Jinfeng Yi
Bowen Zhou
213
383
0
04 Oct 2021
Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering
Long Chen
Yuhang Zheng
Yulei Niu
Hanwang Zhang
Jun Xiao
AAML
OOD
119
37
0
03 Oct 2021
IntentVizor: Towards Generic Query Guided Interactive Video Summarization
Guande Wu
Jianzhe Lin
Claudio T. Silva
84
24
0
30 Sep 2021
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Florian Metze Luke Zettlemoyer Christoph Feichtenhofer
CLIP
VLM
317
583
0
28 Sep 2021
Visually Grounded Reasoning across Languages and Cultures
Fangyu Liu
Emanuele Bugliarello
Edoardo Ponti
Siva Reddy
Nigel Collier
Desmond Elliott
VLM
LRM
171
180
0
28 Sep 2021
MLIM: Vision-and-Language Model Pre-training with Masked Language and Image Modeling
Tarik Arici
M. S. Seyfioglu
T. Neiman
Yi Tian Xu
Son N. Tran
Trishul Chilimbi
Belinda Zeng
Ismail B. Tutar
VLM
59
15
0
24 Sep 2021
Detecting Harmful Memes and Their Targets
Shraman Pramanick
Dimitar Dimitrov
Rituparna Mukherjee
Shivam Sharma
Md. Shad Akhtar
Preslav Nakov
Tanmoy Chakraborty
80
117
0
24 Sep 2021
CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models
Yuan Yao
Ao Zhang
Zhengyan Zhang
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
MLLM
VPVLM
VLM
300
224
0
24 Sep 2021
KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation
Yongfei Liu
Chenfei Wu
Shao-Yen Tseng
Vasudev Lal
Xuming He
Nan Duan
CLIP
VLM
110
29
0
22 Sep 2021
An End-to-End Transformer Model for 3D Object Detection
Ishan Misra
Rohit Girdhar
Armand Joulin
3DPC
ViT
143
487
0
16 Sep 2021
What Vision-Language Models `See' when they See Scenes
Michele Cafagna
Kees van Deemter
Albert Gatt
VLM
99
13
0
15 Sep 2021
Panoptic Narrative Grounding
Cristina González
Nicolás Ayobi
Isabela Hernández
José Hernández
Jordi Pont-Tuset
Pablo Arbeláez
146
23
0
10 Sep 2021
We went to look for meaning and all we got were these lousy representations: aspects of meaning representation for computational semantics
Simon Dobnik
R. Cooper
Adam Ek
Bill Noble
Staffan Larsson
N. Ilinykh
Vladislav Maraev
Vidya Somashekarappa
64
0
0
10 Sep 2021
ReasonBERT: Pre-trained to Reason with Distant Supervision
Xiang Deng
Yu-Chuan Su
Alyssa Lees
You Wu
Cong Yu
Huan Sun
ReLM
RALM
OffRL
LRM
85
32
0
10 Sep 2021
Towards Developing a Multilingual and Code-Mixed Visual Question Answering System by Knowledge Distillation
H. Khan
D. Gupta
Asif Ekbal
57
14
0
10 Sep 2021
Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers
Stella Frank
Emanuele Bugliarello
Desmond Elliott
82
82
0
09 Sep 2021
TxT: Crossmodal End-to-End Learning with Transformers
Jan-Martin O. Steitz
Jonas Pfeiffer
Iryna Gurevych
Stefan Roth
LRM
29
2
0
09 Sep 2021
M5Product: Self-harmonized Contrastive Learning for E-commercial Multi-modal Pretraining
Xiao Dong
Xunlin Zhan
Yangxin Wu
Yunchao Wei
Michael C. Kampffmeyer
Xiaoyong Wei
Minlong Lu
Yaowei Wang
Xiaodan Liang
116
38
0
09 Sep 2021
Towards Natural Language Interfaces for Data Visualization: A Survey
Leixian Shen
Enya Shen
Yuyu Luo
Xiaocong Yang
Xuming Hu
Xiongshuai Zhang
Zhiwei Tai
Jianmin Wang
111
146
0
08 Sep 2021
Learning grounded word meaning representations on similarity graphs
Mariella Dimiccoli
H. Wendt
Pau Batlle
39
1
0
07 Sep 2021
Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization
Tiezheng Yu
Wenliang Dai
Zihan Liu
Pascale Fung
105
74
0
06 Sep 2021
Improved RAMEN: Towards Domain Generalization for Visual Question Answering
Bhanuka Gamage
Lim Chern Hong
72
1
0
06 Sep 2021
Learning to Generate Scene Graph from Natural Language Supervision
Yiwu Zhong
Jing Shi
Jianwei Yang
Chenliang Xu
Yin Li
SSL
93
79
0
06 Sep 2021
Data Efficient Masked Language Modeling for Vision and Language
Yonatan Bitton
Gabriel Stanovsky
Michael Elhadad
Roy Schwartz
VLM
82
20
0
05 Sep 2021
LAViTeR: Learning Aligned Visual and Textual Representations Assisted by Image and Caption Generation
Mohammad Abuzar Shaikh
Zhanghexuan Ji
Dana Moukheiber
Yan Shen
S. Srihari
Mingchen Gao
VLM
46
1
0
04 Sep 2021
Multimodal Conditionality for Natural Language Generation
Michael Sollami
Aashish Jain
73
10
0
02 Sep 2021
WebQA: Multihop and Multimodal QA
Yingshan Chang
M. Narang
Hisami Suzuki
Guihong Cao
Jianfeng Gao
Yonatan Bisk
LRM
83
87
0
01 Sep 2021
Fine-Grained Chemical Entity Typing with Multimodal Knowledge Representation
Chenkai Sun
Weijian Li
Jinfeng Xiao
Nikolaus Nova Parulian
ChengXiang Zhai
Heng Ji
84
4
0
29 Aug 2021
On the Significance of Question Encoder Sequence Model in the Out-of-Distribution Performance in Visual Question Answering
K. Gouthaman
Anurag Mittal
CML
77
0
0
28 Aug 2021
Vision-Language Navigation: A Survey and Taxonomy
Wansen Wu
Tao Chang
Xinmeng Li
LM&Ro
71
24
0
26 Aug 2021
INVIGORATE: Interactive Visual Grounding and Grasping in Clutter
Hanbo Zhang
Yunfan Lu
Cunjun Yu
David Hsu
Xuguang Lan
Nanning Zheng
LM&Ro
108
66
0
25 Aug 2021
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Zirui Wang
Jiahui Yu
Adams Wei Yu
Zihang Dai
Yulia Tsvetkov
Yuan Cao
VLM
MLLM
158
799
0
24 Aug 2021
Guiding Query Position and Performing Similar Attention for Transformer-Based Detection Heads
Xiaohu Jiang
Ze Chen
Zhicheng Wang
Erjin Zhou
Chun Yuan
31
2
0
22 Aug 2021
From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network
Yuxin Wang
Hongtao Xie
Shancheng Fang
Jing Wang
Shenggao Zhu
Yongdong Zhang
VLM
94
154
0
22 Aug 2021
Grid-VLP: Revisiting Grid Features for Vision-Language Pre-training
Ming Yan
Haiyang Xu
Chenliang Li
Bin Bi
Junfeng Tian
Min Gui
Wei Wang
VLM
62
10
0
21 Aug 2021
Group-based Distinctive Image Captioning with Memory Attention
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
100
18
0
20 Aug 2021
Airbert: In-domain Pretraining for Vision-and-Language Navigation
Pierre-Louis Guhur
Makarand Tapaswi
Shizhe Chen
Ivan Laptev
Cordelia Schmid
LM&Ro
59
144
0
20 Aug 2021
Knowledge Perceived Multi-modal Pretraining in E-commerce
Yushan Zhu
Huaixiao Tou
Wen Zhang
Ganqiang Ye
Hui Chen
Ningyu Zhang
Huajun Chen
92
33
0
20 Aug 2021
CIGLI: Conditional Image Generation from Language & Image
Xiaopeng Lu
Lynnette Hui Xian Ng
Jared Fernandez
Hao Zhu
DiffM
48
6
0
20 Aug 2021
Who's Waldo? Linking People Across Text and Images
Claire Yuqing Cui
Apoorv Khandelwal
Yoav Artzi
Noah Snavely
Hadar Averbuch-Elor
88
21
0
16 Aug 2021
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Yuhao Cui
Zhou Yu
Chunqi Wang
Zhongzhou Zhao
Ji Zhang
Meng Wang
Jun-chen Yu
VLM
64
56
0
16 Aug 2021
Medical-VLBERT: Medical Visual Language BERT for COVID-19 CT Report Generation With Alternate Learning
Guangyi Liu
Yinghong Liao
Fuyu Wang
Bin Zhang
Lu Zhang
...
Xiang Wan
Shaolin Li
Zhen Li
Shuixing Zhang
Shuguang Cui
114
59
0
11 Aug 2021
BERTHop: An Effective Vision-and-Language Model for Chest X-ray Disease Diagnosis
Masoud Monajatipoor
Mozhdeh Rouhsedaghat
Liunian Harold Li
Aichi Chien
C.-C. Jay Kuo
Fabien Scalzo
Kai-Wei Chang
LM&MA
MedIm
60
31
0
10 Aug 2021
Reference-based Defect Detection Network
Zhaoyang Zeng
Bei Liu
Jianlong Fu
Hongyang Chao
ObjD
79
34
0
10 Aug 2021
Previous
1
2
3
...
14
15
16
...
19
20
21
Next