Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1908.08530
Cited By
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
22 August 2019
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM
MLLM
SSL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"VL-BERT: Pre-training of Generic Visual-Linguistic Representations"
50 / 1,012 papers shown
Title
Visually Grounded Reasoning across Languages and Cultures
Fangyu Liu
Emanuele Bugliarello
Edoardo Ponti
Siva Reddy
Nigel Collier
Desmond Elliott
VLM
LRM
111
171
0
28 Sep 2021
MLIM: Vision-and-Language Model Pre-training with Masked Language and Image Modeling
Tarik Arici
M. S. Seyfioglu
T. Neiman
Yi Tian Xu
Son N. Tran
Trishul Chilimbi
Belinda Zeng
Ismail B. Tutar
VLM
16
15
0
24 Sep 2021
Detecting Harmful Memes and Their Targets
Shraman Pramanick
Dimitar Dimitrov
Rituparna Mukherjee
Shivam Sharma
Md. Shad Akhtar
Preslav Nakov
Tanmoy Chakraborty
28
108
0
24 Sep 2021
CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models
Yuan Yao
Ao Zhang
Zhengyan Zhang
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
MLLM
VPVLM
VLM
211
221
0
24 Sep 2021
KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation
Yongfei Liu
Chenfei Wu
Shao-Yen Tseng
Vasudev Lal
Xuming He
Nan Duan
CLIP
VLM
53
28
0
22 Sep 2021
An End-to-End Transformer Model for 3D Object Detection
Ishan Misra
Rohit Girdhar
Armand Joulin
3DPC
ViT
39
472
0
16 Sep 2021
What Vision-Language Models `See' when they See Scenes
Michele Cafagna
Kees van Deemter
Albert Gatt
VLM
37
13
0
15 Sep 2021
Panoptic Narrative Grounding
Cristina González
Nicolás Ayobi
Isabela Hernández
José Hernández
Jordi Pont-Tuset
Pablo Arbeláez
87
22
0
10 Sep 2021
We went to look for meaning and all we got were these lousy representations: aspects of meaning representation for computational semantics
Simon Dobnik
R. Cooper
Adam Ek
Bill Noble
Staffan Larsson
N. Ilinykh
Vladislav Maraev
Vidya Somashekarappa
30
0
0
10 Sep 2021
ReasonBERT: Pre-trained to Reason with Distant Supervision
Xiang Deng
Yu-Chuan Su
Alyssa Lees
You Wu
Cong Yu
Huan Sun
ReLM
RALM
OffRL
LRM
19
31
0
10 Sep 2021
Towards Developing a Multilingual and Code-Mixed Visual Question Answering System by Knowledge Distillation
H. Khan
D. Gupta
Asif Ekbal
30
14
0
10 Sep 2021
Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers
Stella Frank
Emanuele Bugliarello
Desmond Elliott
32
81
0
09 Sep 2021
TxT: Crossmodal End-to-End Learning with Transformers
Jan-Martin O. Steitz
Jonas Pfeiffer
Iryna Gurevych
Stefan Roth
LRM
21
2
0
09 Sep 2021
M5Product: Self-harmonized Contrastive Learning for E-commercial Multi-modal Pretraining
Xiao Dong
Xunlin Zhan
Yangxin Wu
Yunchao Wei
Michael C. Kampffmeyer
Xiaoyong Wei
Minlong Lu
Yaowei Wang
Xiaodan Liang
35
37
0
09 Sep 2021
Towards Natural Language Interfaces for Data Visualization: A Survey
Leixian Shen
Enya Shen
Yuyu Luo
Xiaocong Yang
Xuming Hu
Xiongshuai Zhang
Zhiwei Tai
Jianmin Wang
34
137
0
08 Sep 2021
Learning grounded word meaning representations on similarity graphs
Mariella Dimiccoli
H. Wendt
Pau Batlle
18
1
0
07 Sep 2021
Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization
Tiezheng Yu
Wenliang Dai
Zihan Liu
Pascale Fung
32
73
0
06 Sep 2021
Improved RAMEN: Towards Domain Generalization for Visual Question Answering
Bhanuka Gamage
Lim Chern Hong
22
1
0
06 Sep 2021
Learning to Generate Scene Graph from Natural Language Supervision
Yiwu Zhong
Jing Shi
Jianwei Yang
Chenliang Xu
Yin Li
SSL
42
77
0
06 Sep 2021
Data Efficient Masked Language Modeling for Vision and Language
Yonatan Bitton
Gabriel Stanovsky
Michael Elhadad
Roy Schwartz
VLM
11
20
0
05 Sep 2021
LAViTeR: Learning Aligned Visual and Textual Representations Assisted by Image and Caption Generation
Mohammad Abuzar Shaikh
Zhanghexuan Ji
Dana Moukheiber
Yan Shen
S. Srihari
Mingchen Gao
VLM
22
1
0
04 Sep 2021
Multimodal Conditionality for Natural Language Generation
Michael Sollami
Aashish Jain
21
10
0
02 Sep 2021
WebQA: Multihop and Multimodal QA
Yingshan Chang
M. Narang
Hisami Suzuki
Guihong Cao
Jianfeng Gao
Yonatan Bisk
LRM
18
78
0
01 Sep 2021
Fine-Grained Chemical Entity Typing with Multimodal Knowledge Representation
Chenkai Sun
Weijian Li
Jinfeng Xiao
Nikolaus Nova Parulian
ChengXiang Zhai
Heng Ji
44
4
0
29 Aug 2021
On the Significance of Question Encoder Sequence Model in the Out-of-Distribution Performance in Visual Question Answering
K. Gouthaman
Anurag Mittal
CML
45
0
0
28 Aug 2021
Vision-Language Navigation: A Survey and Taxonomy
Wansen Wu
Tao Chang
Xinmeng Li
LM&Ro
17
19
0
26 Aug 2021
INVIGORATE: Interactive Visual Grounding and Grasping in Clutter
Hanbo Zhang
Yunfan Lu
Cunjun Yu
David Hsu
Xuguang Lan
Nanning Zheng
LM&Ro
29
63
0
25 Aug 2021
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Zirui Wang
Jiahui Yu
Adams Wei Yu
Zihang Dai
Yulia Tsvetkov
Yuan Cao
VLM
MLLM
51
781
0
24 Aug 2021
Guiding Query Position and Performing Similar Attention for Transformer-Based Detection Heads
Xiaohu Jiang
Ze Chen
Zhicheng Wang
Erjin Zhou
Chun Yuan
22
2
0
22 Aug 2021
From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network
Yuxin Wang
Hongtao Xie
Shancheng Fang
Jing Wang
Shenggao Zhu
Yongdong Zhang
VLM
58
152
0
22 Aug 2021
Grid-VLP: Revisiting Grid Features for Vision-Language Pre-training
Ming Yan
Haiyang Xu
Chenliang Li
Bin Bi
Junfeng Tian
Min Gui
Wei Wang
VLM
36
10
0
21 Aug 2021
Group-based Distinctive Image Captioning with Memory Attention
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
21
18
0
20 Aug 2021
Airbert: In-domain Pretraining for Vision-and-Language Navigation
Pierre-Louis Guhur
Makarand Tapaswi
Shizhe Chen
Ivan Laptev
Cordelia Schmid
LM&Ro
27
135
0
20 Aug 2021
Knowledge Perceived Multi-modal Pretraining in E-commerce
Yushan Zhu
Huaixiao Tou
Wen Zhang
Ganqiang Ye
Hui Chen
Ningyu Zhang
Huajun Chen
28
32
0
20 Aug 2021
CIGLI: Conditional Image Generation from Language & Image
Xiaopeng Lu
Lynnette Hui Xian Ng
Jared Fernandez
Hao Zhu
DiffM
19
6
0
20 Aug 2021
Who's Waldo? Linking People Across Text and Images
Claire Yuqing Cui
Apoorv Khandelwal
Yoav Artzi
Noah Snavely
Hadar Averbuch-Elor
31
21
0
16 Aug 2021
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Yuhao Cui
Zhou Yu
Chunqi Wang
Zhongzhou Zhao
Ji Zhang
Meng Wang
Jun-chen Yu
VLM
27
53
0
16 Aug 2021
Medical-VLBERT: Medical Visual Language BERT for COVID-19 CT Report Generation With Alternate Learning
Guangyi Liu
Yinghong Liao
Fuyu Wang
Bin Zhang
Lu Zhang
...
Xiang Wan
Shaolin Li
Zhen Li
Shuixing Zhang
Shuguang Cui
23
56
0
11 Aug 2021
BERTHop: An Effective Vision-and-Language Model for Chest X-ray Disease Diagnosis
Masoud Monajatipoor
Mozhdeh Rouhsedaghat
Liunian Harold Li
Aichi Chien
C.-C. Jay Kuo
Fabien Scalzo
Kai-Wei Chang
LM&MA
MedIm
24
30
0
10 Aug 2021
Reference-based Defect Detection Network
Zhaoyang Zeng
Bei Liu
Jianlong Fu
Hongyang Chao
ObjD
26
34
0
10 Aug 2021
Disentangling Hate in Online Memes
Rui Cao
Ziqing Fan
Roy Ka-wei Lee
Wen-Haw Chong
Jing Jiang
26
76
0
09 Aug 2021
OVIS: Open-Vocabulary Visual Instance Search via Visual-Semantic Aligned Representation Learning
Sheng Liu
Kevin Qinghong Lin
Lijuan Wang
Junsong Yuan
Zicheng Liu
VLM
8
3
0
08 Aug 2021
StrucTexT: Structured Text Understanding with Multi-Modal Transformers
Yulin Li
Yuxi Qian
Yuchen Yu
Xiameng Qin
Chengquan Zhang
Yan Liu
Kun Yao
Junyu Han
Jingtuo Liu
Errui Ding
27
114
0
06 Aug 2021
StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators
Rinon Gal
Or Patashnik
Haggai Maron
Gal Chechik
Daniel Cohen-Or
CLIP
VLM
44
221
0
02 Aug 2021
Word2Pix: Word to Pixel Cross Attention Transformer in Visual Grounding
Heng Zhao
Qiufeng Wang
Yew-Soon Ong
ObjD
21
23
0
31 Jul 2021
Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-modal Pretraining
Xunlin Zhan
Yangxin Wu
Xiao Dong
Yunchao Wei
Minlong Lu
Yichi Zhang
Hang Xu
Xiaodan Liang
ViT
31
64
0
30 Jul 2021
UIBert: Learning Generic Multimodal Representations for UI Understanding
Chongyang Bai
Xiaoxue Zang
Ying Xu
Srinivas Sunkara
Abhinav Rastogi
Jindong Chen
Blaise Agüera y Arcas
19
87
0
29 Jul 2021
Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing
Pengfei Liu
Weizhe Yuan
Jinlan Fu
Zhengbao Jiang
Hiroaki Hayashi
Graham Neubig
VLM
SyDa
79
3,853
0
28 Jul 2021
Exceeding the Limits of Visual-Linguistic Multi-Task Learning
Cameron R. Wolfe
Keld T. Lundgaard
VLM
45
2
0
27 Jul 2021
Spatial-Temporal Transformer for Dynamic Scene Graph Generation
Yuren Cong
Wentong Liao
H. Ackermann
Bodo Rosenhahn
M. Yang
ViT
22
122
0
26 Jul 2021
Previous
1
2
3
...
14
15
16
...
19
20
21
Next