VL-BERT: Pre-training of Generic Visual-Linguistic Representations
arXiv: 1908.08530 · 22 August 2019
Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, Jifeng Dai
Tags: VLM, MLLM, SSL
Links: ArXiv (abs) · PDF · HTML · GitHub (740★)
Papers citing "VL-BERT: Pre-training of Generic Visual-Linguistic Representations" (50 of 1,020 papers shown):
- Dealing with Semantic Underspecification in Multimodal NLP · Sandro Pezzelle · 63 / 10 / 0 · 08 Jun 2023
- Object Detection with Transformers: A Review · Tahira Shehzadi, K. Hashmi, D. Stricker, Muhammad Zeshan Afzal · [ViT, MU] · 102 / 29 / 0 · 07 Jun 2023
- Learning to Ground Instructional Articles in Videos through Narrations · E. Mavroudi, Triantafyllos Afouras, Lorenzo Torresani · [DiffM] · 85 / 24 / 0 · 06 Jun 2023
- MolFM: A Multimodal Molecular Foundation Model · Yi Luo, Kai Yang, Massimo Hong, Xingyi Liu, Zaiqing Nie · 78 / 39 / 0 · 06 Jun 2023
- Diversifying Joint Vision-Language Tokenization Learning · Vardaan Pahuja, A. Piergiovanni, A. Angelova · 71 / 0 / 0 · 06 Jun 2023
- Multi-CLIP: Contrastive Vision-Language Pre-training for Question Answering tasks in 3D Scenes · Alexandros Delitzas, Maria Parelli, Nikolas Hars, G. Vlassis, Sotiris Anagnostidis, Gregor Bachmann, Thomas Hofmann · [CLIP] · 53 / 22 / 0 · 04 Jun 2023
- Table and Image Generation for Investigating Knowledge of Entities in Pre-trained Vision and Language Models · Hidetaka Kamigaito, Katsuhiko Hayashi, Taro Watanabe · [VLM] · 63 / 1 / 0 · 03 Jun 2023
- Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models · Shuo Chen, Jindong Gu, Zhen Han, Yunpu Ma, Philip Torr, Volker Tresp · [VPVLM, VLM] · 127 / 21 / 0 · 03 Jun 2023
- Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work · Qiangchang Wang, Yilong Yin · 98 / 0 / 0 · 02 Jun 2023
- "Let's not Quote out of Context": Unified Vision-Language Pretraining for Context Assisted Image Captioning · Abisek Rajakumar Kalarani, P. Bhattacharyya, Niyati Chhaya, Sumit Shekhar · [CoGe, VLM] · 111 / 9 / 0 · 01 Jun 2023
- A Transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics · Hong-Yu Zhou, Yizhou Yu, Chengdi Wang, Shu Zhen Zhang, Yuanxu Gao, Jia Pan, Jun Shao, Guangming Lu, Kang Zhang, Weimin Li · [MedIm] · 91 / 171 / 0 · 01 Jun 2023
- Prompt Algebra for Task Composition · Pramuditha Perera, Matthew Trager, Luca Zancato, Alessandro Achille, Stefano Soatto · [VLM] · 77 / 8 / 0 · 01 Jun 2023
- GPT4Image: Large Pre-trained Models Help Vision Models Learn Better on Perception Task · Ning Ding, Yehui Tang, Zhongqian Fu, Chaoting Xu, Kai Han, Yunhe Wang · [MLLM, VLM] · 49 / 2 / 0 · 01 Jun 2023
- ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning · Xiao Xu, Bei Li, Chenfei Wu, Shao-Yen Tseng, Anahita Bhiwandiwalla, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan · [AIFin, VLM] · 70 / 4 / 0 · 31 May 2023
- Joint Adaptive Representations for Image-Language Learning · A. Piergiovanni, A. Angelova · [VLM] · 76 / 0 / 0 · 31 May 2023
- Enhanced Chart Understanding in Vision and Language Task via Cross-modal Pre-training on Plot Table Pairs · Mingyang Zhou, Yi R. Fung, Long Chen, Christopher Thomas, Heng Ji, Shih-Fu Chang · 105 / 13 / 0 · 29 May 2023
- Deeply Coupled Cross-Modal Prompt Learning · Xuejing Liu, Wei Tang, Jinghui Lu, Rui Zhao, Zhaojun Guo, Fei Tan · [VLM] · 61 / 17 / 0 · 29 May 2023
- FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions · Noam Rotstein, David Bensaid, Shaked Brody, Roy Ganz, Ron Kimmel · [VLM] · 81 / 31 / 0 · 28 May 2023
- MemeGraphs: Linking Memes to Knowledge Graphs · Vasiliki Kougia, Simon Fetzel, Thomas Kirchmair, Erion Çano, Sina Moayed Baharlou, Sahand Sharifzadeh, Benjamin Roth · 79 / 11 / 0 · 28 May 2023
- Learning to Imagine: Visually-Augmented Natural Language Generation · Tianyi Tang, Yushuo Chen, Yifan Du, Junyi Li, Wayne Xin Zhao, Ji-Rong Wen · [DiffM] · 85 / 9 / 0 · 26 May 2023
- HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning · Chia-Wen Kuo, Z. Kira · 83 / 23 / 0 · 25 May 2023
- MMNet: Multi-Mask Network for Referring Image Segmentation · Yimin Yan, Xingjian He, Wenxuan Wan, Qingbin Liu · [EgoV] · 62 / 2 / 0 · 24 May 2023
- Meta-learning For Vision-and-language Cross-lingual Transfer · Hanxu Hu, Frank Keller · [VLM] · 78 / 2 / 0 · 24 May 2023
- UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning · Ahmed Masry, P. Kavehzadeh, Do Xuan Long, Enamul Hoque, Shafiq Joty · [LRM] · 95 / 113 / 0 · 24 May 2023
- GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions · Woojeong Jin, Subhabrata Mukherjee, Yu Cheng, Yelong Shen, Weizhu Chen, Ahmed Hassan Awadallah, Damien Jose, Xiang Ren · [ObjD, VLM] · 116 / 8 / 0 · 24 May 2023
- RE^2: Region-Aware Relation Extraction from Visually Rich Documents · Pritika Ramu, Sijia Wang, Lalla Mouatadid, Joy Rimchala, Lifu Huang · 52 / 0 / 0 · 24 May 2023
- Run Like a Girl! Sports-Related Gender Bias in Language and Vision · S. Harrison, Eleonora Gualdoni, Gemma Boleda · 47 / 6 / 0 · 23 May 2023
- Training Transitive and Commutative Multimodal Transformers with LoReTTa · Manuel Tran, Yashin Dicente Cid, Amal Lahiani, Fabian J. Theis, Tingying Peng, Eldad Klaiman · 54 / 2 / 0 · 23 May 2023
- Can Language Models Understand Physical Concepts? · Lei Li, Jingjing Xu, Qingxiu Dong, Ce Zheng, Qi Liu, Lingpeng Kong, Xu Sun · [ALM] · 61 / 22 / 0 · 23 May 2023
- i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data · Ziyi Yang, Mahmoud Khademi, Yichong Xu, Reid Pryzant, Yuwei Fang, ..., Yu Shi, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang · 63 / 2 / 0 · 21 May 2023
- ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities · Peng Wang, Shijie Wang, Junyang Lin, Shuai Bai, Xiaohuan Zhou, Jingren Zhou, Xinggang Wang, Chang Zhou · [VLM, MLLM, ObjD] · 151 / 122 / 0 · 18 May 2023
- Inspecting the Geographical Representativeness of Images from Text-to-Image Models · Aparna Basu, R. Venkatesh Babu, Danish Pruthi · [DiffM] · 120 / 40 / 0 · 18 May 2023
- Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Understanding · Zhang Tao, Su He, D. Tao, Bin Chen, Zhi Wang, Shutao Xia · [VLM] · 82 / 27 / 0 · 18 May 2023
- Rethinking Multimodal Content Moderation from an Asymmetric Angle with Mixed-modality · Jialing Yuan, Ye Yu, Gaurav Mittal, Matthew Hall, Sandra Sajeev, Mei Chen · 93 / 10 / 0 · 17 May 2023
- An Empirical Study on the Language Modal in Visual Question Answering · Daowan Peng, Wei Wei, Xian-Ling Mao, Yuanyuan Fu, Dangyang Chen · 75 / 4 / 0 · 17 May 2023
- Sequence-to-Sequence Pre-training with Unified Modality Masking for Visual Document Understanding · ShuWei Feng, Tianyang Zhan, Zhanming Jie, Trung Quoc Luong, Xiaoran Jin · 49 / 1 / 0 · 16 May 2023
- Mobile User Interface Element Detection Via Adaptively Prompt Tuning · Zhangxuan Gu, Zhuoer Xu, Haoxing Chen, Jun Lan, Changhua Meng, Weiqiang Wang · 47 / 4 / 0 · 16 May 2023
- A Comprehensive Survey on Segment Anything Model for Vision and Beyond · Chunhui Zhang, Li Liu, Yawen Cui, Guanjie Huang, Weilin Lin, Yiqian Yang, Yuehong Hu · [VLM] · 102 / 100 / 0 · 14 May 2023
- RC3: Regularized Contrastive Cross-lingual Cross-modal Pre-training · Chulun Zhou, Yunlong Liang, Fandong Meng, Jinan Xu, Jinsong Su, Jie Zhou · [VLM] · 66 / 4 / 0 · 13 May 2023
- Towards Versatile and Efficient Visual Knowledge Integration into Pre-trained Language Models with Cross-Modal Adapters · Xinyun Zhang, Haochen Tan, Han Wu, Bei Yu · [KELM] · 36 / 1 / 0 · 12 May 2023
- Bot or Human? Detecting ChatGPT Imposters with A Single Question · Hong Wang, Xuan Luo, Weizhi Wang, Xifeng Yan · [DeLMO] · 67 / 27 / 0 · 10 May 2023
- A Review of Vision-Language Models and their Performance on the Hateful Memes Challenge · Bryan Zhao, Andrew Zhang, Blake Watson, Gillian Kearney, Isaac Dale · [VLM] · 31 / 4 / 0 · 09 May 2023
- SignBERT+: Hand-model-aware Self-supervised Pre-training for Sign Language Understanding · Hezhen Hu, Weichao Zhao, Wen-gang Zhou, Houqiang Li · [ViT] · 95 / 74 / 0 · 08 May 2023
- Scene Text Recognition with Image-Text Matching-guided Dictionary · Jiajun Wei, Hongjian Zhan, X. Tu, Yue Lu, Umapada Pal · [VLM] · 41 / 0 / 0 · 08 May 2023
- OpenViVQA: Task, Dataset, and Multimodal Fusion Models for Visual Question Answering in Vietnamese · Nghia Hieu Nguyen, Duong T.D. Vo, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen · 73 / 20 / 0 · 07 May 2023
- A Large Cross-Modal Video Retrieval Dataset with Reading Comprehension · Weijia Wu, Yuzhong Zhao, Zhuangzi Li, Jiahong Li, Hong Zhou, Mike Zheng Shou, Xiang Bai · 82 / 22 / 0 · 05 May 2023
- VideoOFA: Two-Stage Pre-Training for Video-to-Text Generation · Xilun Chen, L. Yu, Wenhan Xiong, Barlas Oğuz, Yashar Mehdad, Wen-tau Yih · [VGen] · 58 / 3 / 0 · 04 May 2023
- Making the Most of What You Have: Adapting Pre-trained Visual Language Models in the Low-data Regime · Chuhan Zhang, Antoine Miech, Jiajun Shen, Jean-Baptiste Alayrac, Pauline Luc · [VLM, VPVLM] · 90 / 2 / 0 · 03 May 2023
- VPGTrans: Transfer Visual Prompt Generator across LLMs · Ao Zhang, Hao Fei, Yuan Yao, Wei Ji, Li Li, Zhiyuan Liu, Tat-Seng Chua · [MLLM, VLM] · 84 / 89 / 0 · 02 May 2023
- In-Context Learning Unlocked for Diffusion Models · Zhendong Wang, Yi Ding, Yadong Lu, Yelong Shen, Pengcheng He, Weizhu Chen, Zhangyang Wang, Mingyuan Zhou · [VLM, DiffM] · 150 / 78 / 0 · 01 May 2023