Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1908.03557
Cited By
VisualBERT: A Simple and Performant Baseline for Vision and Language
9 August 2019
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"VisualBERT: A Simple and Performant Baseline for Vision and Language"
50 / 1,200 papers shown
Title
Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations
Yufen Huang
Jiji Tang
Zhuo Chen
Rongsheng Zhang
Xinfeng Zhang
...
Zeng Zhao
Zhou Zhao
Tangjie Lv
Zhipeng Hu
Wen Zhang
VLM
125
25
0
06 May 2023
Fairness in Image Search: A Study of Occupational Stereotyping in Image Retrieval and its Debiasing
Swagatika Dash
38
0
0
06 May 2023
T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning via Mixed Large Language Model Signals for Science Question Answering
Lei Wang
Yilang Hu
Jiabang He
Xingdong Xu
Ning Liu
Hui-juan Liu
Hengtao Shen
LRM
MLLM
116
48
0
05 May 2023
A Neural Divide-and-Conquer Reasoning Framework for Image Retrieval from Linguistically Complex Text
Yunxin Li
Baotian Hu
Yuxin Ding
Lin Ma
Hao Fei
74
5
0
03 May 2023
ArK: Augmented Reality with Knowledge Interactive Emergent Ability
Qiuyuan Huang
Jinho Park
Abhinav Gupta
Paul N. Bennett
Ran Gong
...
Baolin Peng
O. Mohammed
C. Pal
Yejin Choi
Jianfeng Gao
119
6
0
01 May 2023
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model
Peng Gao
Jiaming Han
Renrui Zhang
Ziyi Lin
Shijie Geng
...
Pan Lu
Conghui He
Xiangyu Yue
Hongsheng Li
Yu Qiao
MLLM
118
588
0
28 Apr 2023
An Empirical Study of Multimodal Model Merging
Yi-Lin Sung
Linjie Li
Kevin Qinghong Lin
Zhe Gan
Joey Tianyi Zhou
Lijuan Wang
MoMe
118
42
0
28 Apr 2023
π
π
π
-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation
Chengyue Wu
Teng Wang
Yixiao Ge
Zeyu Lu
Rui-Zhi Zhou
Ying Shan
Ping Luo
MoMe
145
37
0
27 Apr 2023
Towards Multi-Modal DBMSs for Seamless Querying of Texts and Tables
Matthias Urban
Carsten Binnig
71
5
0
26 Apr 2023
Sample-Specific Debiasing for Better Image-Text Models
Peiqi Wang
Yingcheng Liu
Ching-Yun Ko
W. Wells
Seth Berkowitz
Steven Horng
Polina Golland
SSL
MedIm
111
1
0
25 Apr 2023
Hypernymization of named entity-rich captions for grounding-based multi-modal pretraining
Giacomo Nebbia
Adriana Kovashka
103
0
0
25 Apr 2023
SurgicalGPT: End-to-End Language-Vision GPT for Visual Question Answering in Surgery
Lalithkumar Seenivasan
Mobarakol Islam
Gokul Kannan
Hongliang Ren
86
43
0
19 Apr 2023
Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models
Pan Lu
Baolin Peng
Hao Cheng
Michel Galley
Kai-Wei Chang
Ying Nian Wu
Song-Chun Zhu
Jianfeng Gao
KELM
MLLM
LRM
155
325
0
19 Apr 2023
SViTT: Temporal Learning of Sparse Video-Text Transformers
Yi Li
Kyle Min
Subarna Tripathi
Nuno Vasconcelos
63
13
0
18 Apr 2023
Learning Situation Hyper-Graphs for Video Question Answering
Aisha Urooj Khan
Hilde Kuehne
Bo Wu
Kim Chheu
Walid Bousselham
Chuang Gan
N. Lobo
M. Shah
90
16
0
18 Apr 2023
Towards Robust Prompts on Vision-Language Models
Jindong Gu
Ahmad Beirami
Xuezhi Wang
Alex Beutel
Philip Torr
Yao Qin
VLM
VPVLM
86
8
0
17 Apr 2023
Interpretable Detection of Out-of-Context Misinformation with Neural-Symbolic-Enhanced Large Multimodal Model
Yizhou Zhang
Loc Trinh
Defu Cao
Zijun Cui
Yang Liu
68
9
0
15 Apr 2023
CoVLR: Coordinating Cross-Modal Consistency and Intra-Modal Structure for Vision-Language Retrieval
Yang Yang
Zhongtian Fu
Xiangyu Wu
Wenjie Li
VLM
63
1
0
15 Apr 2023
TagCLIP: Improving Discrimination Ability of Open-Vocabulary Semantic Segmentation
Jingyao Li
Pengguang Chen
Shengju Qian
Jiaya Jia
VLM
80
13
0
15 Apr 2023
MVP-SEG: Multi-View Prompt Learning for Open-Vocabulary Semantic Segmentation
Jie Guo
Qimeng Wang
Yan Gao
Xiaolong Jiang
Xu Tang
Yao Hu
Baochang Zhang
VLM
77
11
0
14 Apr 2023
PDFVQA: A New Dataset for Real-World VQA on PDF Documents
Yihao Ding
Siwen Luo
Hyunsuk Chung
S. Han
103
18
0
13 Apr 2023
CAVL: Learning Contrastive and Adaptive Representations of Vision and Language
Shentong Mo
Jingfei Xia
Ihor Markevych
CLIP
VLM
55
1
0
10 Apr 2023
Enhancing Multimodal Entity and Relation Extraction with Variational Information Bottleneck
Shiyao Cui
Jiangxia Cao
Xin Cong
Shuaiyi Nie
Quangang Li
Tingwen Liu
Jinqiao Shi
67
25
0
05 Apr 2023
G2PTL: A Pre-trained Model for Delivery Address and its Applications in Logistics System
Lixia Wu
Jianlin Liu
Junhong Lou
Haoyuan Hu
Jianbin Zheng
Haomin Wen
Chao Song
Shu He
VLM
66
5
0
04 Apr 2023
Self-Supervised Multimodal Learning: A Survey
Yongshuo Zong
Oisin Mac Aodha
Timothy M. Hospedales
SSL
125
50
0
31 Mar 2023
SemiMemes: A Semi-supervised Learning Approach for Multimodal Memes Analysis
Pham Thai Hoang Tung
Nguyen Tan Viet
Ngo Tien Anh
P. D. Hung
34
6
0
31 Mar 2023
Dual Cross-Attention for Medical Image Segmentation
Gorkem Can Ates
P. Mohan
Emrah Çelik
56
85
0
30 Mar 2023
SoftCLIP: Softer Cross-modal Alignment Makes CLIP Stronger
Yuting Gao
Jinfeng Liu
Zi-Han Xu
Tong Wu
Wen Liu
Jie Yang
Keren Li
Xingen Sun
CLIP
VLM
64
47
0
30 Mar 2023
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision
Lucas Beyer
Bo Wan
Gagan Madan
Filip Pavetić
Andreas Steiner
...
Emanuele Bugliarello
Tianlin Li
Qihang Yu
Liang-Chieh Chen
Xiaohua Zhai
130
9
0
30 Mar 2023
Task-Oriented Multi-Modal Mutual Leaning for Vision-Language Models
Sifan Long
Zhen Zhao
Junkun Yuan
Zichang Tan
Jiangjiang Liu
Luping Zhou
Sheng-sheng Wang
Jingdong Wang
VLM
113
3
0
30 Mar 2023
Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations
VS Vibashan
Ning Yu
Chen Xing
Can Qin
M. Gao
Juan Carlos Niebles
Vishal M. Patel
Ran Xu
VLM
ISeg
78
18
0
29 Mar 2023
Structured Video-Language Modeling with Temporal Grouping and Spatial Grounding
Yuanhao Xiong
Long Zhao
Boqing Gong
Ming-Hsuan Yang
Florian Schroff
Ting Liu
Cho-Jui Hsieh
Liangzhe Yuan
VLM
62
0
0
28 Mar 2023
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
Renrui Zhang
Jiaming Han
Chris Liu
Peng Gao
Aojun Zhou
Xiangfei Hu
Shilin Yan
Pan Lu
Hongsheng Li
Yu Qiao
MLLM
179
787
0
28 Mar 2023
Egocentric Auditory Attention Localization in Conversations
Fiona Ryan
Hao Jiang
Abhinav Shukla
James M. Rehg
V. Ithapu
EgoV
70
16
0
28 Mar 2023
Curriculum Learning for Compositional Visual Reasoning
Wafa Aissa
Marin Ferecatu
M. Crucianu
LRM
82
3
0
27 Mar 2023
Equivariant Similarity for Vision-Language Foundation Models
Tan Wang
Kevin Qinghong Lin
Linjie Li
Chung-Ching Lin
Zhengyuan Yang
Hanwang Zhang
Zicheng Liu
Lijuan Wang
CoGe
83
51
0
25 Mar 2023
VILA: Learning Image Aesthetics from User Comments with Vision-Language Pretraining
Junjie Ke
Keren Ye
Jiahui Yu
Yonghui Wu
P. Milanfar
Feng Yang
VLM
102
61
0
24 Mar 2023
Accelerating Vision-Language Pretraining with Free Language Modeling
Teng Wang
Yixiao Ge
Feng Zheng
Ran Cheng
Ying Shan
Xiaohu Qie
Ping Luo
VLM
MLLM
113
10
0
24 Mar 2023
CoBIT: A Contrastive Bi-directional Image-Text Generation Model
Haoxuan You
Mandy Guo
Zhecan Wang
Kai-Wei Chang
Jason Baldridge
Jiahui Yu
DiffM
81
13
0
23 Mar 2023
VideoXum: Cross-modal Visual and Textural Summarization of Videos
Jingyang Lin
Hang Hua
Ming Chen
Yikang Li
Jenhao Hsiao
C. Ho
Jiebo Luo
106
33
0
21 Mar 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Muhammad Usama
Junaid Qadir
167
48
0
21 Mar 2023
CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition
Deepti Hegde
Jeya Maria Jose Valanarasu
Vishal M. Patel
CLIP
118
68
0
20 Mar 2023
Label Name is Mantra: Unifying Point Cloud Segmentation across Heterogeneous Datasets
Yixun Liang
Hao He
Shishi Xiao
Hao Lu
Yingke Chen
3DPC
47
3
0
19 Mar 2023
DeAR: Debiasing Vision-Language Models with Additive Residuals
Ashish Seth
Mayur Hemani
Chirag Agarwal
VLM
62
56
0
18 Mar 2023
PersonalTailor: Personalizing 2D Pattern Design from 3D Garment Point Clouds
Sauradip Nag
Anran Qi
Xiatian Zhu
Ariel Shamir
3DPC
65
7
0
17 Mar 2023
MultiModal Bias: Introducing a Framework for Stereotypical Bias Assessment beyond Gender and Race in Vision Language Models
Sepehr Janghorbani
Gerard de Melo
VLM
103
12
0
16 Mar 2023
Scaling Vision-Language Models with Sparse Mixture of Experts
Sheng Shen
Z. Yao
Chunyuan Li
Trevor Darrell
Kurt Keutzer
Yuxiong He
VLM
MoE
77
68
0
13 Mar 2023
Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning
Qian Jiang
Changyou Chen
Han Zhao
Liqun Chen
Q. Ping
S. D. Tran
Yi Xu
Belinda Zeng
Trishul Chilimbi
97
43
0
10 Mar 2023
Refined Vision-Language Modeling for Fine-grained Multi-modal Pre-training
Lisai Zhang
Qingcai Chen
Zhijian Chen
Yunpeng Han
Zhonghua Li
Bo Zhao
VLM
57
1
0
09 Mar 2023
Toward Unsupervised Realistic Visual Question Answering
Yuwei Zhang
Chih-Hui Ho
Nuno Vasconcelos
CoGe
85
2
0
09 Mar 2023
Previous
1
2
3
...
10
11
12
...
22
23
24
Next