arXiv:1908.03557
VisualBERT: A Simple and Performant Baseline for Vision and Language
9 August 2019
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
Papers citing
"VisualBERT: A Simple and Performant Baseline for Vision and Language"
50 / 1,200 papers shown
Title
LightCLIP: Learning Multi-Level Interaction for Lightweight Vision-Language Models
Ying Nie
Wei He
Kai Han
Yehui Tang
Tianyu Guo
Fanyi Du
Yunhe Wang
VLM
86
4
0
01 Dec 2023
Semantic-Aware Frame-Event Fusion based Pattern Recognition via Large Vision-Language Models
Dong Li
Jiandong Jin
Yuhao Zhang
Yanlin Zhong
Yaoyang Wu
Lan Chen
Tianlin Li
Bin Luo
117
6
0
30 Nov 2023
Leveraging VLM-Based Pipelines to Annotate 3D Objects
Rishabh Kabra
Loic Matthey
Alexander Lerchner
Niloy J. Mitra
113
6
0
29 Nov 2023
Contrastive Vision-Language Alignment Makes Efficient Instruction Learner
Lizhao Liu
Xinyu Sun
Tianhang Xiang
Zhuangwei Zhuang
Liuren Yin
Mingkui Tan
VLM
60
3
0
29 Nov 2023
PALM: Predicting Actions through Language Models
Sanghwan Kim
Daoji Huang
Yongqin Xian
Otmar Hilliges
Luc Van Gool
Xi Wang
VLM
81
14
0
29 Nov 2023
Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
Sicong Leng
Hang Zhang
Guanzheng Chen
Xin Li
Shijian Lu
Chunyan Miao
Li Bing
VLM
MLLM
153
239
0
28 Nov 2023
C-SAW: Self-Supervised Prompt Learning for Image Generalization in Remote Sensing
Avigyan Bhattacharya
Mainak Singha
Ankit Jha
Biplab Banerjee
SSL
VLM
80
6
0
27 Nov 2023
Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models
Yufei Zhan
Yousong Zhu
Zhiyang Chen
Fan Yang
E. Goles
Jinqiao Wang
ObjD
114
17
0
24 Nov 2023
Vamos: Versatile Action Models for Video Understanding
Shijie Wang
Qi Zhao
Minh Quan Do
Nakul Agarwal
Kwonjoon Lee
Chen Sun
143
21
0
22 Nov 2023
ViStruct: Visual Structural Knowledge Extraction via Curriculum Guided Code-Vision Representation
Yangyi Chen
Xingyao Wang
Manling Li
Derek Hoiem
Heng Ji
81
12
0
22 Nov 2023
A Survey on Multimodal Large Language Models for Autonomous Driving
Can Cui
Yunsheng Ma
Xu Cao
Wenqian Ye
Yang Zhou
...
Xinrui Yan
Shuqi Mei
Jianguo Cao
Ziran Wang
Chao Zheng
169
290
0
21 Nov 2023
PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction
Peng Wang
Hao Tan
Sai Bi
Yinghao Xu
Fujun Luan
Kalyan Sunkavalli
Wenping Wang
Zexiang Xu
Kai Zhang
99
109
0
20 Nov 2023
RecExplainer: Aligning Large Language Models for Explaining Recommendation Models
Yuxuan Lei
Jianxun Lian
Jing Yao
Xu Huang
Defu Lian
Xing Xie
LRM
68
9
0
18 Nov 2023
DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback
Yangyi Chen
Karan Sikka
Michael Cogswell
Heng Ji
Ajay Divakaran
131
72
0
16 Nov 2023
Improving Hateful Meme Detection through Retrieval-Guided Contrastive Learning
Jingbiao Mei
Jinghong Chen
Weizhe Lin
Bill Byrne
Marcus Tomalin
VLM
64
8
0
14 Nov 2023
Learning Mutually Informed Representations for Characters and Subwords
Yilin Wang
Xinyi Hu
Matthew R. Gormley
68
0
0
14 Nov 2023
Detecting and Correcting Hate Speech in Multimodal Memes with Large Visual Language Model
Minh-Hao Van
Xintao Wu
VLM
MLLM
65
11
0
12 Nov 2023
MuST: Multimodal Spatiotemporal Graph-Transformer for Hospital Readmission Prediction
Yan Miao
Lequan Yu
26
2
0
11 Nov 2023
Improving Vision-and-Language Reasoning via Spatial Relations Modeling
Cheng Yang
Rui Xu
Ye Guo
Peixiang Huang
Yiru Chen
Wenkui Ding
Zhongyuan Wang
Hong Zhou
LRM
59
6
0
09 Nov 2023
Self-Supervised Learning for Visual Relationship Detection through Masked Bounding Box Reconstruction
Zacharias Anastasakis
Dimitrios Mallis
Markos Diomataris
George Alexandridis
Stefanos D. Kollias
Vassilis Pitsikalis
57
2
0
08 Nov 2023
Meta-Adapter: An Online Few-shot Learner for Vision-Language Model
Cheng Cheng
Lin Song
Ruoyi Xue
Hang Wang
Hongbin Sun
Yixiao Ge
Ying Shan
VLM
ObjD
116
26
0
07 Nov 2023
A Single 2D Pose with Context is Worth Hundreds for 3D Human Pose Estimation
Qi-jun Zhao
Ce Zheng
Mengyuan Liu
Chong Chen
72
14
0
06 Nov 2023
Augment the Pairs: Semantics-Preserving Image-Caption Pair Augmentation for Grounding-Based Vision and Language Models
Jingru Yi
Burak Uzkent
Oana Ignat
Zili Li
Amanmeet Garg
Xiang Yu
Linda Liu
VLM
78
1
0
05 Nov 2023
Emotion Detection for Misinformation: A Review
Zhiwei Liu
Tianlin Zhang
Kailai Yang
Paul Thompson
Zeping Yu
Sophia Ananiadou
110
35
0
01 Nov 2023
Form follows Function: Text-to-Text Conditional Graph Generation based on Functional Requirements
Peter Zachares
Vahan Hovhannisyan
Alan Mosca
Yarin Gal
59
1
0
01 Nov 2023
From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities
Md Farhan Ishmam
Md Sakib Hossain Shovon
M. F. Mridha
Nilanjan Dey
151
44
0
01 Nov 2023
Neuroformer: Multimodal and Multitask Generative Pretraining for Brain Data
Antonis Antoniades
Yiyi Yu
Joseph Canzano
William Wang
Spencer L. Smith
AI4CE
120
13
0
31 Oct 2023
Partial Tensorized Transformers for Natural Language Processing
Subhadra Vadlamannati
Ryan Solgi
55
0
0
30 Oct 2023
Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone
Zeyinzi Jiang
Chaojie Mao
Ziyuan Huang
Ao Ma
Yiliang Lv
Yujun Shen
Deli Zhao
Jingren Zhou
88
16
0
30 Oct 2023
ArchBERT: Bi-Modal Understanding of Neural Architectures and Natural Languages
Mohammad Akbari
Saeed Ranjbar Alvar
Behnam Kamranian
Amin Banitalebi-Dehkordi
Yong Zhang
AI4CE
36
0
0
26 Oct 2023
A Survey on Transferability of Adversarial Examples across Deep Neural Networks
Jindong Gu
Xiaojun Jia
Pau de Jorge
Wenqian Yu
Xinwei Liu
...
Anjun Hu
Ashkan Khakzar
Zhijiang Li
Xiaochun Cao
Philip Torr
AAML
116
31
0
26 Oct 2023
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA
Asmar Nadeem
Adrian Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
80
10
0
25 Oct 2023
VD-GR: Boosting Visual Dialog with Cascaded Spatial-Temporal Multi-Modal GRaphs
Adnen Abdessaied
Lei Shi
Andreas Bulling
3DH
58
4
0
25 Oct 2023
Density of States Prediction of Crystalline Materials via Prompt-guided Multi-Modal Transformer
Namkyeong Lee
Heewoong Noh
Sungwon Kim
Dongmin Hyun
Gyoung S. Na
Chanyoung Park
54
6
0
24 Oct 2023
Multimodal Representations for Teacher-Guided Compositional Visual Reasoning
Wafa Aissa
Marin Ferecatu
M. Crucianu
LRM
70
0
0
24 Oct 2023
Learning with Noisy Labels Using Collaborative Sample Selection and Contrastive Semi-Supervised Learning
Qing Miao
Xiaohe Wu
Chao Xu
Yanli Ji
Wangmeng Zuo
Yiwen Guo
Zhaopeng Meng
NoLa
85
5
0
24 Oct 2023
Large Language Models are Visual Reasoning Coordinators
Liangyu Chen
Bo Li
Sheng Shen
Jingkang Yang
Chunyuan Li
Kurt Keutzer
Trevor Darrell
Ziwei Liu
VLM
LRM
130
58
0
23 Oct 2023
The BLA Benchmark: Investigating Basic Language Abilities of Pre-Trained Multimodal Models
Xinyi Chen
Raquel Fernández
Sandro Pezzelle
VLM
62
10
0
23 Oct 2023
OV-VG: A Benchmark for Open-Vocabulary Visual Grounding
Chunlei Wang
Wenquan Feng
Xiangtai Li
Guangliang Cheng
Shuchang Lyu
Binghao Liu
Lijiang Chen
Qi Zhao
ObjD
VLM
96
11
0
22 Oct 2023
ITEm: Unsupervised Image-Text Embedding Learning for eCommerce
Baohao Liao
Michael Kozielski
Sanjika Hewavitharana
Jiangbo Yuan
Shahram Khadivi
Tomer Lancewicki
SSL
25
0
0
22 Oct 2023
RSAdapter: Adapting Multimodal Models for Remote Sensing Visual Question Answering
Yuduo Wang
Pedram Ghamisi
61
6
0
19 Oct 2023
Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook
Ming Jin
Qingsong Wen
Yuxuan Liang
Chaoli Zhang
Siqiao Xue
...
Shirui Pan
Vincent S. Tseng
Yu Zheng
Lei Chen
Hui Xiong
AI4TS
SyDa
166
123
0
16 Oct 2023
PELA: Learning Parameter-Efficient Models with Low-Rank Approximation
Yangyang Guo
Guangzhi Wang
Mohan S. Kankanhalli
41
3
0
16 Oct 2023
VLIS: Unimodal Language Models Guide Multimodal Language Generation
Jiwan Chung
Youngjae Yu
VLM
68
2
0
15 Oct 2023
JM3D & JM3D-LLM: Elevating 3D Understanding with Joint Multi-modal Cues
Jiayi Ji
Haowei Wang
Changli Wu
Yiwei Ma
Xiaoshuai Sun
Rongrong Ji
103
1
0
14 Oct 2023
Mapping Memes to Words for Multimodal Hateful Meme Classification
Giovanni Burbi
Alberto Baldrati
Lorenzo Agnolucci
Marco Bertini
A. Bimbo
61
19
0
12 Oct 2023
Open-Set Knowledge-Based Visual Question Answering with Inference Paths
Jingru Gan
Xinzhe Han
Shuhui Wang
Qingming Huang
81
0
0
12 Oct 2023
Jaeger: A Concatenation-Based Multi-Transformer VQA Model
Jieting Long
Zewei Shi
Penghao Jiang
Yidong Gan
53
0
0
11 Oct 2023
MemSum-DQA: Adapting An Efficient Long Document Extractive Summarizer for Document Question Answering
Nianlong Gu
Yingqiang Gao
Richard H. R. Hahnloser
RALM
77
0
0
10 Oct 2023
I2SRM: Intra- and Inter-Sample Relationship Modeling for Multimodal Information Extraction
Yusheng Huang
Zhouhan Lin
57
5
0
10 Oct 2023