1908.03557
VisualBERT: A Simple and Performant Baseline for Vision and Language
9 August 2019
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
Papers citing
"VisualBERT: A Simple and Performant Baseline for Vision and Language"
50 / 1,200 papers shown
Title
Contextual Chart Generation for Cyber Deception
David D. Nguyen
David Liebowitz
Surya Nepal
S. Kanhere
Sharif Abuadbba
99
0
0
07 Apr 2024
Vision Transformers in Domain Adaptation and Generalization: A Study of Robustness
Shadi Alijani
Jamil Fayyad
Homayoun Najjaran
OOD
114
1
0
05 Apr 2024
DeViDe: Faceted medical knowledge for improved medical vision-language pre-training
Haozhe Luo
Ziyu Zhou
Corentin Royer
Anjany Sekuboyina
Bjoern Menze
VLM
ViT
MedIm
101
7
0
04 Apr 2024
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens
Kirolos Ataallah
Xiaoqian Shen
Eslam Abdelrahman
Essam Sleiman
Deyao Zhu
Jian Ding
Mohamed Elhoseiny
VLM
99
79
0
04 Apr 2024
Cross-Modality Gait Recognition: Bridging LiDAR and Camera Modalities for Human Identification
Rui Wang
Chuanfu Shen
M. Marín-Jiménez
George Q. Huang
Shiqi Yu
CVBM
100
6
0
04 Apr 2024
BCAmirs at SemEval-2024 Task 4: Beyond Words: A Multimodal and Multilingual Exploration of Persuasion in Memes
Amirhossein Abaskohi
AmirHossein Dabiri Aghdam
Lele Wang
Giuseppe Carenini
78
1
0
03 Apr 2024
Bi-LORA: A Vision-Language Approach for Synthetic Image Detection
Mamadou Keita
W. Hamidouche
Hessen Bougueffa Eutamene
Abdenour Hadid
Abdelmalik Taleb-Ahmed
110
9
0
02 Apr 2024
VideoDistill: Language-aware Vision Distillation for Video Question Answering
Bo Zou
Chao Yang
Yu Qiao
Chengbin Quan
Youjian Zhao
VGen
85
1
0
01 Apr 2024
LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction
Bo Zou
Chao Yang
Yu Qiao
Chengbin Quan
Youjian Zhao
105
6
0
01 Apr 2024
Unknown Prompt, the only Lacuna: Unveiling CLIP's Potential for Open Domain Generalization
Mainak Singha
Ankit Jha
Shirsha Bose
Ashwin Nair
Moloud Abdar
Biplab Banerjee
VLM
98
12
0
31 Mar 2024
Learn "No" to Say "Yes" Better: Improving Vision-Language Models via Negations
Jaisidh Singh
Ishaan Shrivastava
Mayank Vatsa
Richa Singh
Aparna Bharati
VLM
CoGe
86
20
0
29 Mar 2024
FSMR: A Feature Swapping Multi-modal Reasoning Approach with Joint Textual and Visual Clues
Shuang Li
Jiahua Wang
Lijie Wen
LRM
53
0
0
29 Mar 2024
Semantic Map-based Generation of Navigation Instructions
Chengzu Li
Chao Zhang
Simone Teufel
R. Doddipatla
Svetlana Stoyanchev
73
2
0
28 Mar 2024
Scaling Vision-and-Language Navigation With Offline RL
Valay Bundele
Mahesh Bhupati
Biplab Banerjee
Aditya Grover
OffRL
47
1
0
27 Mar 2024
Predicate Debiasing in Vision-Language Models Integration for Scene Graph Generation Enhancement
Yuxuan Wang
Xiaoyuan Liu
VLM
75
0
0
24 Mar 2024
Not All Attention is Needed: Parameter and Computation Efficient Transfer Learning for Multi-modal Large Language Models
Qiong Wu
Weihao Ye
Yiyi Zhou
Xiaoshuai Sun
Rongrong Ji
MoE
84
1
0
22 Mar 2024
Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery
Guan-Feng Wang
Long Bai
Wan Jun Nah
Jie Wang
Zhaoxi Zhang
Zhen Chen
Jinlin Wu
Mobarakol Islam
Hongbin Liu
Hongliang Ren
129
17
0
22 Mar 2024
Grounding Spatial Relations in Text-Only Language Models
Gorka Azkune
Ander Salaberria
Eneko Agirre
59
0
0
20 Mar 2024
As Firm As Their Foundations: Can open-sourced foundation models be used to create adversarial examples for downstream tasks?
Anjun Hu
Jindong Gu
Francesco Pinto
Konstantinos Kamnitsas
Philip Torr
AAML
SILM
86
5
0
19 Mar 2024
Modality-Agnostic fMRI Decoding of Vision and Language
Mitja Nikolaus
Milad Mozafari
Nicholas Asher
Leila Reddy
Rufin VanRullen
78
4
0
18 Mar 2024
Prioritized Semantic Learning for Zero-shot Instance Navigation
Xander Sun
Louis Lau
Hoyard Zhi
Ronghe Qiu
Junwei Liang
82
11
0
18 Mar 2024
Deciphering Hate: Identifying Hateful Memes and Their Targets
E. Hossain
Omar Sharif
M. M. Hoque
S. Preum
80
6
0
16 Mar 2024
GET: Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery
Enguang Wang
Zhimao Peng
Zhengyuan Xie
Fei Yang
Xialei Liu
Ming-Ming Cheng
135
3
0
15 Mar 2024
PosSAM: Panoptic Open-vocabulary Segment Anything
VS Vibashan
Shubhankar Borse
Hyojin Park
Debasmit Das
Vishal M. Patel
Munawar Hayat
Fatih Porikli
VLM
MLLM
75
7
0
14 Mar 2024
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Brandon McKinzie
Zhe Gan
J. Fauconnier
Sam Dodge
Bowen Zhang
...
Zirui Wang
Ruoming Pang
Peter Grasch
Alexander Toshev
Yinfei Yang
MLLM
123
208
0
14 Mar 2024
Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring
Yufei Zhan
Yousong Zhu
Hongyin Zhao
Fan Yang
Ming Tang
Jinqiao Wang
ObjD
98
14
0
14 Mar 2024
Generative Models and Connected and Automated Vehicles: A Survey in Exploring the Intersection of Transportation and AI
Dong Shu
Zhouyao Zhu
117
1
0
14 Mar 2024
Efficient Prompt Tuning of Large Vision-Language Model for Fine-Grained Ship Classification
Long Lan
Fengxiang Wang
Shuyan Li
Xiangtao Zheng
Zengmao Wang
Xinwang Liu
VLM
84
9
0
13 Mar 2024
Annotations on a Budget: Leveraging Geo-Data Similarity to Balance Model Performance and Annotation Cost
Oana Ignat
Longju Bai
Joan Nwatu
Rada Mihalcea
76
6
0
12 Mar 2024
Noise-powered Multi-modal Knowledge Graph Representation Framework
Zhuo Chen
Yin Fang
Yichi Zhang
Lingbing Guo
Jiaoyan Chen
Hua-zeng Chen
Wen Zhang
Wen Zhang
50
0
0
11 Mar 2024
SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection
Peng Qi
Zehong Yan
Wynne Hsu
Mong Li Lee
MLLM
131
46
0
05 Mar 2024
Vision-Language Models for Medical Report Generation and Visual Question Answering: A Review
Iryna Hartsock
Ghulam Rasool
102
81
0
04 Mar 2024
HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding
Zhaorun Chen
Zhuokai Zhao
Hongyin Luo
Huaxiu Yao
Bo Li
Jiawei Zhou
MLLM
121
75
0
01 Mar 2024
Acquiring Linguistic Knowledge from Multimodal Input
Theodor Amariucai
Alexander Scott Warstadt
CLL
89
2
0
27 Feb 2024
Vision Transformers with Natural Language Semantics
Young-Kyung Kim
Matías Di Martino
Guillermo Sapiro
ViT
58
5
0
27 Feb 2024
Demonstrating and Reducing Shortcuts in Vision-Language Representation Learning
Maurits J. R. Bleeker
Mariya Hendriksen
Andrew Yates
Maarten de Rijke
VLM
97
2
0
27 Feb 2024
CARZero: Cross-Attention Alignment for Radiology Zero-Shot Classification
Zihang Jiang
Qingsong Yao
Zihang Jiang
Rongsheng Wang
Zhiyang He
Xiaodong Tao
S. Kevin Zhou
MedIm
96
16
0
27 Feb 2024
ACTrack: Adding Spatio-Temporal Condition for Visual Object Tracking
Yushan Han
Kaer Huang
78
1
0
27 Feb 2024
How Can LLM Guide RL? A Value-Based Approach
Shenao Zhang
Sirui Zheng
Shuqi Ke
Zhihan Liu
Wanxin Jin
Jianbo Yuan
Yingxiang Yang
Hongxia Yang
Zhaoran Wang
71
9
0
25 Feb 2024
NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation
Jiazhao Zhang
Kunyu Wang
Rongtao Xu
Gengze Zhou
Yicong Hong
Xiaomeng Fang
Qi Wu
Zhizheng Zhang
Wang He
LM&Ro
161
61
0
24 Feb 2024
CFIR: Fast and Effective Long-Text To Image Retrieval for Large Corpora
Zijun Long
Xuri Ge
R. McCreadie
Joemon M. Jose
73
7
0
23 Feb 2024
Efficient data selection employing Semantic Similarity-based Graph Structures for model training
Roxana Petcu
Subhadeep Maji
28
1
0
22 Feb 2024
The Revolution of Multimodal Large Language Models: A Survey
Davide Caffagni
Federico Cocchi
Luca Barsellotti
Nicholas Moratelli
Sara Sarto
Lorenzo Baraldi
Lorenzo Baraldi
Marcella Cornia
Rita Cucchiara
LRM
VLM
131
64
0
19 Feb 2024
Strong hallucinations from negation and how to fix them
Nicholas Asher
Swarnadeep Bhar
ReLM
LRM
54
5
0
16 Feb 2024
MM-Point: Multi-View Information-Enhanced Multi-Modal Self-Supervised 3D Point Cloud Understanding
Hai-Tao Yu
Mofei Song
3DPC
52
9
0
15 Feb 2024
Align before Attend: Aligning Visual and Textual Features for Multimodal Hateful Content Detection
E. Hossain
Omar Sharif
M. M. Hoque
S. Preum
71
4
0
15 Feb 2024
ProtChatGPT: Towards Understanding Proteins with Large Language Models
Chao Wang
Hehe Fan
Ruijie Quan
Yi Yang
108
16
0
15 Feb 2024
Can Text-to-image Model Assist Multi-modal Learning for Visual Recognition with Visual Modality Missing?
Tiantian Feng
Daniel Yang
Digbalay Bose
Shrikanth Narayanan
100
6
0
14 Feb 2024
Asking Multimodal Clarifying Questions in Mixed-Initiative Conversational Search
Yifei Yuan
Clemencia Siro
Mohammad Aliannejadi
Maarten de Rijke
Wai Lam
61
10
0
12 Feb 2024
RA-Rec: An Efficient ID Representation Alignment Framework for LLM-based Recommendation
Xiaohan Yu
Li Zhang
Xin Zhao
Yue Wang
Zhongrui Ma
72
11
0
07 Feb 2024