Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1505.00468
Cited By
v1
v2
v3
v4
v5
v6
v7 (latest)
VQA: Visual Question Answering
3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"VQA: Visual Question Answering"
50 / 2,957 papers shown
Title
Modality-Agnostic Self-Supervised Learning with Meta-Learned Masked Auto-Encoder
Huiwon Jang
Jihoon Tack
Daewon Choi
Jongheon Jeong
Jinwoo Shin
74
3
0
25 Oct 2023
Knowledge Editing for Large Language Models: A Survey
Song Wang
Yaochen Zhu
Haochen Liu
Zaiyi Zheng
Chen Chen
Wenlin Yao
KELM
176
163
0
24 Oct 2023
FloCoDe: Unbiased Dynamic Scene Graph Generation with Temporal Consistency and Correlation Debiasing
Anant Khandelwal
130
2
0
24 Oct 2023
CPSeg: Finer-grained Image Semantic Segmentation via Chain-of-Thought Language Prompting
Lei Li
115
24
0
24 Oct 2023
Emergent Communication in Interactive Sketch Question Answering
Zixing Lei
Yiming Zhang
Yuxin Xiong
Siheng Chen
83
2
0
24 Oct 2023
LXMERT Model Compression for Visual Question Answering
Maryam Hashemi
Ghazaleh Mahmoudi
Sara Kodeiri
Hadi Sheikhi
Sauleh Eetemadi
VLM
34
4
0
23 Oct 2023
Large Language Models are Visual Reasoning Coordinators
Liangyu Chen
Bo Li
Sheng Shen
Jingkang Yang
Chunyuan Li
Kurt Keutzer
Trevor Darrell
Ziwei Liu
VLM
LRM
130
58
0
23 Oct 2023
Large Language Models can Share Images, Too!
Young-Jun Lee
Dokyong Lee
Joo Won Sung
Jonghwan Hyeon
Ho-Jin Choi
MLLM
84
2
0
23 Oct 2023
Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond
Zhecan Wang
Long Chen
Haoxuan You
Keyang Xu
Yicheng He
Wenhao Li
Noal Codella
Kai-Wei Chang
Shih-Fu Chang
107
3
0
23 Oct 2023
HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models
Tianrui Guan
Fuxiao Liu
Xiyang Wu
Ruiqi Xian
Zongxia Li
...
Lichang Chen
Furong Huang
Yaser Yacoob
Dinesh Manocha
Dinesh Manocha
VLM
MLLM
165
196
0
23 Oct 2023
ITEm: Unsupervised Image-Text Embedding Learning for eCommerce
Baohao Liao
Michael Kozielski
Sanjika Hewavitharana
Jiangbo Yuan
Shahram Khadivi
Tomer Lancewicki
SSL
25
0
0
22 Oct 2023
Benchmarking Sequential Visual Input Reasoning and Prediction in Multimodal Large Language Models
Mingwei Zhu
Leigang Sha
Yu Shu
Kangjia Zhao
Tiancheng Zhao
Yuxiang Cai
LRM
96
1
0
20 Oct 2023
On the Language Encoder of Contrastive Cross-modal Models
Mengjie Zhao
Junya Ono
Zhi-Wei Zhong
Chieh-Hsin Lai
Yuhta Takida
Naoki Murata
Wei-Hsiang Liao
Takashi Shibuya
Hiromi Wakaki
Yuki Mitsufuji
VLM
63
0
0
20 Oct 2023
RSAdapter: Adapting Multimodal Models for Remote Sensing Visual Question Answering
Yuduo Wang
Pedram Ghamisi
63
6
0
19 Oct 2023
Non-Intrusive Adaptation: Input-Centric Parameter-efficient Fine-Tuning for Versatile Multimodal Modeling
Yaqing Wang
Jialin Wu
T. Dabral
Jiageng Zhang
Geoff Brown
...
Frederick Liu
Yi Liang
Bo Pang
Michael Bendersky
Radu Soricut
VLM
81
15
0
18 Oct 2023
UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models
Yanyang Guo
Fangkai Jiao
Zhiqi Shen
Liqiang Nie
Mohan S. Kankanhalli
MLLM
87
7
0
17 Oct 2023
VLIS: Unimodal Language Models Guide Multimodal Language Generation
Jiwan Chung
Youngjae Yu
VLM
68
2
0
15 Oct 2023
Progressive Evidence Refinement for Open-domain Multimodal Retrieval Question Answering
Shuwen Yang
Anran Wu
Xingjiao Wu
Luwei Xiao
Tianlong Ma
Cheng Jin
Liang He
69
4
0
15 Oct 2023
VizAbility: Enhancing Chart Accessibility with LLM-based Conversational Interaction
Joshua Gorniak
Yoon Kim
Donglai Wei
Nam Wook Kim
82
10
0
14 Oct 2023
Vision-by-Language for Training-Free Compositional Image Retrieval
Shyamgopal Karthik
Karsten Roth
Massimiliano Mancini
Zeynep Akata
CoGe
115
61
0
13 Oct 2023
From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models
Dongsheng Jiang
Yuchen Liu
Songlin Liu
Jiné Zhao
Hao Zhang
Zhen Gao
Xiaopeng Zhang
Jin Li
Hongkai Xiong
MLLM
VLM
81
39
0
13 Oct 2023
Can We Edit Multimodal Large Language Models?
Siyuan Cheng
Bo Tian
Qingbin Liu
Xi Chen
Yongheng Wang
Huajun Chen
Ningyu Zhang
MLLM
117
29
0
12 Oct 2023
Visual Question Generation in Bengali
Mahmud Hasan
Labiba Islam
J. Ruma
T. Mayeesha
Rashedur Rahman
74
1
0
12 Oct 2023
Open-Set Knowledge-Based Visual Question Answering with Inference Paths
Jingru Gan
Xinzhe Han
Shuhui Wang
Qingming Huang
81
0
0
12 Oct 2023
DeltaSpace: A Semantic-aligned Feature Space for Flexible Text-guided Image Editing
Yueming Lyu
Kang Zhao
Bo Peng
H. Chen
Yue Jiang
Yingya Zhang
Jing Dong
Caifeng Shan
83
2
0
12 Oct 2023
CLIP for Lightweight Semantic Segmentation
Ke Jin
Wankou Yang
VLM
91
1
0
11 Oct 2023
Improving mitosis detection on histopathology images using large vision-language models
Ruiwen Ding
James Hall
Neil Tenenholtz
Kristen Severson
VLM
74
6
0
11 Oct 2023
Jaeger: A Concatenation-Based Multi-Transformer VQA Model
Jieting Long
Zewei Shi
Penghao Jiang
Yidong Gan
53
0
0
11 Oct 2023
What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models
Letian Zhang
Xiaotong Zhai
Zhongkai Zhao
Yongshuo Zong
Xin Wen
Bingchen Zhao
LRM
38
0
0
10 Oct 2023
What Makes for Robust Multi-Modal Models in the Face of Missing Modalities?
Siting Li
Chenzhuang Du
Yue Zhao
Yu Huang
Hang Zhao
59
4
0
10 Oct 2023
Multi-Modal Knowledge Graph Transformer Framework for Multi-Modal Entity Alignment
Qian Li
Cheng Ji
Shu Guo
Zhaoji Liang
Lihong Wang
Jianxin Li
68
11
0
10 Oct 2023
Tackling Data Bias in MUSIC-AVQA: Crafting a Balanced Dataset for Unbiased Question-Answering
Xiulong Liu
Zhikang Dong
Peng Zhang
76
24
0
10 Oct 2023
ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models
KAI-QING Zhou
Kwonjoon Lee
Teruhisa Misu
Xin Eric Wang
LRM
102
4
0
09 Oct 2023
Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models
Archiki Prasad
Elias Stengel-Eskin
Mohit Bansal
ReLM
LRM
76
8
0
09 Oct 2023
Foundation Models Meet Visualizations: Challenges and Opportunities
Weikai Yang
Mengchen Liu
Zheng Wang
Shixia Liu
87
40
0
09 Oct 2023
Causal Reasoning through Two Layers of Cognition for Improving Generalization in Visual Question Answering
Trang Nguyen
Naoaki Okazaki
LRM
82
0
0
09 Oct 2023
Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models
Holy Lovenia
Wenliang Dai
Samuel Cahyawijaya
Ziwei Ji
Pascale Fung
MLLM
107
53
0
09 Oct 2023
Lightweight In-Context Tuning for Multimodal Unified Models
Yixin Chen
Shuai Zhang
Boran Han
Jiaya Jia
65
2
0
08 Oct 2023
Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks
Avinash Madasu
Anahita Bhiwandiwalla
Vasudev Lal
VLM
74
0
0
07 Oct 2023
Exploring the Potential of Multi-Modal AI for Driving Hazard Prediction
Korawat Charoenpitaks
Van-Quang Nguyen
Masanori Suganuma
Masahiro Takahashi
Ryoma Niihara
Takayuki Okatani
104
1
0
07 Oct 2023
VLATTACK: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models
Ziyi Yin
Muchao Ye
Tianrong Zhang
Tianyu Du
Jinguo Zhu
Han Liu
Jinghui Chen
Ting Wang
Fenglong Ma
AAML
VLM
CoGe
89
44
0
07 Oct 2023
Improving Automatic VQA Evaluation Using Large Language Models
Oscar Manas
Benno Krojer
Aishwarya Agrawal
93
25
0
04 Oct 2023
On the Cognition of Visual Question Answering Models and Human Intelligence: A Comparative Study
Liben Chen
Long Chen
Tian Ellison-Chen
Zhuoyuan Xu
LRM
36
0
0
04 Oct 2023
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts
Pan Lu
Hritik Bansal
Tony Xia
Jiacheng Liu
Chun-yue Li
Hannaneh Hajishirzi
Hao Cheng
Kai-Wei Chang
Michel Galley
Jianfeng Gao
LRM
MLLM
167
669
0
03 Oct 2023
Constructing Image-Text Pair Dataset from Books
Yamato Okamoto
Haruto Toyonaga
Yoshihisa Ijiri
Hirokatsu Kataoka
79
3
0
03 Oct 2023
HallE-Control: Controlling Object Hallucination in Large Multimodal Models
Bohan Zhai
Shijia Yang
Chenfeng Xu
Sheng Shen
Kurt Keutzer
Chunyuan Li
Manling Li
MLLM
105
14
0
03 Oct 2023
Human Mobility Question Answering (Vision Paper)
Hao Xue
Flora D. Salim
44
0
0
02 Oct 2023
Application of frozen large-scale models to multimodal task-oriented dialogue
Tatsuki Kawamoto
Takuma Suzuki
Ko Miyama
Takumi Meguro
Tomohiro Takagi
61
1
0
02 Oct 2023
CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets
Lifan Yuan
Yangyi Chen
Xingyao Wang
Yi R. Fung
Hao Peng
Heng Ji
LLMAG
KELM
128
69
0
29 Sep 2023
Towards Complex-query Referring Image Segmentation: A Novel Benchmark
Wei Ji
Li Li
Marco Pleines
Xiangyan Liu
Xu Yang
Juncheng Billy Li
Roger Zimmermann
68
8
0
29 Sep 2023
Previous
1
2
3
...
16
17
18
...
58
59
60
Next