2305.10355
Cited By
Evaluating Object Hallucination in Large Vision-Language Models
17 May 2023
Yifan Li
Yifan Du
Kun Zhou
Jinpeng Wang
Wayne Xin Zhao
Ji-Rong Wen
MLLM
LRM
Papers citing
"Evaluating Object Hallucination in Large Vision-Language Models"
50 / 585 papers shown
Title
PAINT: Paying Attention to INformed Tokens to Mitigate Hallucination in Large Vision-Language Model
Kazi Hasan Ibn Arif
Sajib Acharjee Dip
Khizar Hussain
Lang Zhang
Chris Thomas
71
0
0
21 Jan 2025
Know "No" Better: A Data-Driven Approach for Enhancing Negation Awareness in CLIP
J. Park
Jungbeom Lee
Jongyoon Song
Sangwon Yu
Dahuin Jung
Sungroh Yoon
45
0
0
19 Jan 2025
Playing Devil's Advocate: Unmasking Toxicity and Vulnerabilities in Large Vision-Language Models
Abdulkadir Erol
Trilok Padhi
Agnik Saha
Ugur Kursuncu
Mehmet Emin Aktas
47
1
0
17 Jan 2025
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Miran Heo
Min-Hung Chen
De-An Huang
Sifei Liu
Subhashree Radhakrishnan
Seon Joo Kim
Yu-Chun Wang
Ryo Hachiuma
ObjD
VLM
159
2
0
14 Jan 2025
LEO: Boosting Mixture of Vision Encoders for Multimodal Large Language Models
Mozhgan Nasr Azadani
James Riddell
Sean Sedwards
Krzysztof Czarnecki
MLLM
VLM
47
2
0
13 Jan 2025
Feedback-Driven Vision-Language Alignment with Minimal Human Supervision
Giorgio Giannone
Ruoteng Li
Qianli Feng
Evgeny Perevodchikov
Rui Chen
Aleix M. Martinez
VLM
66
0
0
08 Jan 2025
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Haobo Yuan
Xianrui Li
Tao Zhang
Zilong Huang
Shilin Xu
S. Ji
Yunhai Tong
Lu Qi
Jiashi Feng
Ming Yang
VLM
96
11
0
07 Jan 2025
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
Shaolei Zhang
Qingkai Fang
Zhe Yang
Yang Feng
MLLM
VLM
69
25
0
07 Jan 2025
FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance
Haicheng Wang
Zhemeng Yu
Gabriele Spadaro
Chen Ju
Victor Quétu
Enzo Tartaglione
VLM
109
3
0
05 Jan 2025
Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio Reasoning
Chun-Yi Kuan
Hung-yi Lee
AuLLM
LRM
72
1
0
03 Jan 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM
VLM
LRM
99
48
0
03 Jan 2025
Multimodal Preference Data Synthetic Alignment with Reward Model
Robert Wijaya
Ngoc-Bao Nguyen
Ngai-man Cheung
MLLM
SyDa
62
2
0
23 Dec 2024
CoF: Coarse to Fine-Grained Image Understanding for Multi-modal Large Language Models
Yeyuan Wang
D. Gao
Bin Li
Rujiao Long
Lei Yi
Xiaoyan Cai
Libin Yang
Jinxia Zhang
Shanqing Yu
Qi Xuan
78
1
0
22 Dec 2024
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
Chenxin Tao
Shiqian Su
X. Zhu
Chenyu Zhang
Zhe Chen
...
Wenhai Wang
Lewei Lu
Gao Huang
Yu Qiao
Jifeng Dai
MLLM
VLM
104
2
0
20 Dec 2024
Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection
Le Yang
Ziwei Zheng
Boxu Chen
Zhengyu Zhao
Chenhao Lin
Chao Shen
VLM
140
3
0
18 Dec 2024
NoisyEQA: Benchmarking Embodied Question Answering Against Noisy Queries
Tao Wu
Chuhao Zhou
Yen Heng Wong
Lin Gu
Jianfei Yang
89
1
0
14 Dec 2024
Olympus: A Universal Task Router for Computer Vision Tasks
Yuanze Lin
Yunsheng Li
Dongdong Chen
Weijian Xu
Ronald Clark
Philip H. S. Torr
VLM
ObjD
197
0
0
12 Dec 2024
LLaVA-Zip: Adaptive Visual Token Compression with Intrinsic Image Information
Ke Wang
Hong Xuan
VLM
67
2
0
11 Dec 2024
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
Jiuhai Chen
Jianwei Yang
Haiping Wu
Dianqi Li
Jianfeng Gao
Tianyi Zhou
Bin Xiao
VLM
60
4
0
05 Dec 2024
AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning
Yiwu Zhong
Zhuoming Liu
Yin Li
Liwei Wang
82
2
0
04 Dec 2024
Who Brings the Frisbee: Probing Hidden Hallucination Factors in Large Vision-Language Model via Causality Analysis
Po-Hsuan Huang
Jeng-Lin Li
Chin-Po Chen
Ming-Ching Chang
Wei-Chao Chen
LRM
76
1
0
04 Dec 2024
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?
Kaixiong Gong
Kaituo Feng
Yangqiu Song
Yibing Wang
Mofan Cheng
...
Jiaming Han
Benyou Wang
Yutong Bai
Z. Yang
Xiangyu Yue
MLLM
AuLLM
VLM
89
5
0
03 Dec 2024
VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning
Xueqing Wu
Yuheng Ding
Bingxuan Li
Pan Lu
Da Yin
Kai-Wei Chang
Nanyun Peng
LRM
105
3
0
03 Dec 2024
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
Byung-Kwan Lee
Ryo Hachiuma
Yu-Chiang Frank Wang
Y. Ro
Yueh-Hua Wu
VLM
81
0
0
02 Dec 2024
Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs
Qizhe Zhang
Aosong Cheng
Ming Lu
Zhiyong Zhuo
Minqi Wang
Jiajun Cao
Shaobo Guo
Qi She
Shanghang Zhang
VLM
90
11
0
02 Dec 2024
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
Sanghwan Kim
Rui Xiao
Mariana-Iuliana Georgescu
Stephan Alaniz
Zeynep Akata
VLM
85
2
0
02 Dec 2024
ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models
Xubing Ye
Yukang Gan
Yixiao Ge
Xiao Zhang
Yansong Tang
101
7
0
30 Nov 2024
Is Oracle Pruning the True Oracle?
Sicheng Feng
Keda Tao
Haoyu Wang
VLM
70
0
0
28 Nov 2024
Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads
Siqi Kou
Jiachun Jin
Chang Liu
Ye Ma
Jian Jia
Quan Chen
Peng Jiang
Zhijie Deng
DiffM
VGen
VLM
135
6
0
28 Nov 2024
FactCheXcker: Mitigating Measurement Hallucinations in Chest X-ray Report Generation Models
Alice Heiman
Xiaoman Zhang
E. Chen
Sung Eun Kim
Pranav Rajpurkar
HILM
MedIm
82
0
0
27 Nov 2024
Enhancing Visual Reasoning with Autonomous Imagination in Multimodal Large Language Models
Jiaheng Liu
Yumeng Li
Boyuan Xiao
Yichang Jian
Ziang Qin
Tianjia Shao
Yao-Xiang Ding
Kun Zhou
MLLM
LRM
100
3
0
27 Nov 2024
ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
Qing Jiang
Gen Luo
Yuqin Yang
Yuda Xiong
Yihao Chen
Zhaoyang Zeng
Tianhe Ren
Lei Zhang
VLM
LRM
109
6
0
27 Nov 2024
Evaluating Vision-Language Models as Evaluators in Path Planning
Mohamed Aghzal
Xiang Yue
Erion Plaku
Ziyu Yao
LRM
77
1
0
27 Nov 2024
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
Di Zhang
Jingdi Lei
Junxian Li
Xunzhi Wang
Y. Liu
...
Steve Yang
Jianbo Wu
Peng Ye
Wanli Ouyang
Dongzhan Zhou
OffRL
LRM
107
6
0
27 Nov 2024
NEMO: Can Multimodal LLMs Identify Attribute-Modified Objects?
Jiaxuan Li
Junwen Mo
MinhDuc Vo
Akihiro Sugimoto
Hideki Nakayama
87
0
0
26 Nov 2024
A Topic-level Self-Correctional Approach to Mitigate Hallucinations in MLLMs
Lehan He
Zeren Chen
Zhelun Shi
Tianyu Yu
Jing Shao
Lu Sheng
MLLM
111
1
0
26 Nov 2024
Efficient Multi-modal Large Language Models via Visual Token Grouping
Minbin Huang
Runhui Huang
Han Shi
Yimeng Chen
Chuanyang Zheng
Xiangguo Sun
Xin Jiang
Z. Li
Hong Cheng
VLM
90
3
0
26 Nov 2024
Exploring Aleatoric Uncertainty in Object Detection via Vision Foundation Models
Peng Cui
Guande He
Dan Zhang
Zhijie Deng
Yinpeng Dong
Jun Zhu
84
1
0
26 Nov 2024
Video-Text Dataset Construction from Multi-AI Feedback: Promoting Weak-to-Strong Preference Learning for Video Large Language Models
Hao Yi
Qingyang Li
Yihan Hu
Fuzheng Zhang
Di Zhang
Yong Liu
VGen
71
0
0
25 Nov 2024
Are Transformers Truly Foundational for Robotics?
James A. R. Marshall
Andrew B. Barron
AI4CE
73
0
0
25 Nov 2024
Is 'Right' Right? Enhancing Object Orientation Understanding in Multimodal Large Language Models through Egocentric Instruction Tuning
Ji Hyeok Jung
Eun Tae Kim
S. Kim
Joo Ho Lee
Bumsoo Kim
Buru Chang
VLM
186
0
0
24 Nov 2024
VaLiD: Mitigating the Hallucination of Large Vision Language Models by Visual Layer Fusion Contrastive Decoding
Jiaqi Wang
Yifei Gao
Jitao Sang
MLLM
121
2
0
24 Nov 2024
freePruner: A Training-free Approach for Large Multimodal Model Acceleration
Bingxin Xu
Yuzhang Shang
Yunhao Ge
Qian Lou
Yan Yan
97
3
0
23 Nov 2024
ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models
Junzhe Chen
Tianshu Zhang
S. Huang
Yuwei Niu
Linfeng Zhang
Lijie Wen
Xuming Hu
MLLM
VLM
180
2
0
22 Nov 2024
FocusLLaVA: A Coarse-to-Fine Approach for Efficient and Effective Visual Token Compression
Yuke Zhu
Chi Xie
Shuang Liang
Bo Zheng
Sheng Guo
80
8
0
21 Nov 2024
Panther: Illuminate the Sight of Multimodal LLMs with Instruction-Guided Visual Prompts
Honglin Li
Yuting Gao
Chenglu Zhu
Jingdong Chen
M. Yang
Lin Yang
MLLM
89
0
0
21 Nov 2024
Teaching VLMs to Localize Specific Objects from In-context Examples
Sivan Doveh
Nimrod Shabtay
Wei Lin
Eli Schwartz
Hilde Kuehne
...
Leonid Karlinsky
James Glass
Assaf Arbelle
S. Ullman
Muhammad Jehanzeb Mirza
VLM
103
1
0
20 Nov 2024
MC-LLaVA: Multi-Concept Personalized Vision-Language Model
Ruichuan An
Sihan Yang
Ming Lu
Kai Zeng
Yulin Luo
...
Hao Liang
Qi She
Shanghang Zhang
W. Zhang
Wentao Zhang
90
5
0
18 Nov 2024
Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering
Zeping Yu
Sophia Ananiadou
136
0
0
17 Nov 2024
Thinking Before Looking: Improving Multimodal LLM Reasoning via Mitigating Visual Hallucination
Haojie Zheng
Tianyang Xu
Hanchi Sun
Shu Pu
Ruoxi Chen
Lichao Sun
MLLM
LRM
84
8
0
15 Nov 2024