Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2405.17820
Cited By
Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vision Language Models
28 May 2024
Sangmin Woo
Donguk Kim
Jaehyuk Jang
Yubin Choi
Changick Kim
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vision Language Models"
47 / 47 papers shown
Title
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
Di Zhang
Jingdi Lei
Junxian Li
Xunzhi Wang
Yong Liu
...
Steve Yang
Jianbo Wu
Peng Ye
Wanli Ouyang
Dongzhan Zhou
OffRL
LRM
129
7
0
27 Nov 2024
Law of Vision Representation in MLLMs
Shijia Yang
Bohan Zhai
Quanzeng You
Jianbo Yuan
Hongxia Yang
Chenfeng Xu
60
10
0
29 Aug 2024
Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models
Fushuo Huo
Wenchao Xu
Zhong Zhang
Yining Qi
Zhicheng Chen
Peilin Zhao
VLM
MLLM
105
25
0
04 Aug 2024
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
Shengbang Tong
Ellis L Brown
Penghao Wu
Sanghyun Woo
Manoj Middepogu
...
Xichen Pan
Austin Wang
Rob Fergus
Yann LeCun
Saining Xie
3DV
MLLM
72
321
0
24 Jun 2024
Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding
Xintong Wang
Jingheng Pan
Liang Ding
Christian Biemann
MLLM
45
65
0
27 Mar 2024
Pensieve: Retrospect-then-Compare Mitigates Visual Hallucination
Dingchen Yang
Bowen Cao
Guang Chen
Changjun Jiang
61
8
0
21 Mar 2024
Multi-Modal Hallucination Control by Visual Information Grounding
Alessandro Favero
Luca Zancato
Matthew Trager
Siddharth Choudhary
Pramuditha Perera
Alessandro Achille
Ashwin Swaminathan
Stefano Soatto
MLLM
100
73
0
20 Mar 2024
Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training
David Wan
Jaemin Cho
Elias Stengel-Eskin
Mohit Bansal
VLM
ObjD
66
32
0
04 Mar 2024
HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding
Zhaorun Chen
Zhuokai Zhao
Hongyin Luo
Huaxiu Yao
Bo Li
Jiawei Zhou
MLLM
65
64
0
01 Mar 2024
Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models
Jun Wu
Qiang Liu
Ding Wang
Jinghao Zhang
Shu Wu
Liang Wang
Tien-Ping Tan
LRM
50
22
0
18 Feb 2024
Mitigating Object Hallucination in Large Vision-Language Models via Classifier-Free Guidance
Linxi Zhao
Yihe Deng
Weitong Zhang
Quanquan Gu
MLLM
60
30
0
13 Feb 2024
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
Shengbang Tong
Zhuang Liu
Yuexiang Zhai
Yi-An Ma
Yann LeCun
Saining Xie
VLM
MLLM
67
302
0
11 Jan 2024
Incorporating Visual Experts to Resolve the Information Loss in Multimodal Large Language Models
Xin He
Longhui Wei
Lingxi Xie
Qi Tian
65
8
0
06 Jan 2024
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLM
MLLM
194
1,036
0
21 Dec 2023
VCoder: Versatile Vision Encoders for Multimodal Large Language Models
Jitesh Jain
Jianwei Yang
Humphrey Shi
MLLM
33
28
0
21 Dec 2023
Hallucination Augmented Contrastive Learning for Multimodal Large Language Model
Chaoya Jiang
Haiyang Xu
Mengfan Dong
Jiaxing Chen
Wei Ye
Mingshi Yan
Qinghao Ye
Ji Zhang
Fei Huang
Shikun Zhang
VLM
24
55
0
12 Dec 2023
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
M. Steyvers
Yuan Yao
Haoye Zhang
Taiwen He
Yifeng Han
...
Xinyue Hu
Zhiyuan Liu
Hai-Tao Zheng
Maosong Sun
Tat-Seng Chua
MLLM
VLM
157
198
0
01 Dec 2023
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
Qidong Huang
Xiao-wen Dong
Pan Zhang
Bin Wang
Conghui He
Jiaqi Wang
Dahua Lin
Weiming Zhang
Neng H. Yu
MLLM
98
189
0
29 Nov 2023
Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
Sicong Leng
Hang Zhang
Guanzheng Chen
Xin Li
Shijian Lu
Chunyan Miao
Li Bing
VLM
MLLM
117
217
0
28 Nov 2023
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization
Zhiyuan Zhao
Bin Wang
Linke Ouyang
Xiao-wen Dong
Jiaqi Wang
Conghui He
MLLM
VLM
80
114
0
28 Nov 2023
Mitigating Hallucination in Visual Language Models with Visual Supervision
Zhiyang Chen
Yousong Zhu
Yufei Zhan
Zhaowen Li
Chaoyang Zhao
Jinqiao Wang
Ming Tang
VLM
MLLM
65
28
0
27 Nov 2023
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data
Qifan Yu
Juncheng Li
Longhui Wei
Liang Pang
Wentao Ye
Bosheng Qin
Siliang Tang
Qi Tian
Yueting Zhuang
MLLM
VLM
76
71
0
22 Nov 2023
AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation
Junyang Wang
Yuhang Wang
Guohai Xu
Jing Zhang
Yukai Gu
...
Jiaqi Wang
Haiyang Xu
Ming Yan
Ji Zhang
Jitao Sang
MLLM
VLM
32
112
0
13 Nov 2023
Volcano: Mitigating Multimodal Hallucination through Self-Feedback Guided Revision
Seongyun Lee
Sue Hyun Park
Yongrae Jo
Minjoon Seo
36
57
0
13 Nov 2023
Improved Baselines with Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Yuheng Li
Yong Jae Lee
VLM
MLLM
79
2,593
0
05 Oct 2023
Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
Yiyang Zhou
Chenhang Cui
Jaehong Yoon
Linjun Zhang
Zhun Deng
Chelsea Finn
Mohit Bansal
Huaxiu Yao
MLLM
93
174
0
01 Oct 2023
Aligning Large Multimodal Models with Factually Augmented RLHF
Zhiqing Sun
Sheng Shen
Shengcao Cao
Haotian Liu
Chunyuan Li
...
Liangyan Gui
Yu-Xiong Wang
Yiming Yang
Kurt Keutzer
Trevor Darrell
VLM
77
351
0
25 Sep 2023
Evaluation and Enhancement of Semantic Grounding in Large Vision-Language Models
Jiaying Lu
Jinmeng Rao
Kezhen Chen
Xiaoyuan Guo
Yawen Zhang
Baochen Sun
Carl Yang
Jie Yang
37
11
0
07 Sep 2023
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Jinze Bai
Shuai Bai
Shusheng Yang
Shijie Wang
Sinan Tan
Peng Wang
Junyang Lin
Chang Zhou
Jingren Zhou
MLLM
VLM
ObjD
65
851
0
24 Aug 2023
VIGC: Visual Instruction Generation and Correction
Bin Wang
Fan Wu
Xiao Han
Jiahui Peng
Huaping Zhong
...
Xiao-wen Dong
Weijia Li
Wei Li
Jiaqi Wang
Conghui He
MLLM
57
66
0
24 Aug 2023
Detecting and Preventing Hallucinations in Large Vision Language Models
Anisha Gunjal
Jihan Yin
Erhan Bas
MLLM
VLM
43
159
0
11 Aug 2023
MMBench: Is Your Multi-modal Model an All-around Player?
Yuanzhan Liu
Haodong Duan
Yuanhan Zhang
Yue Liu
Songyang Zhang
...
Jiaqi Wang
Conghui He
Ziwei Liu
Kai-xiang Chen
Dahua Lin
38
961
0
12 Jul 2023
Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic
Ke Chen
Zhao Zhang
Weili Zeng
Richong Zhang
Feng Zhu
Rui Zhao
ObjD
56
622
0
27 Jun 2023
Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Fuxiao Liu
Kevin Qinghong Lin
Linjie Li
Jianfeng Wang
Yaser Yacoob
Lijuan Wang
VLM
MLLM
78
260
0
26 Jun 2023
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
Chaoyou Fu
Peixian Chen
Yunhang Shen
Yulei Qin
Mengdan Zhang
...
Xiawu Zheng
Ke Li
Xing Sun
Zhenyu Qiu
Rongrong Ji
ELM
MLLM
53
806
0
23 Jun 2023
Evaluating Object Hallucination in Large Vision-Language Models
Yifan Li
Yifan Du
Kun Zhou
Jinpeng Wang
Wayne Xin Zhao
Ji-Rong Wen
MLLM
LRM
208
748
0
17 May 2023
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
Wenliang Dai
Junnan Li
Dongxu Li
A. M. H. Tiong
Junqi Zhao
Weisheng Wang
Boyang Albert Li
Pascale Fung
Steven C. H. Hoi
MLLM
VLM
62
1,977
0
11 May 2023
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
Deyao Zhu
Jun Chen
Xiaoqian Shen
Xiang Li
Mohamed Elhoseiny
VLM
MLLM
92
1,978
0
20 Apr 2023
Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa
VLM
MLLM
329
4,506
0
17 Apr 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
374
4,406
0
30 Jan 2023
A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge
Dustin Schwenk
Apoorv Khandelwal
Christopher Clark
Kenneth Marino
Roozbeh Mottaghi
39
524
0
03 Jun 2022
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
542
5,920
0
29 Apr 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
605
28,659
0
26 Feb 2021
End-to-End Object Detection with Transformers
Nicolas Carion
Francisco Massa
Gabriel Synnaeve
Nicolas Usunier
Alexander Kirillov
Sergey Zagoruyko
ViT
3DV
PINN
249
12,847
0
26 May 2020
Object Hallucination in Image Captioning
Anna Rohrbach
Lisa Anne Hendricks
Kaylee Burns
Trevor Darrell
Kate Saenko
103
418
0
06 Sep 2018
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
360
129,831
0
12 Jun 2017
Microsoft COCO: Common Objects in Context
Nayeon Lee
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
222
43,290
0
01 May 2014
1