ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2406.10638
  4. Cited By
Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly

Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly

15 June 2024
Yexin Liu
Zhengyang Liang
Yueze Wang
Muyang He
Jian Li
Bo-Lu Zhao
Jian Li
Zheng Liu
Harry Yang
Sernam Lim
Bo Zhao
ArXivPDFHTML

Papers citing "Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly"

14 / 14 papers shown
Title
Efficient Multimodal Large Language Models: A Survey
Efficient Multimodal Large Language Models: A Survey
Yizhang Jin
Jian Li
Yexin Liu
Tianjun Gu
Kai Wu
...
Xin Tan
Zhenye Gan
Yabiao Wang
Chengjie Wang
Lizhuang Ma
LRM
47
45
0
17 May 2024
Hallucination of Multimodal Large Language Models: A Survey
Hallucination of Multimodal Large Language Models: A Survey
Zechen Bai
Pichao Wang
Tianjun Xiao
Tong He
Zongbo Han
Zheng Zhang
Mike Zheng Shou
VLM
LRM
95
139
0
29 Apr 2024
ALOHa: A New Measure for Hallucination in Captioning Models
ALOHa: A New Measure for Hallucination in Captioning Models
Suzanne Petryk
David M. Chan
Anish Kachinthaya
Haodi Zou
John F. Canny
Joseph E. Gonzalez
Trevor Darrell
HILM
39
11
0
03 Apr 2024
IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language
  Models
IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models
Haz Sameen Shahgir
Khondker Salman Sayeed
Abhik Bhattacharjee
Wasi Uddin Ahmad
Yue Dong
Rifat Shahriyar
VLM
MLLM
42
10
0
23 Mar 2024
PhD: A ChatGPT-Prompted Visual hallucination Evaluation Dataset
PhD: A ChatGPT-Prompted Visual hallucination Evaluation Dataset
Jiazhen Liu
Yuhan Fu
Ruobing Xie
Runquan Xie
Xingwu Sun
Fengzong Lian
Zhanhui Kang
Xirong Li
MLLM
38
10
0
17 Mar 2024
Mipha: A Comprehensive Overhaul of Multimodal Assistant with Small
  Language Models
Mipha: A Comprehensive Overhaul of Multimodal Assistant with Small Language Models
Minjie Zhu
Yichen Zhu
Xin Liu
Ning Liu
Zhiyuan Xu
Chaomin Shen
Yaxin Peng
Zhicai Ou
Feifei Feng
Jian Tang
VLM
57
20
0
10 Mar 2024
Hal-Eval: A Universal and Fine-grained Hallucination Evaluation
  Framework for Large Vision Language Models
Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models
Chaoya Jiang
Wei Ye
Mengfan Dong
Hongrui Jia
Haiyang Xu
Mingshi Yan
Ji Zhang
Shikun Zhang
VLM
MLLM
43
15
0
24 Feb 2024
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned
  Language Models
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models
Siddharth Karamcheti
Suraj Nair
Ashwin Balakrishna
Percy Liang
Thomas Kollar
Dorsa Sadigh
MLLM
VLM
57
98
0
12 Feb 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Chris Liu
Renrui Zhang
Longtian Qiu
Siyuan Huang
Weifeng Lin
...
Hao Shao
Pan Lu
Hongsheng Li
Yu Qiao
Peng Gao
MLLM
130
109
0
08 Feb 2024
The Instinctive Bias: Spurious Images lead to Hallucination in MLLMs
The Instinctive Bias: Spurious Images lead to Hallucination in MLLMs
Tianyang Han
Qing Lian
Rui Pan
Renjie Pi
Jipeng Zhang
Shizhe Diao
Yong Lin
Tong Zhang
75
1
0
06 Feb 2024
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with
  Modality Collaboration
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
Qinghao Ye
Haiyang Xu
Jiabo Ye
Mingshi Yan
Anwen Hu
Haowei Liu
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
MLLM
VLM
126
375
0
07 Nov 2023
MiniGPT-v2: large language model as a unified interface for
  vision-language multi-task learning
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
160
440
0
14 Oct 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
278
4,244
0
30 Jan 2023
Learn to Explain: Multimodal Reasoning via Thought Chains for Science
  Question Answering
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
A. Kalyan
ELM
ReLM
LRM
211
1,106
0
20 Sep 2022
1