2406.16851
Cited By
Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts
24 June 2024
Aditya Sharma
Michael Saxon
William Yang Wang
VLM
Papers citing
"Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts"
6 / 6 papers shown
Title
Are Vision-Language Models Ready for Dietary Assessment? Exploring the Next Frontier in AI-Powered Food Image Recognition
Sergio Romero-Tapiador
Ruben Tolosana
Blanca Lacruz-Pleguezuelos
L. Marcos-Zambrano
Guadalupe X. Bazán
Isabel Espinosa-Salinas
Julian Fierrez
Javier Ortega-Garcia
Enrique Carrillo-de Santa Pau
Aythami Morales
CoGe
29
0
0
09 Apr 2025
MileBench: Benchmarking MLLMs in Long Context
Dingjie Song
Shunian Chen
Guiming Hardy Chen
Fei Yu
Xiang Wan
Benyou Wang
VLM
78
34
0
29 Apr 2024
Multi-Modal Hallucination Control by Visual Information Grounding
Alessandro Favero
L. Zancato
Matthew Trager
Siddharth Choudhary
Pramuditha Perera
Alessandro Achille
Ashwin Swaminathan
Stefano Soatto
MLLM
87
63
0
20 Mar 2024
Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts
Michael Stephen Saxon
Yiran Luo
Sharon Levy
Chitta Baral
Yezhou Yang
William Y. Wang
EGVM
33
3
0
17 Mar 2024
CogAgent: A Visual Language Model for GUI Agents
Wenyi Hong
Weihan Wang
Qingsong Lv
Jiazheng Xu
Wenmeng Yu
...
Juanzi Li
Bin Xu
Yuxiao Dong
Ming Ding
Jie Tang
MLLM
142
325
0
14 Dec 2023
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
Ashwin Kalyan
ELM
ReLM
LRM
211
1,113
0
20 Sep 2022