ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2406.16851
  4. Cited By
Losing Visual Needles in Image Haystacks: Vision Language Models are
  Easily Distracted in Short and Long Contexts

Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts

24 June 2024
Aditya Sharma
Michael Saxon
William Yang Wang
    VLM
ArXivPDFHTML

Papers citing "Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts"

6 / 6 papers shown
Title
Are Vision-Language Models Ready for Dietary Assessment? Exploring the Next Frontier in AI-Powered Food Image Recognition
Are Vision-Language Models Ready for Dietary Assessment? Exploring the Next Frontier in AI-Powered Food Image Recognition
Sergio Romero-Tapiador
Ruben Tolosana
Blanca Lacruz-Pleguezuelos
L. Marcos-Zambrano
Guadalupe X.Bazán
Isabel Espinosa-Salinas
Julian Fierrez
Javier-Ortega Garcia
Enrique Carrillo-de Santa Pau
Aythami Morales
CoGe
29
0
0
09 Apr 2025
MileBench: Benchmarking MLLMs in Long Context
MileBench: Benchmarking MLLMs in Long Context
Dingjie Song
Shunian Chen
Guiming Hardy Chen
Fei Yu
Xiang Wan
Benyou Wang
VLM
78
34
0
29 Apr 2024
Multi-Modal Hallucination Control by Visual Information Grounding
Multi-Modal Hallucination Control by Visual Information Grounding
Alessandro Favero
L. Zancato
Matthew Trager
Siddharth Choudhary
Pramuditha Perera
Alessandro Achille
Ashwin Swaminathan
Stefano Soatto
MLLM
87
63
0
20 Mar 2024
Lost in Translation? Translation Errors and Challenges for Fair
  Assessment of Text-to-Image Models on Multilingual Concepts
Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts
Michael Stephen Saxon
Yiran Luo
Sharon Levy
Chitta Baral
Yezhou Yang
William Y. Wang
EGVM
33
3
0
17 Mar 2024
CogAgent: A Visual Language Model for GUI Agents
CogAgent: A Visual Language Model for GUI Agents
Wenyi Hong
Weihan Wang
Qingsong Lv
Jiazheng Xu
Wenmeng Yu
...
Juanzi Li
Bin Xu
Yuxiao Dong
Ming Ding
Jie Tang
MLLM
142
325
0
14 Dec 2023
Learn to Explain: Multimodal Reasoning via Thought Chains for Science
  Question Answering
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
Ashwin Kalyan
ELM
ReLM
LRM
211
1,113
0
20 Sep 2022
1