
LocateBench: Evaluating the Locating Ability of Vision Language Models

17 October 2024
Ting-Rui Chiang
Joshua Robinson
Xinyan Velocity Yu
Dani Yogatama
VLM · ELM
ArXiv · PDF · HTML
Abstract

The ability to locate an object in an image according to natural language instructions is crucial for many real-world applications. In this work, we propose LocateBench, a high-quality benchmark dedicated to evaluating this ability. We experiment with multiple prompting approaches and measure the accuracy of several large vision language models. We find that the accuracy of even the strongest model, GPT-4o, lags behind human accuracy by more than 10%.
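To make the kind of accuracy measurement described above concrete, here is a minimal sketch assuming a multiple-choice setup in which the model selects one of several candidate bounding boxes for the described object. The `LocateExample` format and the `query_vlm` caller are illustrative assumptions, not the authors' released code or the benchmark's actual schema.

```python
# Minimal sketch of accuracy evaluation for a locating benchmark.
# Assumption: each example provides candidate bounding boxes and the index
# of the correct one; the model returns the index it selects.
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class LocateExample:
    image_path: str                                        # image containing the target object
    instruction: str                                       # natural language description of the object
    candidate_boxes: Sequence[Tuple[int, int, int, int]]   # candidate (x1, y1, x2, y2) boxes
    gold_index: int                                        # index of the correct box


def evaluate(model_fn: Callable[[LocateExample], int],
             dataset: List[LocateExample]) -> float:
    """Fraction of examples where the model picks the correct candidate box."""
    correct = sum(model_fn(ex) == ex.gold_index for ex in dataset)
    return correct / len(dataset)


# Hypothetical usage: compare a VLM's accuracy against human accuracy.
# vlm_accuracy = evaluate(query_vlm, locatebench_examples)
# print(f"VLM accuracy: {vlm_accuracy:.1%} vs. human accuracy: {human_accuracy:.1%}")
```

Under this framing, the paper's headline result corresponds to the strongest model's score falling more than 10% below the human score computed with the same procedure.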

View on arXiv: 2410.19808