Can Vision-Language Models be a Good Guesser? Exploring VLMs for Times and Location Reasoning

12 July 2023
Gengyuan Zhang, Yurui Zhang, Kerui Zhang, Volker Tresp
LRM

Papers citing "Can Vision-Language Models be a Good Guesser? Exploring VLMs for Times and Location Reasoning"

10 / 10 papers shown

Multi-modal Summarization in Model-Based Engineering: Automotive Software Development Case Study
Nenad Petrovic, Yurui Zhang, Moaad Maaroufi, Kuo-Yi Chao, Lukasz Mazur, F. Pan, Vahid Zolfaghari, Alois C. Knoll
06 Mar 2025

NAVIG: Natural Language-guided Analysis with Vision Language Models for Image Geo-localization
Zheyuan Zhang, Runze Li, Tasnim Kabir, Jordan Boyd-Graber
21 Feb 2025

PuzzleGPT: Emulating Human Puzzle-Solving Ability for Time and Location Prediction
Hammad A. Ayyubi, Xuande Feng, Junzhang Liu, Xudong Lin, Zhecan Wang, Shih-Fu Chang
24 Jan 2025

Beyond Captioning: Task-Specific Prompting for Improved VLM Performance in Mathematical Reasoning
Ayush Singh, Mansi Gupta, Shivank Garg, Abhinav Kumar, Vansh Agrawal
ReLM · LRM
08 Oct 2024

Constructive Apraxia: An Unexpected Limit of Instructible Vision-Language Models and Analog for Human Cognitive Disorders
David A. Noever, S. M. Noever
17 Sep 2024

GPT Sonography: Hand Gesture Decoding from Forearm Ultrasound Images via VLM
Keshav Bimbraw, Ye Wang, Jing Liu, T. Koike-Akino
VLM · MedIm · LM&MA
15 Jul 2024

Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts
Aditya Sharma, Michael Saxon, William Yang Wang
VLM
24 Jun 2024

Automatic benchmarking of large multimodal models via iterative experiment programming
Alessandro Conti, Enrico Fini, Paolo Rota, Yiming Wang, Massimiliano Mancini, Elisa Ricci
18 Jun 2024

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li, Dongxu Li, Caiming Xiong, S. Hoi
MLLM · BDL · VLM · CLIP
28 Jan 2022

ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky, Jia Deng, Hao Su, J. Krause, S. Satheesh, ..., A. Karpathy, A. Khosla, Michael S. Bernstein, Alexander C. Berg, Li Fei-Fei
VLM · ObjD
01 Sep 2014