ResearchTrend.AI
Reasoning Paths with Reference Objects Elicit Quantitative Spatial Reasoning in Large Vision-Language Models

15 September 2024
Yuan-Hong Liao
Rafid Mahmood
Sanja Fidler
David Acuna
    ReLM
    LRM

Papers citing "Reasoning Paths with Reference Objects Elicit Quantitative Spatial Reasoning in Large Vision-Language Models"

8 / 8 papers shown
From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation
Yifu Yuan
Haiqin Cui
Yibin Chen
Zibin Dong
Fei Ni
Longxin Kou
Jinyi Liu
Pengyi Li
Yan Zheng
Jianye Hao
86
0
0
13 May 2025
Task-oriented Robotic Manipulation with Vision Language Models
Nurhan Bulus Guran
Hanchi Ren
Jingjing Deng
Xianghua Xie
75
4
0
21 Oct 2024
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
Haodong Duan
Junming Yang
Xinyu Fang
Lin Chen
...
Yuhang Zang
Pan Zhang
Jiaqi Wang
Dahua Lin
Kai Chen
LM&MA
VLM
105
158
0
16 Jul 2024
Grounding Predicates through Actions
Toki Migimatsu
Jeannette Bohg
174
35
0
29 Sep 2021
SORNet: Spatial Object-Centric Representations for Sequential Manipulation
Wentao Yuan
Chris Paxton
Karthik Desingh
Dieter Fox
3DPC
167
72
0
08 Sep 2021
A Corpus for Reasoning About Natural Language Grounded in Photographs
Alane Suhr
Stephanie Zhou
Ally Zhang
Iris Zhang
Huajun Bai
Yoav Artzi
LRM
92
601
0
01 Nov 2018
ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes
Angela Dai
Angel X. Chang
Manolis Savva
Maciej Halber
Thomas Funkhouser
Matthias Nießner
3DPC
3DV
353
4,039
0
14 Feb 2017
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
Justin Johnson
B. Hariharan
Laurens van der Maaten
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
CoGe
283
2,365
0
20 Dec 2016