Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
1 April 2022
Andy Zeng
Maria Attarian
Brian Ichter
K. Choromanski
Adrian S. Wong
Stefan Welker
F. Tombari
Aveek Purohit
Michael S. Ryoo
Vikas Sindhwani
Johnny Lee
Vincent Vanhoucke
Peter R. Florence
ReLM, LRM

Papers citing "Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language"

50 / 438 papers shown
MathOPEval: A Fine-grained Evaluation Benchmark for Visual Operations of MLLMs in Mathematical Reasoning
Xiaoyuan Li
Moxin Li
Wenjie Wang
Rui Men
Yichang Zhang
Fuli Feng
Dayiheng Liu
Junyang Lin
LRM
5
0
0
24 Jul 2025
AntiGrounding: Lifting Robotic Actions into VLM Representation Space for Decision Making
Wenbo Li
Shiyi Wang
Yiteng Chen
Huiping Zhuang
Qingyao Wu
45
0
0
14 Jun 2025
A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs
Benno Krojer
Mojtaba Komeili
Candace Ross
Q. Garrido
Koustuv Sinha
Nicolas Ballas
Mahmoud Assran
83
1
0
11 Jun 2025
Fast ECoT: Efficient Embodied Chain-of-Thought via Thoughts Reuse
Zhekai Duan
Yuan Zhang
Shikai Geng
Gaowen Liu
Joschka Boedecker
Chris Xiaoxuan Lu
LRM
36
0
0
09 Jun 2025
WorldPrediction: A Benchmark for High-level World Modeling and Long-horizon Procedural Planning
Delong Chen
Willy Chung
Yejin Bang
Ziwei Ji
Pascale Fung
VGen, LM&Ro
83
0
0
04 Jun 2025
Aligning Dialogue Agents with Global Feedback via Large Language Model Reward Decomposition
Dong Won Lee
Hae Won Park
C. Breazeal
Louis-Philippe Morency
62
0
0
21 May 2025
Understanding Complexity in VideoQA via Visual Program Generation
Cristobal Eyzaguirre
Igor Vasiljevic
Achal Dave
Jiajun Wu
Rares Andrei Ambrus
Thomas Kollar
Juan Carlos Niebles
P. Tokmakov
80
0
0
19 May 2025
Embodied AI in Machine Learning -- is it Really Embodied?
Matej Hoffmann
Shubhan Patni
LM&Ro, AI4CE
99
0
0
15 May 2025
ReLI: A Language-Agnostic Approach to Human-Robot Interaction
Linus Nwankwo
Bjoern Ellensohn
Ozan Özdenizci
Elmar Rueckert
LM&Ro
256
0
0
03 May 2025
Chain-of-Modality: Learning Manipulation Programs from Multimodal Human Videos with Vision-Language-Models
Chen Wang
Fei Xia
Wenhao Yu
Tingnan Zhang
Ruohan Zhang
Ce Liu
Li Fei-Fei
Jie Tan
Jacky Liang
90
1
0
17 Apr 2025
How Can Objects Help Video-Language Understanding?
Zitian Tang
Shijie Wang
Junho Cho
Jaewook Yoo
Chen Sun
125
1
0
10 Apr 2025
Debate-Feedback: A Multi-Agent Framework for Efficient Legal Judgment Prediction
Xi Chen
Mao Mao
Shuo Li
Haotian Shangguan
LLMAG, AILaw, ELM
150
1
0
07 Apr 2025
Vision-Language Model Predictive Control for Manipulation Planning and Trajectory Generation
Jiaming Chen
Wentao Zhao
Ziyu Meng
Donghui Mao
Ran Song
Wei Pan
Wei Zhang
142
0
0
07 Apr 2025
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs
Lucas Ventura
Antoine Yang
Cordelia Schmid
Gül Varol
108
0
0
31 Mar 2025
Cooking Task Planning using LLM and Verified by Graph Network
Ryunosuke Takebayashi
V. H. Isume
Takuya Kiyokawa
Weiwei Wan
Kensuke Harada
101
0
0
27 Mar 2025
FALCONEye: Finding Answers and Localizing Content in ONE-hour-long videos with multi-modal LLMs
Carlos Plou
Cesar Borja
Ruben Martinez-Cantin
Ana C. Murillo
114
0
0
25 Mar 2025
Unbiasing through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks
Nina Shvetsova
Arsha Nagrani
Bernt Schiele
Hilde Kuehne
Christian Rupprecht
104
1
0
24 Mar 2025
A Survey on Mathematical Reasoning and Optimization with Large Language Models
Ali Forootani
OffRL, LRM, AI4CE
132
1
0
22 Mar 2025
EmbodiedVSR: Dynamic Scene Graph-Guided Chain-of-Thought Reasoning for Visual Spatial Tasks
Yi Zhang
Qiang Zhang
Xiaozhu Ju
Ziqiang Liu
Jilei Mao
...
Jiaxu Wang
Yiqun Duan
Jiahang Cao
Renjing Xu
Jian Tang
LM&Ro, LRM
123
0
0
14 Mar 2025
Towards Fast, Memory-based and Data-Efficient Vision-Language Policy
Haoxuan Li
Sixu Yan
Yongqian Li
Xinggang Wang
LM&Ro
128
1
0
13 Mar 2025
Measure Twice, Cut Once: Grasping Video Structures and Event Semantics with LLMs for Video Temporal Localization
Zongshang Pang
Mayu Otani
Yuta Nakashima
140
0
0
12 Mar 2025
Generating Robot Constitutions & Benchmarks for Semantic Safety
P. Sermanet
Anirudha Majumdar
A. Irpan
Dmitry Kalashnikov
Vikas Sindhwani
LM&Ro
172
3
0
11 Mar 2025
Investigating the Effectiveness of a Socratic Chain-of-Thoughts Reasoning Method for Task Planning in Robotics, A Case Study
Veronica Bot
Zheyuan Xu
LRM, LLMAG, LM&Ro
182
0
0
11 Mar 2025
LTLCodeGen: Code Generation of Syntactically Correct Temporal Logic for Robot Task Planning
Behrad Rabiei
Mahesh Kumar A.R.
Zhirui Dai
Surya L.S.R. Pilla
Qiyue Dong
Nikolay Atanasov
LM&Ro
96
0
0
10 Mar 2025
Alignment for Efficient Tool Calling of Large Language Models
Hongshen Xu
Zihan Wang
Zichen Zhu
Lei Pan
Xingyu Chen
Lu Chen
Kai Yu
102
1
0
09 Mar 2025
CLEA: Closed-Loop Embodied Agent for Enhancing Task Execution in Dynamic Environments
Mingcong Lei
Ge Wang
Yiming Zhao
Zhixin Mai
Qing Zhao
Yao Guo
Zhen Li
Shuguang Cui
Yatong Han
J. Ren
LLMAG
109
0
0
02 Mar 2025
Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers
Shalev Lifshitz
Sheila A. McIlraith
Yilun Du
LRM
138
8
0
27 Feb 2025
Less or More: Towards Glanceable Explanations for LLM Recommendations Using Ultra-Small Devices
Xinru Wang
Mengjie Yu
Hannah Nguyen
Michael Iuzzolino
Tianyi Wang
...
Ting Zhang
Naveen Sendhilnathan
Hrvoje Benko
Haijun Xia
Tanya R. Jonker
97
0
0
26 Feb 2025
Beyond Pattern Recognition: Probing Mental Representations of LMs
Moritz Miller
Kumar Shridhar
ReLM, LRM
120
0
0
23 Feb 2025
Demystifying Hateful Content: Leveraging Large Multimodal Models for Hateful Meme Detection with Explainable Decisions
Ming Shan Hee
Roy Ka-wei Lee
VLM
118
1
0
16 Feb 2025
A Real-to-Sim-to-Real Approach to Robotic Manipulation with VLM-Generated Iterative Keypoint Rewards
Shivansh Patel
Xinchen Yin
Wenlong Huang
Shubham Garg
H. Nayyeri
Li Fei-Fei
Svetlana Lazebnik
Yunzhu Li
187
1
0
12 Feb 2025
Robust Mobile Robot Path Planning via LLM-Based Dynamic Waypoint Generation
Muhammad Taha Tariq
Congqing Wang
Yasir Hussain
182
1
0
28 Jan 2025
ReasVQA: Advancing VideoQA with Imperfect Reasoning Process
Jianxin Liang
Xiaojun Meng
Huishuai Zhang
Yijiao Wang
Jiansheng Wei
Dongyan Zhao
LRM
77
2
0
23 Jan 2025
LAMS: LLM-Driven Automatic Mode Switching for Assistive Teleoperation
Yiran Tao
Jehan Yang
Dan Ding
Zackory Erickson
100
1
0
15 Jan 2025
RoboHorizon: An LLM-Assisted Multi-View World Model for Long-Horizon Robotic Manipulation
Zixuan Chen
Jing Huo
Yangtao Chen
Yang Gao
164
4
0
11 Jan 2025
Using Pre-trained LLMs for Multivariate Time Series Forecasting
Malcolm Wolff
Shenghao Yang
Kari Torkkola
Michael W. Mahoney
AI4TS, AIFin
87
2
0
10 Jan 2025
Mathematical Language Models: A Survey
Wen Liu
Hanglei Hu
Jie Zhou
Yuyang Ding
Junsong Li
...
Mengliang He
Qin Chen
Bo Jiang
Aimin Zhou
Liang He
LRM
245
14
0
03 Jan 2025
LLM+AL: Bridging Large Language Models and Action Languages for Complex Reasoning about Actions
Adam Ishay
Joohyung Lee
LRM
120
4
0
01 Jan 2025
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Pinelopi Papalampidi
Skanda Koppula
Shreya Pathak
Justin T Chiu
Joseph Heyward
Viorica Patraucean
Jiajun Shen
Antoine Miech
Andrew Zisserman
Aida Nematzdeh
VLM
147
26
0
31 Dec 2024
Multi-Modal Grounded Planning and Efficient Replanning For Learning Embodied Agents with A Few Examples
Taewoong Kim
Byeonghwi Kim
Jonghyun Choi
LLMAG, LM&Ro
102
1
0
23 Dec 2024
A Review of Multimodal Explainable Artificial Intelligence: Past, Present and Future
Shilin Sun
Wenbin An
Feng Tian
Fang Nan
Qidong Liu
Jing Liu
N. Shah
Ping Chen
176
6
0
18 Dec 2024
CAD-Assistant: Tool-Augmented VLLMs as Generic CAD Task Solvers
Dimitrios Mallis
Ahmet Serdar Karadeniz
Sebastian Cavada
Danila Rukhovich
Niki Maria Foteinopoulou
K. Cherenkova
Anis Kacem
Djamila Aouada
206
7
0
18 Dec 2024
DEFAME: Dynamic Evidence-based FAct-checking with Multimodal Experts
Tobias Braun
Mark Rothermel
Marcus Rohrbach
Anna Rohrbach
208
6
0
13 Dec 2024
Neptune: The Long Orbit to Benchmarking Long Video Understanding
Arsha Nagrani
Ruotong Wang
Ramin Mehran
Rachel Hornung
N. B. Gundavarapu
...
Boqing Gong
Cordelia Schmid
Mikhail Sirotenko
Yukun Zhu
Tobias Weyand
188
8
0
12 Dec 2024
Language Model as Visual Explainer
Xingyi Yang
Xinchao Wang
VLM
93
0
0
08 Dec 2024
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
Kevin Qinghong Lin
Linjie Li
Difei Gao
Zhiyong Yang
Shiwei Wu
Zechen Bai
Weixian Lei
Lijuan Wang
Mike Zheng Shou
LLMAG
160
37
0
26 Nov 2024
I Can Tell What I am Doing: Toward Real-World Natural Language Grounding of Robot Experiences
Zihan Wang
Brian Liang
Varad Dhat
Zander Brumbaugh
Nick Walker
Ranjay Krishna
Maya Cakmak
121
5
0
20 Nov 2024
HourVideo: 1-Hour Video-Language Understanding
Keshigeyan Chandrasegaran
Agrim Gupta
Lea M. Hadzic
Taran Kota
Jimming He
Cristobal Eyzaguirre
Zane Durante
Pengfei Yu
Jiajun Wu
L. Fei-Fei
VLM
115
50
0
07 Nov 2024
Personalized Video Summarization by Multimodal Video Understanding
Brian Chen
Xiangyuan Zhao
Yingnan Zhu
84
1
0
05 Nov 2024
Thinking Forward and Backward: Effective Backward Planning with Large Language Models
Allen Z. Ren
Brian Ichter
Anirudha Majumdar
LLMAG, LRM
163
0
0
04 Nov 2024