Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language

1 April 2022
Andy Zeng
Maria Attarian
Brian Ichter
K. Choromanski
Adrian S. Wong
Stefan Welker
F. Tombari
Aveek Purohit
Michael S. Ryoo
Vikas Sindhwani
Johnny Lee
Vincent Vanhoucke
Peter R. Florence
ReLM, LRM

Papers citing "Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language"

50 / 438 papers shown
SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos
Yulei Niu
Wenliang Guo
Long Chen
Xudong Lin
Shih-Fu Chang
100
12
0
03 Mar 2024
Learning with Language-Guided State Abstractions
Andi Peng
Ilia Sucholutsky
Belinda Z. Li
T. Sumers
Thomas Griffiths
Jacob Andreas
Julie A. Shah
LM&Ro
96
14
0
28 Feb 2024
PhyGrasp: Generalizing Robotic Grasping with Physics-informed Large Multimodal Models
Dingkun Guo
Yuqi Xiang
Shuqi Zhao
Xinghao Zhu
Masayoshi Tomizuka
Mingyu Ding
Wei Zhan
95
11
0
26 Feb 2024
Language Agents as Optimizable Graphs
Mingchen Zhuge
Wenyi Wang
Louis Kirsch
Francesco Faccio
Dmitrii Khizbullin
Jürgen Schmidhuber
LLMAG
103
22
0
26 Feb 2024
RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis
Yao Mu
Junting Chen
Qinglong Zhang
Shoufa Chen
Qiaojun Yu
...
Wenhai Wang
Jifeng Dai
Yu Qiao
Mingyu Ding
Ping Luo
111
24
0
25 Feb 2024
Selective "Selective Prediction": Reducing Unnecessary Abstention in Vision-Language Reasoning
Tejas Srinivasan
Jack Hessel
Tanmay Gupta
Bill Yuchen Lin
Yejin Choi
Jesse Thomason
Khyathi Chandu
87
9
0
23 Feb 2024
Safe Task Planning for Language-Instructed Multi-Robot Systems using Conformal Prediction
Jun Wang
Guocheng He
Y. Kantaros
105
13
0
23 Feb 2024
RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation
Junting Chen
Yao Mu
Qiaojun Yu
Tianming Wei
Silang Wu
...
Wenqi Shao
Yu Qiao
Huazhe Xu
Mingyu Ding
Ping Luo
LM&Ro
91
12
0
22 Feb 2024
Enhancing Robotic Manipulation with AI Feedback from Multimodal Large Language Models
Jinyi Liu
Yifu Yuan
Jianye Hao
Fei Ni
Lingzhi Fu
Yibin Chen
Yan Zheng
LM&Ro
410
6
0
22 Feb 2024
VideoPrism: A Foundational Visual Encoder for Video Understanding
Long Zhao
N. B. Gundavarapu
Liangzhe Yuan
Hao Zhou
Shen Yan
...
Huisheng Wang
Hartwig Adam
Mikhail Sirotenko
Ting Liu
Boqing Gong
VGen
139
36
0
20 Feb 2024
Modularized Networks for Few-shot Hateful Meme Detection
Rui Cao
Roy Ka-wei Lee
Jing Jiang
66
6
0
19 Feb 2024
Learning to Learn Faster from Human Feedback with Language Model Predictive Control
Jacky Liang
Fei Xia
Wenhao Yu
Andy Zeng
Montse Gonzalez Arenas
...
N. Heess
Kanishka Rao
Nik Stewart
Jie Tan
Carolina Parada
LM&Ro
132
35
0
18 Feb 2024
Question-Instructed Visual Descriptions for Zero-Shot Video Question Answering
David Romero
Thamar Solorio
154
4
0
16 Feb 2024
BBSEA: An Exploration of Brain-Body Synchronization for Embodied Agents
Sizhe Yang
Qian Luo
Anumpam Pani
Yanchao Yang
83
2
0
13 Feb 2024
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
Soroush Nasiriany
Fei Xia
Wenhao Yu
Ted Xiao
Jacky Liang
...
Karol Hausman
N. Heess
Chelsea Finn
Sergey Levine
Brian Ichter
LM&Ro, LRM
98
115
0
12 Feb 2024
An Empirical Study Into What Matters for Calibrating Vision-Language Models
Weijie Tu
Weijian Deng
Dylan Campbell
Stephen Gould
Tom Gedeon
VLM
90
8
0
12 Feb 2024
TIC: Translate-Infer-Compile for accurate "text to plan" using LLMs and Logical Representations
Sudhir Agarwal
A. Sreepathy
126
1
0
09 Feb 2024
Memory Consolidation Enables Long-Context Video Understanding
Ivana Balažević
Yuge Shi
Pinelopi Papalampidi
Rahma Chaabouni
Skanda Koppula
Olivier J. Hénaff
200
28
0
08 Feb 2024
Real-World Robot Applications of Foundation Models: A Review
Kento Kawaharazuka
T. Matsushima
Andrew Gambardella
Jiaxian Guo
Chris Paxton
Andy Zeng
OffRL, VLM, LM&Ro
127
54
0
08 Feb 2024
S-Agents: Self-organizing Agents in Open-ended Environments
Jia-Qing Chen
Yu-Gang Jiang
Jiachen Lu
Li Zhang
AIFin, LLMAG, LM&Ro
100
16
0
07 Feb 2024
"Task Success" is not Enough: Investigating the Use of Video-Language Models as Behavior Critics for Catching Undesirable Agent Behaviors
L. Guan
Yifan Zhou
Denis Liu
Yantian Zha
H. B. Amor
Subbarao Kambhampati
LM&Ro
94
17
0
06 Feb 2024
Preference-Conditioned Language-Guided Abstraction
Andi Peng
Andreea Bobu
Belinda Z. Li
T. Sumers
Ilia Sucholutsky
Nishanth Kumar
Thomas Griffiths
Julie A. Shah
83
13
0
05 Feb 2024
LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models
Ivar Frisch
Mario Giulianelli
79
11
0
05 Feb 2024
Weaver: Foundation Models for Creative Writing
Tiannan Wang
Jiamin Chen
Qingrui Jia
Shuai Wang
Ruoyu Fang
...
Xiaohua Xu
Ningyu Zhang
Huajun Chen
Yuchen Eleanor Jiang
Wangchunshu Zhou
99
20
0
30 Jan 2024
Image-Text Out-Of-Context Detection Using Synthetic Multimodal Misinformation
Fatma Shalabi
H. Nguyen
Hichem Felouat
Ching-Chun Chang
Isao Echizen
102
5
0
29 Jan 2024
True Knowledge Comes from Practice: Aligning LLMs with Embodied Environments via Reinforcement Learning
Weihao Tan
Wentao Zhang
Shanqi Liu
Longtao Zheng
Xinrun Wang
Bo An
OffRL
109
22
0
25 Jan 2024
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities
Boyuan Chen
Zhuo Xu
Sean Kirmani
Brian Ichter
Danny Driess
Pete Florence
Dorsa Sadigh
Leonidas Guibas
Fei Xia
LRM, ReLM
127
272
0
22 Jan 2024
SocraSynth: Multi-LLM Reasoning with Conditional Statistics
Edward Y. Chang
LLMAG, LRM
92
8
0
19 Jan 2024
Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning
Yiqi Wang
Wentao Chen
Xiaotian Han
Xudong Lin
Haiteng Zhao
Yongfei Liu
Bohan Zhai
Jianbo Yuan
Quanzeng You
Hongxia Yang
LRM
128
88
0
10 Jan 2024
Large Language Models for Robotics: Opportunities, Challenges, and Perspectives
Jiaqi Wang
Zihao Wu
Yiwei Li
Hanqi Jiang
Peng Shu
...
Lin Zhao
Bao Ge
Xiang Li
Tianming Liu
Shu Zhang
LM&Ro
99
75
0
09 Jan 2024
GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation
Tong Wu
Guandao Yang
Zhibing Li
Kai Zhang
Ziwei Liu
Leonidas Guibas
Dahua Lin
Gordon Wetzstein
EGVM, VGen
150
96
0
08 Jan 2024
A Philosophical Introduction to Language Models -- Part I: Continuity With Classic Debates
Raphael Milliere
Cameron Buckner
LRM, ELM
97
24
0
08 Jan 2024
LLM Augmented LLMs: Expanding Capabilities through Composition
Rachit Bansal
Bidisha Samanta
Siddharth Dalmia
Nitish Gupta
Shikhar Vashishth
Sriram Ganapathy
Abhishek Bapna
Prateek Jain
Partha P. Talukdar
CLL
85
38
0
04 Jan 2024
Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training
Longtian Qiu
Shan Ning
Xuming He
VLM
86
4
0
04 Jan 2024
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers
Aleksandar Stanić
Sergi Caelles
Michael Tschannen
LRM, VLM
102
10
0
03 Jan 2024
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Chenliang Xu
Jiebo Luo
Chenliang Xu
VLM
224
100
0
29 Dec 2023
A Simple LLM Framework for Long-Range Video Question-Answering
Ce Zhang
Taixi Lu
Md. Mohaiminul Islam
Ziyang Wang
Shoubin Yu
Mohit Bansal
Gedas Bertasius
199
92
0
28 Dec 2023
InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models
Bingbing Wen
Zhengyuan Yang
Jianfeng Wang
Zhe Gan
Bill Howe
Lijuan Wang
MLLM
72
1
0
21 Dec 2023
Social Learning: Towards Collaborative Learning with Large Language Models
Amirkeivan Mohtashami
Florian Hartmann
Sian Gooding
Lukás Zilka
Matt Sharifi
Blaise Agüera y Arcas
87
12
0
18 Dec 2023
A Survey on Robotic Manipulation of Deformable Objects: Recent Advances, Open Challenges and New Frontiers
Feida Gu
Yanmin Zhou
Zhipeng Wang
Shuo Jiang
Bin He
AI4CE
94
8
0
16 Dec 2023
SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models
Lee Hyun
Kim Sung-Bin
Seungju Han
Youngjae Yu
Tae-Hyun Oh
103
15
0
15 Dec 2023
Foundation Models in Robotics: Applications, Challenges, and the Future
Roya Firoozi
Johnathan Tucker
Stephen Tian
Anirudha Majumdar
Jiankai Sun
...
Brian Ichter
Danny Driess
Jiajun Wu
Cewu Lu
Mac Schwager
LM&Ro, AI4CE, LRM, VLM
114
161
0
13 Dec 2023
From Text to Motion: Grounding GPT-4 in a Humanoid Robot "Alter3"
Takahide Yoshida
A. Masumori
Takashi Ikegami
89
18
0
11 Dec 2023
Localized Symbolic Knowledge Distillation for Visual Commonsense Models
Jinho Park
Jack Hessel
Khyathi Chandu
Paul Pu Liang
Ximing Lu
...
Youngjae Yu
Qiuyuan Huang
Jianfeng Gao
Ali Farhadi
Yejin Choi
VLM
84
12
0
08 Dec 2023
LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos
Ying Wang
Yanlai Yang
Mengye Ren
119
18
0
07 Dec 2023
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
Chengshu Li
Jacky Liang
Andy Zeng
Xinyun Chen
Karol Hausman
Dorsa Sadigh
Sergey Levine
Fei-Fei Li
Fei Xia
Brian Ichter
LLMAG, LRM
128
83
0
07 Dec 2023
LaMPilot: An Open Benchmark Dataset for Autonomous Driving with Language Model Programs
Yunsheng Ma
Can Cui
Xu Cao
Wenqian Ye
Peiran Liu
...
Rohit Gupta
Kyungtae Han
Aniket Bera
James M. Rehg
Ziran Wang
94
45
0
07 Dec 2023
FoMo Rewards: Can we cast foundation models as reward functions?
Ekdeep Singh Lubana
Johann Brehmer
P. D. Haan
Taco S. Cohen
OffRL, LRM
101
3
0
06 Dec 2023
Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models
Yushi Hu
Otilia Stretcu
Chun-Ta Lu
Krishnamurthy Viswanathan
Kenji Hata
Enming Luo
Ranjay Krishna
Ariel Fuxman
VLM, LRM, MLLM
132
38
0
05 Dec 2023
Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation
Shan Zhong
Zhongzhan Huang
Shanghua Gao
Wushao Wen
Liang Lin
Marinka Zitnik
Pan Zhou
LLMAG, LRM
132
40
0
05 Dec 2023