Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language

1 April 2022
Andy Zeng
Maria Attarian
Brian Ichter
K. Choromanski
Adrian S. Wong
Stefan Welker
F. Tombari
Aveek Purohit
Michael S. Ryoo
Vikas Sindhwani
Johnny Lee
Vincent Vanhoucke
Peter R. Florence
ReLM LRM

Papers citing "Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language"

50 / 438 papers shown
Language to Rewards for Robotic Skill Synthesis
Wenhao Yu
Nimrod Gileadi
Chuyuan Fu
Sean Kirmani
Kuang-Huei Lee
...
N. Heess
Dorsa Sadigh
Jie Tan
Yuval Tassa
F. Xia
LM&Ro
125
284
0
14 Jun 2023
AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn
Difei Gao
Lei Ji
Luowei Zhou
Kevin Lin
Joya Chen
Zihan Fan
Mike Zheng Shou
MLLM
141
76
0
14 Jun 2023
LiveChat: A Large-Scale Personalized Dialogue Dataset Automatically Constructed from Live Streaming
Jingsheng Gao
Yixin Lian
Ziyi Zhou
Yuzhuo Fu
Baoyuan Wang
98
19
0
14 Jun 2023
SayTap: Language to Quadrupedal Locomotion
Yujin Tang
Wenhao Yu
Jie Tan
Heiga Zen
Aleksandra Faust
Tatsuya Harada
108
43
0
13 Jun 2023
Embodied Executable Policy Learning with Language-based Scene Summarization
Jielin Qiu
Mengdi Xu
William Jongwon Han
Seungwhan Moon
Ding Zhao
LM&Ro
86
8
0
09 Jun 2023
Modular Visual Question Answering via Code Generation
Sanjay Subramanian
Medhini Narasimhan
Kushal Khangaonkar
Kevin Kaichuang Yang
Arsha Nagrani
Cordelia Schmid
Andy Zeng
Trevor Darrell
Dan Klein
77
51
0
08 Jun 2023
Deductive Verification of Chain-of-Thought Reasoning
Z. Ling
Yunhao Fang
Xuanlin Li
Zhiao Huang
Mingu Lee
Roland Memisevic
Hao Su
ReLM LRM
121
138
0
06 Jun 2023
Human-like Few-Shot Learning via Bayesian Reasoning over Natural Language
Kevin Ellis
BDL LRM
93
16
0
05 Jun 2023
MetaVL: Transferring In-Context Learning Ability From Language Models to Vision-Language Models
Masoud Monajatipoor
Liunian Harold Li
Mozhdeh Rouhsedaghat
Lin F. Yang
Kai-Wei Chang
MLLM LRM
80
14
0
02 Jun 2023
Reimagining Retrieval Augmented Language Models for Answering Queries
W. Tan
Yuliang Li
Pedro Rodriguez
Rich James
Xi Lin
A. Halevy
Scott Yih
KELM LRM
109
9
0
01 Jun 2023
LMCap: Few-shot Multilingual Image Captioning by Retrieval Augmented Language Model Prompting
R. Ramos
Bruno Martins
Desmond Elliott
VLM
83
16
0
31 May 2023
Enhanced Chart Understanding in Vision and Language Task via Cross-modal Pre-training on Plot Table Pairs
Mingyang Zhou
Yi R. Fung
Long Chen
Christopher Thomas
Heng Ji
Shih-Fu Chang
120
13
0
29 May 2023
Demo2Code: From Summarizing Demonstrations to Synthesizing Code via Extended Chain-of-Thought
Huaxiaoyue Wang
Gonzalo Gonzalez-Pumariega
Yash Sharma
Sanjiban Choudhury
LM&Ro
136
35
0
26 May 2023
Voyager: An Open-Ended Embodied Agent with Large Language Models
Guanzhi Wang
Yuqi Xie
Yunfan Jiang
Ajay Mandlekar
Chaowei Xiao
Yuke Zhu
Linxi Fan
Anima Anandkumar
LM&Ro SyDa
197
844
0
25 May 2023
The Art of SOCRATIC QUESTIONING: Recursive Thinking with Large Language Models
Jingyuan Qi
Zhiyang Xu
Ying Shen
Minqian Liu
Dingnan Jin
Qifan Wang
Lifu Huang
ReLM LRM KELM
63
13
0
24 May 2023
Improving Factuality and Reasoning in Language Models through Multiagent Debate
Yilun Du
Shuang Li
Antonio Torralba
J. Tenenbaum
Igor Mordatch
LLMAG LRM
207
758
0
23 May 2023
Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks
Sherzod Hakimov
David Schlangen
VLM
73
5
0
23 May 2023
i-Code Studio: A Configurable and Composable Framework for Integrative AI
Yuwei Fang
Mahmoud Khademi
Chenguang Zhu
Ziyi Yang
Reid Pryzant
...
Yao Qian
Takuya Yoshioka
Lu Yuan
Michael Zeng
Xuedong Huang
94
2
0
23 May 2023
Album Storytelling with Iterative Story-aware Captioning and Large Language Models
Munan Ning
Yujia Xie
Dongdong Chen
Zeyin Song
Lu Yuan
Yonghong Tian
QiXiang Ye
Liuliang Yuan
76
8
0
22 May 2023
Multimodal Web Navigation with Instruction-Finetuned Foundation Models
Hiroki Furuta
Kuang-Huei Lee
Ofir Nachum
Yutaka Matsuo
Aleksandra Faust
S. Gu
Izzeddin Gur
LM&Ro
195
104
0
19 May 2023
Semantic Anomaly Detection with Large Language Models
Amine Elhafsi
Rohan Sinha
Christopher Agia
Edward Schmerling
I. Nesnas
Marco Pavone
103
76
0
18 May 2023
Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model
Siyuan Huang
Zhengkai Jiang
Hao Dong
Yu Qiao
Peng Gao
Hongsheng Li
LM&Ro
138
96
0
18 May 2023
Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization
Puyuan Peng
Brian Yan
Shinji Watanabe
David Harwath
VLM LRM
121
49
0
18 May 2023
Learning to Reason over Scene Graphs: A Case Study of Finetuning GPT-2 into a Robot Language Model for Grounded Task Planning
Georgia Chalvatzaki
A. Younes
Daljeet Nandha
An T. Le
Leonardo F. R. Ribeiro
Iryna Gurevych
LM&Ro LRM LLMAG
116
31
0
12 May 2023
TidyBot: Personalized Robot Assistance with Large Language Models
Jimmy Wu
Rika Antonova
Adam Kan
Marion Lepert
Andy Zeng
Shuran Song
Jeannette Bohg
Szymon Rusinkiewicz
Thomas Funkhouser
LM&Ro
147
308
0
09 May 2023
Large Language Model Programs
Imanol Schlag
Sainbayar Sukhbaatar
Asli Celikyilmaz
Wen-tau Yih
Jason Weston
Jürgen Schmidhuber
Xian Li
LRM
96
15
0
09 May 2023
A Taxonomy of Foundation Model based Systems through the Lens of Software Architecture
Qinghua Lu
Liming Zhu
Xiwei Xu
Yue Liu
Zhenchang Xing
Jon Whittle
115
12
0
09 May 2023
Read, Diagnose and Chat: Towards Explainable and Interactive LLMs-Augmented Depression Detection in Social Media
Wei Qin
Zetong Chen
Lei Wang
Yunshi Lan
Wei Ren
Richang Hong
AI4MH
97
21
0
09 May 2023
Automatic Prompt Optimization with "Gradient Descent" and Beam Search
Reid Pryzant
Dan Iter
Jerry Li
Y. Lee
Chenguang Zhu
Michael Zeng
121
361
0
04 May 2023
Can Large Language Models Be an Alternative to Human Evaluations?
Cheng-Han Chiang
Hung-yi Lee
ALM LM&MA
328
638
0
03 May 2023
Multimodal Procedural Planning via Dual Text-Image Prompting
Yujie Lu
Pan Lu
Zhiyu Zoey Chen
Wanrong Zhu
Xinze Wang
William Yang Wang
LM&Ro
132
45
0
02 May 2023
Multimodal Grounding for Embodied AI via Augmented Reality Headsets for Natural Language Driven Task Planning
Selma Wanna
Fabian Parra
R. Valner
Karl Kruusamäe
Mitch Pryor
LM&Ro
83
2
0
26 Apr 2023
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping
Junyan Wang
Ming Yan
Yi Zhang
Jitao Sang
CLIP VLM
79
9
0
26 Apr 2023
LLM as A Robotic Brain: Unifying Egocentric Memory and Control
Jinjie Mai
Jun Chen
Bing Li
Guocheng Qian
Mohamed Elhoseiny
Guohao Li
LM&Ro
138
35
0
19 Apr 2023
Tool Learning with Foundation Models
Yujia Qin
Shengding Hu
Yankai Lin
Weize Chen
Ning Ding
...
Cheng Yang
Tongshuang Wu
Heng Ji
Zhiyuan Liu
Maosong Sun
152
223
0
17 Apr 2023
API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs
Minghao Li
Yingxiu Zhao
Yu Bowen
Feifan Song
Hangyu Li
Haiyang Yu
Zhoujun Li
Fei Huang
Yongbin Li
ELM RALM CLL
130
170
0
14 Apr 2023
FM-Loc: Using Foundation Models for Improved Vision-based Localization
Reihaneh Mirjalili
Michael Krawez
Wolfram Burgard
VLM
101
15
0
14 Apr 2023
Verbs in Action: Improving verb understanding in video-language models
Liliane Momeni
Mathilde Caron
Arsha Nagrani
Andrew Zisserman
Cordelia Schmid
117
73
0
13 Apr 2023
A Reference Architecture for Designing Foundation Model based Systems
Qinghua Lu
Liming Zhu
Xiwei Xu
Zhenchang Xing
Jon Whittle
AI4TS AI4CE
66
2
0
13 Apr 2023
ChatGPT Empowered Long-Step Robot Control in Various Environments: A Case Application
Naoki Wake
Atsushi Kanehira
Kazuhiro Sasabuchi
Jun Takamatsu
Katsushi Ikeuchi
LM&Ro
90
85
0
08 Apr 2023
Object-centric Inference for Language Conditioned Placement: A Foundation Model based Approach
Zhi-Wei Xu
Kechun Xu
Yue Wang
R. Xiong
OCL
74
4
0
06 Apr 2023
VicTR: Video-conditioned Text Representations for Activity Recognition
Kumara Kahatapitiya
Anurag Arnab
Arsha Nagrani
Michael S. Ryoo
118
23
0
05 Apr 2023
Grounding Object Relations in Language-Conditioned Robotic Manipulation with Semantic-Spatial Reasoning
Qian Luo
Yunfei Li
Yi Wu
LM&Ro
75
5
0
31 Mar 2023
Language Models can Solve Computer Tasks
Geunwoo Kim
Pierre Baldi
Stephen Marcus McAleer
LLMAG LM&Ro
172
375
0
30 Mar 2023
Text2Motion: From Natural Language Instructions to Feasible Plans
Kevin Qinghong Lin
Christopher Agia
Toki Migimatsu
Marco Pavone
Jeannette Bohg
LM&Ro
183
284
0
21 Mar 2023
eP-ALM: Efficient Perceptual Augmentation of Language Models
Mustafa Shukor
Corentin Dancette
Matthieu Cord
MLLM VLM
74
31
0
20 Mar 2023
MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action
Zhengyuan Yang
Linjie Li
Jianfeng Wang
Kevin Qinghong Lin
E. Azarnasab
Faisal Ahmed
Zicheng Liu
Ce Liu
Michael Zeng
Lijuan Wang
ReLM KELM LRM
137
399
0
20 Mar 2023
Retrieving Multimodal Information for Augmented Generation: A Survey
Ruochen Zhao
Hailin Chen
Weishi Wang
Fangkai Jiao
Do Xuan Long
...
Bosheng Ding
Xiaobao Guo
Minzhi Li
Xingxuan Li
Shafiq Joty
139
89
0
20 Mar 2023
Chat with the Environment: Interactive Multimodal Perception Using Large Language Models
Xufeng Zhao
Mengdi Li
C. Weber
Muhammad Burhan Hafez
S. Wermter
LLMAG LM&Ro LRM
199
49
0
14 Mar 2023
ViperGPT: Visual Inference via Python Execution for Reasoning
Dídac Surís
Sachit Menon
Carl Vondrick
MLLM LRM ReLM
146
471
0
14 Mar 2023