Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language

1 April 2022
Andy Zeng
Maria Attarian
Brian Ichter
K. Choromanski
Adrian S. Wong
Stefan Welker
F. Tombari
Aveek Purohit
Michael S. Ryoo
Vikas Sindhwani
Johnny Lee
Vincent Vanhoucke
Peter R. Florence
    ReLM, LRM

Papers citing "Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language"

50 / 438 papers shown
Title
SARA-RT: Scaling up Robotics Transformers with Self-Adaptive Robust Attention
Isabel Leal
Krzysztof Choromanski
Deepali Jain
Kumar Avinava Dubey
Jake Varley
...
Q. Vuong
Tamás Sarlós
Kenneth Oslund
Karol Hausman
Kanishka Rao
142
10
0
04 Dec 2023
LVDiffusor: Distilling Functional Rearrangement Priors from Large Models into Diffusor
Yiming Zeng
Mingdong Wu
Long Yang
Jiyao Zhang
Hao Ding
Hui Cheng
Hao Dong
DiffM
88
8
0
03 Dec 2023
Zero-Shot Video Question Answering with Procedural Programs
Rohan Choudhury
Koichiro Niinuma
Kris M. Kitani
László A. Jeni
86
24
0
01 Dec 2023
Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning
Yingdong Hu
Fanqi Lin
Tong Zhang
Li Yi
Yang Gao
LM&Ro
179
125
0
29 Nov 2023
PALM: Predicting Actions through Language Models
Sanghwan Kim
Daoji Huang
Yongqin Xian
Otmar Hilliges
Luc Van Gool
Xi Wang
VLM
90
14
0
29 Nov 2023
ROSO: Improving Robotic Policy Inference via Synthetic Observations
Yusuke Miyashita
Dimitris Gahtidis
Colin La
Jeremy Rabinowicz
Juxi Leitner
58
2
0
28 Nov 2023
RoboGPT: an intelligent agent of making embodied long-term decisions for daily instruction tasks
Yaran Chen
Wenbo Cui
Yuanwen Chen
Mining Tan
Xinyao Zhang
Dong Zhao
He Wang
LM&Ro, LLMAG
87
0
0
27 Nov 2023
Vamos: Versatile Action Models for Video Understanding
Shijie Wang
Qi Zhao
Minh Quan Do
Nakul Agarwal
Kwonjoon Lee
Chen Sun
155
21
0
22 Nov 2023
GAIA: a benchmark for General AI Assistants
Grégoire Mialon
Clémentine Fourrier
Craig Swift
Thomas Wolf
Yann LeCun
Thomas Scialom
AI4MH, ALM, ELM, RALM
116
188
0
21 Nov 2023
De-fine: Decomposing and Refining Visual Programs with Auto-Feedback
Minghe Gao
Juncheng Li
Hao Fei
Liang Pang
Wei Ji
Guoming Wang
Wenqiao Zhang
Siliang Tang
Yueting Zhuang
83
9
0
21 Nov 2023
A Survey on Multimodal Large Language Models for Autonomous Driving
Can Cui
Yunsheng Ma
Xu Cao
Wenqian Ye
Yang Zhou
...
Xinrui Yan
Shuqi Mei
Jianguo Cao
Ziran Wang
Chao Zheng
181
292
0
21 Nov 2023
GPT-4V(ision) for Robotics: Multimodal Task Planning from Human Demonstration
Naoki Wake
Atsushi Kanehira
Kazuhiro Sasabuchi
Jun Takamatsu
Katsushi Ikeuchi
LM&Ro
99
69
0
20 Nov 2023
Visual AI and Linguistic Intelligence Through Steerability and Composability
David Noever
S. M. Noever
75
0
0
18 Nov 2023
Challenges in data-based geospatial modeling for environmental research and practice
Diana Koldasbayeva
P. Tregubova
M. Gasanov
Alexey Zaytsev
Anna Petrovskaia
Evgeny Burnaev
AI4CE
80
1
0
18 Nov 2023
Distilling and Retrieving Generalizable Knowledge for Robot Manipulation via Language Corrections
Lihan Zha
Yuchen Cui
Li-Heng Lin
Minae Kwon
Montse Gonzalez Arenas
Andy Zeng
Fei Xia
Dorsa Sadigh
113
37
0
17 Nov 2023
VideoCon: Robust Video-Language Alignment via Contrast Captions
Hritik Bansal
Yonatan Bitton
Idan Szpektor
Kai-Wei Chang
Aditya Grover
69
18
0
15 Nov 2023
I Was Blind but Now I See: Implementing Vision-Enabled Dialogue in Social Robots
Giulio Antonio Abbo
Tony Belpaeme
81
1
0
15 Nov 2023
Zero-shot audio captioning with audio-language model guidance and audio context keywords
Leonard Salewski
Stefan Fauth
A. Sophia Koepke
Zeynep Akata
54
11
0
14 Nov 2023
Human-Centric Autonomous Systems With LLMs for User Command Reasoning
Yi Yang
Qingwen Zhang
Ci Li
Daniel Simoes Marta
Nazre Batool
John Folkesson
LRM
127
30
0
14 Nov 2023
JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models
Zihao Wang
Shaofei Cai
Hoang Trung-Dung
Yonggang Jin
Jinbing Hou
...
Zhaofeng He
Zilong Zheng
Yaodong Yang
Xiaojian Ma
Yitao Liang
LLMAG, LM&Ro
136
108
0
10 Nov 2023
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model
Shangwen Wang
Xiaopeng Li
Shasha Li
Shan Zhao
Jie Yu
Jun Ma
Xiaoguang Mao
Weimin Zhang
121
7
0
10 Nov 2023
Follow-Up Differential Descriptions: Language Models Resolve Ambiguities for Image Classification
Reza Esfandiarpoor
Stephen H. Bach
VLM
97
13
0
10 Nov 2023
Language-guided Robot Grasping: CLIP-based Referring Grasp Synthesis in Clutter
Georgios Tziafas
Yucheng Xu
Arushi Goel
Mohammadreza Kasaei
Zhibin Li
Hamidreza Kasaei
95
28
0
09 Nov 2023
Zero-shot Translation of Attention Patterns in VQA Models to Natural Language
Leonard Salewski
A. Sophia Koepke
Hendrik P. A. Lensch
Zeynep Akata
77
2
0
08 Nov 2023
Multitask Multimodal Prompted Training for Interactive Embodied Task Completion
Georgios Pantazopoulos
Malvina Nikandrou
Amit Parekh
Bhathiya Hemanthage
Arash Eshghi
Ioannis Konstas
Verena Rieser
Oliver Lemon
Alessandro Suglia
LM&Ro
82
7
0
07 Nov 2023
OmniVec: Learning robust representations with cross modal sharing
Siddharth Srivastava
Gaurav Sharma
SSL
99
67
0
07 Nov 2023
Get the Ball Rolling: Alerting Autonomous Robots When to Help to Close the Healthcare Loop
Jiaxin Shen
Yanyao Liu
Ziming Wang
Ziyuan Jiao
Yufeng Chen
Wenjuan Han
39
0
0
05 Nov 2023
Make a Donut: Hierarchical EMD-Space Planning for Zero-Shot Deformable Manipulation with Tools
Yang You
Bokui Shen
Congyue Deng
Haoran Geng
Songlin Wei
He Wang
Leonidas Guibas
84
3
0
05 Nov 2023
Sentiment Analysis through LLM Negotiations
Xiaofei Sun
Xiaoya Li
Shengyu Zhang
Shuhe Wang
Leilei Gan
Jiwei Li
Tianwei Zhang
Guoyin Wang
95
21
0
03 Nov 2023
A Graph-to-Text Approach to Knowledge-Grounded Response Generation in Human-Robot Interaction
Nicholas Walker
Stefan Ultes
Pierre Lison
LM&Ro
173
1
0
03 Nov 2023
Long Story Short: a Summarize-then-Search Method for Long Video Question Answering
Jiwan Chung
Youngjae Yu
189
5
0
02 Nov 2023
Is GPT Powerful Enough to Analyze the Emotions of Memes?
Jingjing Wang
Joshua Luo
Grace Yang
Allen Hong
Feng Luo
ELM, AI4MH
69
2
0
01 Nov 2023
Large Language Models as Generalizable Policies for Embodied Tasks
Andrew Szot
Max Schwarzer
Harsh Agrawal
Bogdan Mazoure
Walter A. Talbott
Katherine Metcalf
Natalie Mackraz
Devon Hjelm
Alexander Toshev
LM&Ro
97
67
0
26 Oct 2023
Apollo: Zero-shot MultiModal Reasoning with Multiple Experts
Daniela Ben-David
Tzuf Paz-Argaman
Reut Tsarfaty
MoE
75
0
0
25 Oct 2023
Woodpecker: Hallucination Correction for Multimodal Large Language Models
Shukang Yin
Chaoyou Fu
Sirui Zhao
Tong Xu
Hao Wang
Dianbo Sui
Chunjiang Ge
Ke Li
Xingguo Sun
Enhong Chen
VLM, MLLM
108
133
0
24 Oct 2023
Unnatural language processing: How do language models handle machine-generated prompts?
Corentin Kervadec
Francesca Franzon
Marco Baroni
79
6
0
24 Oct 2023
Large Language Models are Visual Reasoning Coordinators
Liangyu Chen
Bo Li
Sheng Shen
Jingkang Yang
Chunyuan Li
Kurt Keutzer
Trevor Darrell
Ziwei Liu
VLM, LRM
143
58
0
23 Oct 2023
Open-Ended Instructable Embodied Agents with Memory-Augmented Large Language Models
Gabriel H. Sarch
Yue Wu
Michael J. Tarr
Katerina Fragkiadaki
LM&Ro, LLMAG
123
19
0
23 Oct 2023
HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models
Tianrui Guan
Fuxiao Liu
Xiyang Wu
Ruiqi Xian
Zongxia Li
...
Lichang Chen
Furong Huang
Yaser Yacoob
Dinesh Manocha
VLM, MLLM
180
197
0
23 Oct 2023
Can Language Models Laugh at YouTube Short-form Videos?
Dayoon Ko
Sangho Lee
Gunhee Kim
125
8
0
22 Oct 2023
3D-GPT: Procedural 3D Modeling with Large Language Models
Chunyi Sun
Junlin Han
Weijian Deng
Xinlong Wang
Zishan Qin
Stephen Gould
105
43
0
19 Oct 2023
Language Models as Zero-Shot Trajectory Generators
Teyun Kwon
Norman Di Palo
Edward Johns
LM&Ro
109
51
0
17 Oct 2023
Video Language Planning
Yilun Du
Mengjiao Yang
Peter R. Florence
Fei Xia
Ayzaan Wahid
...
Pieter Abbeel
Josh Tenenbaum
L. Kaelbling
Andy Zeng
Jonathan Tompson
PINN, LM&Ro
192
100
0
16 Oct 2023
Interpreting and Controlling Vision Foundation Models via Text Explanations
Haozhe Chen
Junfeng Yang
Carl Vondrick
Chengzhi Mao
93
3
0
16 Oct 2023
VidCoM: Fast Video Comprehension through Large Language Models with Multimodal Tools
Huihui Gong
Minjing Dong
Siqi Ma
S. Çamtepe
Chang Xu
Lei Hou
Surya Nepal
VLM, MLLM
102
0
0
16 Oct 2023
Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Commonsense Norms
Seungju Han
Junhyeok Kim
Jack Hessel
Liwei Jiang
Jiwan Chung
Yejin Son
Yejin Choi
Youngjae Yu
70
3
0
16 Oct 2023
Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook
Ming Jin
Qingsong Wen
Yuxuan Liang
Chaoli Zhang
Siqiao Xue
...
Shirui Pan
Vincent S. Tseng
Yu Zheng
Lei Chen
Hui Xiong
AI4TS, SyDa
174
125
0
16 Oct 2023
Interactive Task Planning with Language Models
Boyi Li
Philipp Wu
Pieter Abbeel
Jitendra Malik
LM&Ro
126
38
0
16 Oct 2023
VLIS: Unimodal Language Models Guide Multimodal Language Generation
Jiwan Chung
Youngjae Yu
VLM
79
2
0
15 Oct 2023
Vision-by-Language for Training-Free Compositional Image Retrieval
Shyamgopal Karthik
Karsten Roth
Massimiliano Mancini
Zeynep Akata
CoGe
121
61
0
13 Oct 2023