2204.00598
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
1 April 2022
Andy Zeng
Maria Attarian
Brian Ichter
K. Choromanski
Adrian S. Wong
Stefan Welker
F. Tombari
Aveek Purohit
Michael S. Ryoo
Vikas Sindhwani
Johnny Lee
Vincent Vanhoucke
Peter R. Florence
ReLM
LRM
Papers citing
"Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language"
50 / 438 papers shown
Title
TI-PREGO: Chain of Thought and In-Context Learning for Online Mistake Detection in PRocedural EGOcentric Videos
Leonardo Plini
Luca Scofano
Edoardo De Matteis
Guido Maria D'Amely di Melendugno
Alessandro Flaborea
Andrea Sanchietti
G. Farinella
Fabio Galasso
Antonino Furnari
LRM
EgoV
141
2
0
04 Nov 2024
Multilingual Vision-Language Pre-training for the Remote Sensing Domain
João Daniel Silva
João Magalhães
D. Tuia
Bruno Martins
CLIP
VLM
85
2
0
30 Oct 2024
ProMQA: Question Answering Dataset for Multimodal Procedural Activity Understanding
Kimihiro Hasegawa
Wiradee Imrattanatrai
Zhi-Qi Cheng
Masaki Asada
Susan Holm
Yuran Wang
Ken Fukuda
Teruko Mitamura
64
1
0
29 Oct 2024
SegLLM: Multi-round Reasoning Segmentation
XuDong Wang
Shaolun Zhang
Shufan Li
Konstantinos Kallidromitis
Kehan Li
Yusuke Kato
Kazuki Kozuka
Trevor Darrell
VLM
LRM
119
2
0
24 Oct 2024
Foundation Models for Rapid Autonomy Validation
Alec Farid
Peter Schleede
Aaron Huang
Christoffer Heckman
118
0
0
22 Oct 2024
In-Context Learning Enables Robot Action Prediction in LLMs
Yida Yin
Zekai Wang
Yuvan Sharma
Dantong Niu
Trevor Darrell
Roei Herzig
LM&Ro
303
4
0
16 Oct 2024
Innovative Thinking, Infinite Humor: Humor Research of Large Language Models through Structured Thought Leaps
Han Wang
Yilin Zhao
Dian Li
Xiaohan Wang
Gang Liu
Xuguang Lan
Haoran Wang
LRM
203
1
0
14 Oct 2024
ImagineNav: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination
Xinxin Zhao
Wenzhe Cai
Likun Tang
Teng Wang
LM&Ro
75
10
0
13 Oct 2024
Language-Model-Assisted Bi-Level Programming for Reward Learning from Internet Videos
Harsh Mahesheka
Zhixian Xie
Ziyi Wang
Wanxin Jin
86
0
0
11 Oct 2024
Aerial Vision-and-Language Navigation via Semantic-Topo-Metric Representation Guided LLM Reasoning
Yunpeng Gao
Zhigang Wang
Linglin Jing
Dong Wang
Xuelong Li
Bin Zhao
137
14
0
11 Oct 2024
ConceptAgent: LLM-Driven Precondition Grounding and Tree Search for Robust Task Planning and Execution
Corban Rivera
Grayson Byrd
William Paul
Tyler Feldman
Meghan Booker
...
Krishna Murthy Jatavallabhula
Celso M. De Melo
Lalithkumar Seenivasan
Mathias Unberath
Rama Chellappa
LLMAG
LM&Ro
77
1
0
08 Oct 2024
LADEV: A Language-Driven Testing and Evaluation Platform for Vision-Language-Action Models in Robotic Manipulation
Zhijie Wang
Zhehua Zhou
Jiayang Song
Yuheng Huang
Zhan Shu
Lei Ma
90
1
0
07 Oct 2024
Can visual language models resolve textual ambiguity with visual cues? Let visual puns tell you!
Jiwan Chung
Seungwon Lim
Jaehyun Jeon
Seungbeen Lee
Youngjae Yu
108
1
0
01 Oct 2024
Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos
Md. Mohaiminul Islam
Tushar Nagarajan
Huiyu Wang
Fu-Jen Chu
Kris Kitani
Gedas Bertasius
Xitong Yang
76
4
0
30 Sep 2024
Episodic Memory Verbalization using Hierarchical Representations of Life-Long Robot Experience
Leonard Barmann
Chad DeChant
Joana Plewnia
Fabian Peller-Konrad
Daniel Bauer
Tamim Asfour
Alex Waibel
LM&Ro
98
1
0
26 Sep 2024
Attention Prompting on Image for Large Vision-Language Models
Runpeng Yu
Weihao Yu
Xinchao Wang
VLM
106
11
0
25 Sep 2024
MHRC: Closed-loop Decentralized Multi-Heterogeneous Robot Collaboration with Large Language Models
Wenhao Yu
Jie Peng
Yueliang Ying
Sai Li
Jianmin Ji
Yanyong Zhang
124
6
0
24 Sep 2024
SYNERGAI: Perception Alignment for Human-Robot Collaboration
Yixin Chen
Guoxi Zhang
Yaowei Zhang
Hongming Xu
Peiyuan Zhi
Qing Li
Siyuan Huang
77
0
0
24 Sep 2024
Discovering Object Attributes by Prompting Large Language Models with Perception-Action APIs
A. Mavrogiannis
Dehao Yuan
Yiannis Aloimonos
LM&Ro
93
0
0
23 Sep 2024
From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models
Shengsheng Qian
Zuyi Zhou
Dizhan Xue
Bing Wang
Changsheng Xu
LRM
162
2
0
19 Sep 2024
Multimodal Fusion with LLMs for Engagement Prediction in Natural Conversation
Cheng Charles Ma
Kevin Hyekang Joo
Alexandria K. Vail
Sunreeta Bhattacharya
Álvaro Fernández García
Kailana Baker-Matsuoka
Sheryl Mathew
Lori L. Holt
Fernando De la Torre
87
5
0
13 Sep 2024
HiRT: Enhancing Robotic Control with Hierarchical Robot Transformers
Jianke Zhang
Yanjiang Guo
Xiaoyu Chen
Yen-Jen Wang
Yucheng Hu
Chengming Shi
Jianyu Chen
102
13
0
12 Sep 2024
Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments
Haritheja Etukuru
Norihito Naka
Zijin Hu
Seungjae Lee
Julian Mehu
Aaron Edsinger
Chris Paxton
Soumith Chintala
Lerrel Pinto
Nur Muhammad (Mahi) Shafiullah
LM&Ro
107
27
0
09 Sep 2024
Bridging the gap between natural user expression with complex automation programming in smart homes
Yingtian Shi
Xiaoyi Liu
Chun Yu
Tianao Yang
Cheng Gao
Chen Liang
Yuanchun Shi
68
2
0
22 Aug 2024
D-RMGPT: Robot-assisted collaborative tasks driven by large multimodal models
Matteo Forlini
Mihail Babcinschi
Giacomo Palmieri
Pedro Neto
89
1
0
21 Aug 2024
VideoQA in the Era of LLMs: An Empirical Study
Junbin Xiao
Nanxin Huang
Hangyu Qin
Dongyang Li
Yicong Li
...
Zhulin Tao
Jianxing Yu
Liang Lin
Tat-Seng Chua
Angela Yao
106
14
0
08 Aug 2024
ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning
Yanjie Wang
Alan Yuille
Zhuowan Li
Zilong Zheng
LRM
125
5
0
05 Aug 2024
User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance
Mrinal Verghese
Brian Chen
H. Eghbalzadeh
Tushar Nagarajan
Ruta Desai
LRM
101
1
0
04 Aug 2024
Toward Automatic Relevance Judgment using Vision-Language Models for Image-Text Retrieval Evaluation
Jheng-Hong Yang
Jimmy Lin
VLM
85
3
0
02 Aug 2024
CityX: Controllable Procedural Content Generation for Unbounded 3D Cities
Shougao Zhang
Mengqi Zhou
Yuxi Wang
Chuanchen Luo
Rongyu Wang
Yiwei Li
Xucheng Yin
Zhaoxiang Zhang
Junran Peng
117
9
0
24 Jul 2024
Can VLMs be used on videos for action recognition? LLMs are Visual Reasoning Coordinators
Harsh Lunia
62
1
0
20 Jul 2024
BadRobot: Jailbreaking Embodied LLMs in the Physical World
Hangtao Zhang
Chenyu Zhu
Xianlong Wang
Ziqi Zhou
Yichen Wang
...
Shengshan Hu
Leo Yu Zhang
Aishan Liu
Peijin Guo
Leo Yu Zhang
LM&Ro
117
11
0
16 Jul 2024
Affordance-Guided Reinforcement Learning via Visual Prompting
Olivia Y. Lee
Annie Xie
Kuan Fang
Karl Pertsch
Chelsea Finn
OffRL
LM&Ro
218
10
0
14 Jul 2024
VLMPC: Vision-Language Model Predictive Control for Robotic Manipulation
Wentao Zhao
Jiaming Chen
Ziyu Meng
Donghui Mao
Ran Song
Wei Zhang
127
12
0
13 Jul 2024
Instruction Following with Goal-Conditioned Reinforcement Learning in Virtual Environments
Zoya Volovikova
A. Skrynnik
Petr Kuderov
Aleksandr I. Panov
LLMAG
LM&Ro
99
1
0
12 Jul 2024
Constructing Concept-based Models to Mitigate Spurious Correlations with Minimal Human Effort
Jeeyung Kim
Ze Wang
Qiang Qiu
84
2
0
12 Jul 2024
Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI
Yang Liu
Weixing Chen
Yongjie Bai
Xiaodan Liang
Guanbin Li
Wen Gao
Liang Lin
LM&Ro
SyDa
AI4CE
168
71
0
09 Jul 2024
Visualizing Dialogues: Enhancing Image Selection through Dialogue Understanding with Large Language Models
Chang-Sheng Kao
Yun-Nung Chen
56
0
0
04 Jul 2024
Commonsense Reasoning for Legged Robot Adaptation with Vision-Language Models
Annie S. Chen
Alec M. Lessing
Andy Tang
Govind Chada
Laura Smith
Sergey Levine
Chelsea Finn
LM&Ro
LRM
96
11
0
02 Jul 2024
Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness
Khyathi Chandu
Linjie Li
Anas Awadalla
Ximing Lu
Jae Sung Park
Jack Hessel
Lijuan Wang
Yejin Choi
113
3
0
02 Jul 2024
Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs
Enshu Liu
Junyi Zhu
Zinan Lin
Xuefei Ning
Matthew B. Blaschko
Shengen Yan
Guohao Dai
Huazhong Yang
Yu Wang
MoE
112
7
0
01 Jul 2024
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy
Xiang Li
Cristina Mata
J. Park
Kumara Kahatapitiya
Yoo Sung Jang
...
Kanchana Ranasinghe
R. Burgert
Mu Cai
Yong Jae Lee
Michael S. Ryoo
LM&Ro
196
32
0
28 Jun 2024
ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning
Christopher E. Mower
Yuhui Wan
Hongzhan Yu
Antoine Grosnit
Jonas Gonzalez-Billandon
...
Kun Shao
Xingyue Quan
Jianye Hao
Jun Wang
Haitham Bou-Ammar
LM&Ro
LLMAG
78
13
0
28 Jun 2024
Tools Fail: Detecting Silent Errors in Faulty Tools
Jimin Sun
So Yeon Min
Yingshan Chang
Yonatan Bisk
108
7
0
27 Jun 2024
Lifelong Robot Library Learning: Bootstrapping Composable and Generalizable Skills for Embodied Control with Language Models
Georgios Tziafas
Hamidreza Kasaei
KELM
LM&Ro
107
9
0
26 Jun 2024
Towards Open-World Grasping with Large Vision-Language Models
Georgios Tziafas
Hamidreza Kasaei
LM&Ro
LRM
138
15
0
26 Jun 2024
Open-vocabulary Mobile Manipulation in Unseen Dynamic Environments with 3D Semantic Maps
Dicong Qiu
Wenzong Ma
Zhenfu Pan
Hui Xiong
Junwei Liang
LM&Ro
101
8
0
26 Jun 2024
Retrieval-Augmented Code Generation for Situated Action Generation: A Case Study on Minecraft
Chalamalasetti Kranti
Sherzod Hakimov
David Schlangen
90
3
0
25 Jun 2024
Adversaries Can Misuse Combinations of Safe Models
Erik Jones
Anca Dragan
Jacob Steinhardt
78
13
0
20 Jun 2024
Using Multimodal Large Language Models for Automated Detection of Traffic Safety Critical Events
M. Tami
Huthaifa I. Ashqar
Mohammed Elhenawy
100
5
0
19 Jun 2024