2204.11134
Cited By
Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation?
23 April 2022
Yuchen Cui
S. Niekum
Abhi Gupta
Vikash Kumar
Aravind Rajeswaran
LM&Ro
Papers citing
"Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation?"
50 / 63 papers shown
Title
ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations
Jiahui Zhang
Yusen Luo
Abrar Anwar
S. Sontakke
Joseph J. Lim
Jesse Thomason
Erdem Biyik
Jesse Zhang
OffRL
LM&Ro
17
0
0
16 May 2025
Chain-of-Modality: Learning Manipulation Programs from Multimodal Human Videos with Vision-Language-Models
Chen Wang
Fei Xia
Wenhao Yu
Tingnan Zhang
Ruohan Zhang
Ce Liu
Li Fei-Fei
Jie Tan
Jacky Liang
33
0
0
17 Apr 2025
SENSEI: Semantic Exploration Guided by Foundation Models to Learn Versatile World Models
Cansu Sancaktar
Christian Gumbsch
Andrii Zadaianchuk
Pavel Kolev
Georg Martius
LM&Ro
VLM
61
1
0
03 Mar 2025
SB-Bench: Stereotype Bias Benchmark for Large Multimodal Models
Vishal Narnaware
Ashmal Vayani
Rohit Gupta
Swetha Sirnam
Mubarak Shah
108
3
0
12 Feb 2025
Temporal Representation Alignment: Successor Features Enable Emergent Compositionality in Robot Instruction Following
Vivek Myers
Bill Chunyuan Zheng
Anca Dragan
Kuan Fang
Sergey Levine
65
0
0
08 Feb 2025
Robotic State Recognition with Image-to-Text Retrieval Task of Pre-Trained Vision-Language Model and Black-Box Optimization
Kento Kawaharazuka
Yoshiki Obinata
Naoaki Kanazawa
Kei Okada
Masayuki Inaba
36
0
0
30 Oct 2024
Aerial Vision-and-Language Navigation via Semantic-Topo-Metric Representation Guided LLM Reasoning
Yunpeng Gao
Zhigang Wang
Linglin Jing
Dong Wang
Xuelong Li
Bin Zhao
33
14
0
11 Oct 2024
Unity is Power: Semi-Asynchronous Collaborative Training of Large-Scale Models with Structured Pruning in Resource-Limited Clients
Yan Li
Mingyi Li
Xiao Zhang
Guangwei Xu
Feng Chen
Yuan Yuan
Yifei Zou
Mengying Zhao
Jianbo Lu
Dongxiao Yu
28
0
0
11 Oct 2024
Unpacking Failure Modes of Generative Policies: Runtime Monitoring of Consistency and Progress
Christopher Agia
Rohan Sinha
Jingyun Yang
Zi-ang Cao
Rika Antonova
Marco Pavone
Jeannette Bohg
28
7
0
06 Oct 2024
Robo-MUTUAL: Robotic Multimodal Task Specification via Unimodal Learning
Jianxiong Li
Zhihao Wang
Jinliang Zheng
Xiaoai Zhou
Guanming Wang
...
Yu Liu
Jingjing Liu
Ya-Qin Zhang
Junzhi Yu
Xianyuan Zhan
38
2
0
02 Oct 2024
FlowRetrieval: Flow-Guided Data Retrieval for Few-Shot Imitation Learning
Li-Heng Lin
Yuchen Cui
Amber Xie
Tianyu Hua
Dorsa Sadigh
29
8
0
29 Aug 2024
Foundation Models for Autonomous Robots in Unstructured Environments
Hossein Naderi
Alireza Shojaei
Lifu Huang
LM&Ro
47
0
0
19 Jul 2024
Multimodal foundation world models for generalist embodied agents
Pietro Mazzaglia
Tim Verbelen
Bart Dhoedt
Aaron C. Courville
Sai Rajeswar
OffRL
LM&Ro
47
5
0
26 Jun 2024
Meta-Control: Automatic Model-based Control Synthesis for Heterogeneous Robot Skills
Tianhao Wei
Liqian Ma
Rui Chen
Weiye Zhao
Changliu Liu
45
3
0
18 May 2024
What Foundation Models can Bring for Robot Learning in Manipulation: A Survey
Dingzhe Li
Yixiang Jin
A. Yong
Hongze Yu
Jun Shi
Xiaoshuai Hao
Peng Hao
Huaping Liu
Gang Hua
Bin Fang
AI4CE
LM&Ro
72
13
0
28 Apr 2024
Retrieval-Augmented Embodied Agents
Yichen Zhu
Zhicai Ou
Xiaofeng Mou
Jian Tang
51
17
0
17 Apr 2024
A Roadmap Towards Automated and Regulated Robotic Systems
Yihao Liu
Mehran Armand
42
2
0
21 Mar 2024
CLIPSwarm: Generating Drone Shows from Text Prompts with Vision-Language Models
Pablo Pueyo
Eduardo Montijano
Ana C. Murillo
Mac Schwager
27
4
0
20 Mar 2024
CoPa: General Robotic Manipulation through Spatial Constraints of Parts with Foundation Models
Haoxu Huang
Fanqi Lin
Yingdong Hu
Shengjie Wang
Yang Gao
38
49
0
13 Mar 2024
RT-Sketch: Goal-Conditioned Imitation Learning from Hand-Drawn Sketches
Priya Sundaresan
Q. Vuong
Jiayuan Gu
Peng-Tao Xu
Ted Xiao
...
Ajinkya Jain
Karol Hausman
Dorsa Sadigh
Jeannette Bohg
S. Schaal
VGen
29
25
0
05 Mar 2024
DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning
Jianxiong Li
Jinliang Zheng
Yinan Zheng
Liyuan Mao
Xiaoming Hu
...
Jihao Liu
Yu Liu
Jingjing Liu
Ya-Qin Zhang
Xianyuan Zhan
LM&Ro
OffRL
37
8
0
28 Feb 2024
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
Soroush Nasiriany
Fei Xia
Wenhao Yu
Ted Xiao
Jacky Liang
...
Karol Hausman
N. Heess
Chelsea Finn
Sergey Levine
Brian Ichter
LM&Ro
LRM
25
92
0
12 Feb 2024
Real-World Robot Applications of Foundation Models: A Review
Kento Kawaharazuka
T. Matsushima
Andrew Gambardella
Jiaxian Guo
Chris Paxton
Andy Zeng
OffRL
VLM
LM&Ro
48
45
0
08 Feb 2024
Code as Reward: Empowering Reinforcement Learning with VLMs
David Venuto
Sami Nur Islam
Martin Klissarov
Doina Precup
Sherry Yang
Ankit Anand
VLM
25
9
0
07 Feb 2024
"Task Success" is not Enough: Investigating the Use of Video-Language Models as Behavior Critics for Catching Undesirable Agent Behaviors
L. Guan
Yifan Zhou
Denis Liu
Yantian Zha
H. B. Amor
Subbarao Kambhampati
LM&Ro
36
16
0
06 Feb 2024
RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback
Yufei Wang
Zhanyi Sun
Jesse Zhang
Zhou Xian
Erdem Biyik
David Held
Zackory M. Erickson
VLM
55
50
0
06 Feb 2024
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities
Boyuan Chen
Zhuo Xu
Sean Kirmani
Brian Ichter
Danny Driess
Pete Florence
Dorsa Sadigh
Leonidas J. Guibas
Fei Xia
LRM
ReLM
49
206
0
22 Jan 2024
Vision-Language Models as a Source of Rewards
Kate Baumli
Satinder Baveja
Feryal M. P. Behbahani
Harris Chan
Gheorghe Comanici
...
Yannick Schroecker
Stephen Spencer
Richie Steigerwald
Luyu Wang
Lei Zhang
VLM
LRM
42
26
0
14 Dec 2023
LiFT: Unsupervised Reinforcement Learning with Foundation Models as Teachers
Taewook Nam
Juyong Lee
Jesse Zhang
Sung Ju Hwang
Joseph J. Lim
Karl Pertsch
OffRL
LRM
43
5
0
14 Dec 2023
Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis
Yafei Hu
Quanting Xie
Vidhi Jain
Jonathan M Francis
Jay Patrikar
...
Xiaolong Wang
Sebastian A. Scherer
Z. Kira
Fei Xia
Yonatan Bisk
LM&Ro
AI4CE
32
63
0
14 Dec 2023
DiffVL: Scaling Up Soft Body Manipulation using Vision-Language Driven Differentiable Physics
Zhiao Huang
Feng Chen
Yewen Pu
Chun-Tse Lin
Hao Su
Chuang Gan
24
4
0
11 Dec 2023
Dream2Real: Zero-Shot 3D Object Rearrangement with Vision-Language Models
Ivan Kapelyukh
Yifei Ren
Ignacio Alzugaray
Edward Johns
VLM
LM&Ro
22
20
0
07 Dec 2023
FoMo Rewards: Can we cast foundation models as reward functions?
Ekdeep Singh Lubana
Johann Brehmer
P. D. Haan
Taco S. Cohen
OffRL
LRM
48
2
0
06 Dec 2023
Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning
Juan Rocamonde
Victoriano Montesinos
Elvis Nava
Ethan Perez
David Lindner
VLM
33
76
0
19 Oct 2023
Motif: Intrinsic Motivation from Artificial Intelligence Feedback
Martin Klissarov
P. D'Oro
Shagun Sodhani
Roberta Raileanu
Pierre-Luc Bacon
Pascal Vincent
Amy Zhang
Mikael Henaff
LRM
LLMAG
29
54
0
29 Sep 2023
OceanChat: Piloting Autonomous Underwater Vehicles in Natural Language
Jia Huang
Mengxue Hou
Junkai Wang
Fumin Zhang
34
5
0
27 Sep 2023
Verifiable Learned Behaviors via Motion Primitive Composition: Applications to Scooping of Granular Media
A. Benton
Eugen Solowjow
Prithvi Akella
14
0
0
26 Sep 2023
MUTEX: Learning Unified Policies from Multimodal Task Specifications
Rutav Shah
Roberto Martín-Martín
Yuke Zhu
OffRL
44
54
0
25 Sep 2023
Guide Your Agent with Adaptive Multimodal Rewards
Changyeon Kim
Younggyo Seo
Hao Liu
Lisa Lee
Jinwoo Shin
Honglak Lee
Kimin Lee
20
9
0
19 Sep 2023
Developmental Scaffolding with Large Language Models
Batuhan Celik
Alper Ahmetoglu
Emre Ugur
Erhan Öztop
LM&Ro
LLMAG
28
3
0
02 Sep 2023
Language Reward Modulation for Pretraining Reinforcement Learning
Ademi Adeniji
Amber Xie
Carmelo Sferrazza
Younggyo Seo
Stephen James
Pieter Abbeel
39
26
0
23 Aug 2023
Learning Navigational Visual Representations with Semantic Map Supervision
Yicong Hong
Yang Zhou
Ruiyi Zhang
Franck Dernoncourt
Trung Bui
Stephen Gould
Hao Tan
SSL
30
21
0
23 Jul 2023
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
Wenlong Huang
Chen Wang
Ruohan Zhang
Yunzhu Li
Jiajun Wu
Li Fei-Fei
LM&Ro
33
480
0
12 Jul 2023
Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control
Vivek Myers
Andre Wang He
Kuan Fang
Homer Walke
Philippe Hansen-Estruch
Ching-An Cheng
Mihai Jalobeanu
Andrey Kolobov
Anca Dragan
Sergey Levine
LM&Ro
24
29
0
30 Jun 2023
CLIPA-v2: Scaling CLIP Training with 81.1% Zero-shot ImageNet Accuracy within a $10,000 Budget; An Extra $4,000 Unlocks 81.8% Accuracy
Xianhang Li
Zeyu Wang
Cihang Xie
CLIP
VLM
48
19
0
27 Jun 2023
LIV: Language-Image Representations and Rewards for Robotic Control
Yecheng Jason Ma
William Liang
Vaidehi Som
Vikash Kumar
Amy Zhang
Osbert Bastani
Dinesh Jayaraman
LM&Ro
33
121
0
01 Jun 2023
Demo2Code: From Summarizing Demonstrations to Synthesizing Code via Extended Chain-of-Thought
Huaxiaoyue Wang
Gonzalo Gonzalez-Pumariega
Yash Sharma
Sanjiban Choudhury
LM&Ro
26
33
0
26 May 2023
Semantic Anomaly Detection with Large Language Models
Amine Elhafsi
Rohan Sinha
Christopher Agia
Edward Schmerling
I. Nesnas
Marco Pavone
34
64
0
18 May 2023
An Inverse Scaling Law for CLIP Training
Xianhang Li
Zeyu Wang
Cihang Xie
VLM
CLIP
45
54
0
11 May 2023
Vision-Language Models as Success Detectors
Yuqing Du
Ksenia Konyushkova
Misha Denil
A. Raju
Jessica Landon
Felix Hill
Nando de Freitas
Serkan Cabi
MLLM
LRM
89
77
0
13 Mar 2023