ResearchTrend.AI
Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation?

23 April 2022
Yuchen Cui
S. Niekum
Abhi Gupta
Vikash Kumar
Aravind Rajeswaran
    LM&Ro

Papers citing "Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation?"

50 / 63 papers shown
ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations
Jiahui Zhang
Yusen Luo
Abrar Anwar
S. Sontakke
Joseph J. Lim
Jesse Thomason
Erdem Biyik
Jesse Zhang
OffRL
LM&Ro
17
0
0
16 May 2025
Chain-of-Modality: Learning Manipulation Programs from Multimodal Human Videos with Vision-Language-Models
Chen Wang
Fei Xia
Wenhao Yu
Tingnan Zhang
Ruohan Zhang
Ce Liu
Li Fei-Fei
Jie Tan
Jacky Liang
33
0
0
17 Apr 2025
SENSEI: Semantic Exploration Guided by Foundation Models to Learn Versatile World Models
Cansu Sancaktar
Christian Gumbsch
Andrii Zadaianchuk
Pavel Kolev
Georg Martius
LM&Ro
VLM
61
1
0
03 Mar 2025
SB-Bench: Stereotype Bias Benchmark for Large Multimodal Models
Vishal Narnaware
Ashmal Vayani
Rohit Gupta
Swetha Sirnam
Mubarak Shah
108
3
0
12 Feb 2025
Temporal Representation Alignment: Successor Features Enable Emergent Compositionality in Robot Instruction Following
Vivek Myers
Bill Chunyuan Zheng
Anca Dragan
Kuan Fang
Sergey Levine
65
0
0
08 Feb 2025
Robotic State Recognition with Image-to-Text Retrieval Task of Pre-Trained Vision-Language Model and Black-Box Optimization
Kento Kawaharazuka
Yoshiki Obinata
Naoaki Kanazawa
Kei Okada
Masayuki Inaba
36
0
0
30 Oct 2024
Aerial Vision-and-Language Navigation via Semantic-Topo-Metric Representation Guided LLM Reasoning
Yunpeng Gao
Zhigang Wang
Linglin Jing
Dong Wang
Xuelong Li
Bin Zhao
33
14
0
11 Oct 2024
Unity is Power: Semi-Asynchronous Collaborative Training of Large-Scale Models with Structured Pruning in Resource-Limited Clients
Yan Li
Mingyi Li
Xiao Zhang
Guangwei Xu
Feng Chen
Yuan Yuan
Yifei Zou
Mengying Zhao
Jianbo Lu
Dongxiao Yu
28
0
0
11 Oct 2024
Unpacking Failure Modes of Generative Policies: Runtime Monitoring of Consistency and Progress
Christopher Agia
Rohan Sinha
Jingyun Yang
Zi-ang Cao
Rika Antonova
Marco Pavone
Jeannette Bohg
28
7
0
06 Oct 2024
Robo-MUTUAL: Robotic Multimodal Task Specification via Unimodal Learning
Jianxiong Li
Zhihao Wang
Jinliang Zheng
Xiaoai Zhou
Guanming Wang
...
Yu Liu
Jingjing Liu
Ya-Qin Zhang
Junzhi Yu
Xianyuan Zhan
38
2
0
02 Oct 2024
FlowRetrieval: Flow-Guided Data Retrieval for Few-Shot Imitation Learning
Li-Heng Lin
Yuchen Cui
Amber Xie
Tianyu Hua
Dorsa Sadigh
29
8
0
29 Aug 2024
Foundation Models for Autonomous Robots in Unstructured Environments
Hossein Naderi
Alireza Shojaei
Lifu Huang
LM&Ro
47
0
0
19 Jul 2024
Multimodal foundation world models for generalist embodied agents
Pietro Mazzaglia
Tim Verbelen
Bart Dhoedt
Aaron C. Courville
Sai Rajeswar
OffRL
LM&Ro
47
5
0
26 Jun 2024
Meta-Control: Automatic Model-based Control Synthesis for Heterogeneous Robot Skills
Tianhao Wei
Liqian Ma
Rui Chen
Weiye Zhao
Changliu Liu
45
3
0
18 May 2024
What Foundation Models can Bring for Robot Learning in Manipulation : A Survey
Dingzhe Li
Yixiang Jin
A. Yong
Hongze Yu
Jun Shi
Xiaoshuai Hao
Peng Hao
Huaping Liu
Gang Hua
Bin Fang
AI4CE
LM&Ro
72
13
0
28 Apr 2024
Retrieval-Augmented Embodied Agents
Yichen Zhu
Zhicai Ou
Xiaofeng Mou
Jian Tang
51
17
0
17 Apr 2024
A Roadmap Towards Automated and Regulated Robotic Systems
Yihao Liu
Mehran Armand
42
2
0
21 Mar 2024
CLIPSwarm: Generating Drone Shows from Text Prompts with Vision-Language Models
Pablo Pueyo
Eduardo Montijano
Ana C. Murillo
Mac Schwager
27
4
0
20 Mar 2024
CoPa: General Robotic Manipulation through Spatial Constraints of Parts with Foundation Models
Haoxu Huang
Fanqi Lin
Yingdong Hu
Shengjie Wang
Yang Gao
38
49
0
13 Mar 2024
RT-Sketch: Goal-Conditioned Imitation Learning from Hand-Drawn Sketches
Priya Sundaresan
Q. Vuong
Jiayuan Gu
Peng-Tao Xu
Ted Xiao
...
Ajinkya Jain
Karol Hausman
Dorsa Sadigh
Jeannette Bohg
S. Schaal
VGen
29
25
0
05 Mar 2024
DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning
Jianxiong Li
Jinliang Zheng
Yinan Zheng
Liyuan Mao
Xiaoming Hu
...
Jihao Liu
Yu Liu
Jingjing Liu
Ya-Qin Zhang
Xianyuan Zhan
LM&Ro
OffRL
37
8
0
28 Feb 2024
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
Soroush Nasiriany
Fei Xia
Wenhao Yu
Ted Xiao
Jacky Liang
...
Karol Hausman
N. Heess
Chelsea Finn
Sergey Levine
Brian Ichter
LM&Ro
LRM
25
92
0
12 Feb 2024
Real-World Robot Applications of Foundation Models: A Review
Kento Kawaharazuka
T. Matsushima
Andrew Gambardella
Jiaxian Guo
Chris Paxton
Andy Zeng
OffRL
VLM
LM&Ro
48
45
0
08 Feb 2024
Code as Reward: Empowering Reinforcement Learning with VLMs
David Venuto
Sami Nur Islam
Martin Klissarov
Doina Precup
Sherry Yang
Ankit Anand
VLM
25
9
0
07 Feb 2024
"Task Success" is not Enough: Investigating the Use of Video-Language Models as Behavior Critics for Catching Undesirable Agent Behaviors
L. Guan
Yifan Zhou
Denis Liu
Yantian Zha
H. B. Amor
Subbarao Kambhampati
LM&Ro
36
16
0
06 Feb 2024
RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback
Yufei Wang
Zhanyi Sun
Jesse Zhang
Zhou Xian
Erdem Biyik
David Held
Zackory M. Erickson
VLM
55
50
0
06 Feb 2024
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities
Boyuan Chen
Zhuo Xu
Sean Kirmani
Brian Ichter
Danny Driess
Pete Florence
Dorsa Sadigh
Leonidas J. Guibas
Fei Xia
LRM
ReLM
49
206
0
22 Jan 2024
Vision-Language Models as a Source of Rewards
Kate Baumli
Satinder Baveja
Feryal M. P. Behbahani
Harris Chan
Gheorghe Comanici
...
Yannick Schroecker
Stephen Spencer
Richie Steigerwald
Luyu Wang
Lei Zhang
VLM
LRM
42
26
0
14 Dec 2023
LiFT: Unsupervised Reinforcement Learning with Foundation Models as Teachers
Taewook Nam
Juyong Lee
Jesse Zhang
Sung Ju Hwang
Joseph J. Lim
Karl Pertsch
OffRL
LRM
43
5
0
14 Dec 2023
Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis
Yafei Hu
Quanting Xie
Vidhi Jain
Jonathan M Francis
Jay Patrikar
...
Xiaolong Wang
Sebastian A. Scherer
Z. Kira
Fei Xia
Yonatan Bisk
LM&Ro
AI4CE
32
63
0
14 Dec 2023
DiffVL: Scaling Up Soft Body Manipulation using Vision-Language Driven Differentiable Physics
Zhiao Huang
Feng Chen
Yewen Pu
Chun-Tse Lin
Hao Su
Chuang Gan
24
4
0
11 Dec 2023
Dream2Real: Zero-Shot 3D Object Rearrangement with Vision-Language Models
Ivan Kapelyukh
Yifei Ren
Ignacio Alzugaray
Edward Johns
VLM
LM&Ro
22
20
0
07 Dec 2023
FoMo Rewards: Can we cast foundation models as reward functions?
Ekdeep Singh Lubana
Johann Brehmer
P. D. Haan
Taco S. Cohen
OffRL
LRM
48
2
0
06 Dec 2023
Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning
Juan Rocamonde
Victoriano Montesinos
Elvis Nava
Ethan Perez
David Lindner
VLM
33
76
0
19 Oct 2023
Motif: Intrinsic Motivation from Artificial Intelligence Feedback
Martin Klissarov
P. D'Oro
Shagun Sodhani
Roberta Raileanu
Pierre-Luc Bacon
Pascal Vincent
Amy Zhang
Mikael Henaff
LRM
LLMAG
29
54
0
29 Sep 2023
OceanChat: Piloting Autonomous Underwater Vehicles in Natural Language
Jia Huang
Mengxue Hou
Junkai Wang
Fumin Zhang
34
5
0
27 Sep 2023
Verifiable Learned Behaviors via Motion Primitive Composition: Applications to Scooping of Granular Media
A. Benton
Eugen Solowjow
Prithvi Akella
14
0
0
26 Sep 2023
MUTEX: Learning Unified Policies from Multimodal Task Specifications
Rutav Shah
Roberto Martín-Martín
Yuke Zhu
OffRL
44
54
0
25 Sep 2023
Guide Your Agent with Adaptive Multimodal Rewards
Changyeon Kim
Younggyo Seo
Hao Liu
Lisa Lee
Jinwoo Shin
Honglak Lee
Kimin Lee
20
9
0
19 Sep 2023
Developmental Scaffolding with Large Language Models
Batuhan Celik
Alper Ahmetoglu
Emre Ugur
Erhan Öztop
LM&Ro
LLMAG
28
3
0
02 Sep 2023
Language Reward Modulation for Pretraining Reinforcement Learning
Ademi Adeniji
Amber Xie
Carmelo Sferrazza
Younggyo Seo
Stephen James
Pieter Abbeel
39
26
0
23 Aug 2023
Learning Navigational Visual Representations with Semantic Map Supervision
Yicong Hong
Yang Zhou
Ruiyi Zhang
Franck Dernoncourt
Trung Bui
Stephen Gould
Hao Tan
SSL
30
21
0
23 Jul 2023
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
Wenlong Huang
Chen Wang
Ruohan Zhang
Yunzhu Li
Jiajun Wu
Li Fei-Fei
LM&Ro
33
480
0
12 Jul 2023
Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control
Vivek Myers
Andre Wang He
Kuan Fang
Homer Walke
Philippe Hansen-Estruch
Ching-An Cheng
Mihai Jalobeanu
Andrey Kolobov
Anca Dragan
Sergey Levine
LM&Ro
24
29
0
30 Jun 2023
CLIPA-v2: Scaling CLIP Training with 81.1% Zero-shot ImageNet Accuracy within a $10,000 Budget; An Extra $4,000 Unlocks 81.8% Accuracy
Xianhang Li
Zeyu Wang
Cihang Xie
CLIP
VLM
48
19
0
27 Jun 2023
LIV: Language-Image Representations and Rewards for Robotic Control
Yecheng Jason Ma
William Liang
Vaidehi Som
Vikash Kumar
Amy Zhang
Osbert Bastani
Dinesh Jayaraman
LM&Ro
33
121
0
01 Jun 2023
Demo2Code: From Summarizing Demonstrations to Synthesizing Code via Extended Chain-of-Thought
Huaxiaoyue Wang
Gonzalo Gonzalez-Pumariega
Yash Sharma
Sanjiban Choudhury
LM&Ro
26
33
0
26 May 2023
Semantic Anomaly Detection with Large Language Models
Amine Elhafsi
Rohan Sinha
Christopher Agia
Edward Schmerling
I. Nesnas
Marco Pavone
34
64
0
18 May 2023
An Inverse Scaling Law for CLIP Training
Xianhang Li
Zeyu Wang
Cihang Xie
VLM
CLIP
45
54
0
11 May 2023
Vision-Language Models as Success Detectors
Yuqing Du
Ksenia Konyushkova
Misha Denil
A. Raju
Jessica Landon
Felix Hill
Nando de Freitas
Serkan Cabi
MLLM
LRM
89
77
0
13 Mar 2023