Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2407.08693
Cited By
Robotic Control via Embodied Chain-of-Thought Reasoning
11 July 2024
Michał Zawalski
William Chen
Karl Pertsch
Oier Mees
Chelsea Finn
Sergey Levine
LRM
LM&Ro
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Robotic Control via Embodied Chain-of-Thought Reasoning"
50 / 50 papers shown
Title
Unveiling the Potential of Vision-Language-Action Models with Open-Ended Multimodal Instructions
Wei Zhao
Gongsheng Li
Zhefei Gong
Pengxiang Ding
Han Zhao
Donglin Wang
LM&Ro
22
0
0
16 May 2025
From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation
Yifu Yuan
Haiqin Cui
Yibin Chen
Zibin Dong
Fei Ni
Longxin Kou
Jinyi Liu
Pengyi Li
Yan Zheng
Jianye Hao
31
0
0
13 May 2025
Pixel Motion as Universal Representation for Robot Control
Kanchana Ranasinghe
Xiang Li
Cristina Mata
J. Park
Michael S. Ryoo
VGen
32
0
0
12 May 2025
3D CAVLA: Leveraging Depth and 3D Context to Generalize Vision Language Action Models for Unseen Tasks
V. Bhat
Yu-Hsiang Lan
Prashanth Krishnamurthy
Ramesh Karri
Farshad Khorrami
52
0
0
09 May 2025
Vision-Language-Action Models: Concepts, Progress, Applications and Challenges
Ranjan Sapkota
Yang Cao
Konstantinos I Roumeliotis
Manoj Karkee
LM&Ro
179
1
0
07 May 2025
PhysNav-DG: A Novel Adaptive Framework for Robust VLM-Sensor Fusion in Navigation Applications
Trisanth Srinivasan
Santosh Patapati
39
0
0
03 May 2025
NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks
Chia-Yu Hung
Qi Sun
Pengfei Hong
Amir Zadeh
Chuan Li
U-Xuan Tan
Navonil Majumder
Soujanya Poria
LM&Ro
42
1
0
28 Apr 2025
Robotic Task Ambiguity Resolution via Natural Language Interaction
Eugenio Chisari
Jan Ole von Hartz
Fabien Despinoy
Abhinav Valada
LM&Ro
49
0
0
24 Apr 2025
π
0.5
π_{0.5}
π
0.5
: a Vision-Language-Action Model with Open-World Generalization
Physical Intelligence
Kevin Black
Noah Brown
James Darpinian
Karan Dhabalia
...
Homer Walke
Anna Walling
Haohuan Wang
Lili Yu
Ury Zhilinsky
LM&Ro
VLM
39
12
0
22 Apr 2025
Advancing Embodied Agent Security: From Safety Benchmarks to Input Moderation
Ning Wang
Zihan Yan
W. Li
Chuan Ma
H. Chen
Tao Xiang
AAML
45
0
0
22 Apr 2025
Towards Forceful Robotic Foundation Models: a Literature Survey
William Xie
N. Correll
OffRL
65
1
0
16 Apr 2025
Grounding Multimodal LLMs to Embodied Agents that Ask for Help with Reinforcement Learning
Ram Ramrakhya
Matthew Chang
Xavier Puig
Ruta Desai
Z. Kira
Roozbeh Mottaghi
LLMAG
LM&Ro
66
0
0
01 Apr 2025
CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models
Qingqing Zhao
Yao Lu
Moo Jin Kim
Zipeng Fu
Zhuoyang Zhang
...
Ankur Handa
Xuan Li
Donglai Xiang
Gordon Wetzstein
Nayeon Lee
LM&Ro
LRM
51
13
0
27 Mar 2025
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
Wenbo Zhang
Mengna Wang
Gangao Liu
Xu Huixin
Yiwei Jiang
...
Hang Zhang
Xin Li
Weiming Lu
Peng Li
Yueting Zhuang
LM&Ro
LRM
75
3
0
27 Mar 2025
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Nvidia
A. Azzolini
Junjie Bai
Prithvijit Chattopadhyay
Huayu Chen
...
Xiaodong Yang
Zhuolin Yang
Jun Zhang
Xiaohui Zeng
Zhe Zhang
AI4CE
LM&Ro
LRM
62
5
0
18 Mar 2025
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Yansen Wang
Shengqiong Wu
Yuyao Zhang
William Yang Wang
Ziwei Liu
Jiebo Luo
Hao Fei
LRM
92
9
0
16 Mar 2025
Towards Fast, Memory-based and Data-Efficient Vision-Language Policy
Haoxuan Li
Sixu Yan
Yong Li
Xinggang Wang
LM&Ro
64
0
0
13 Mar 2025
iManip: Skill-Incremental Learning for Robotic Manipulation
Zexin Zheng
Jia-Feng Cai
Xiao-Ming Wu
Yi-Lin Wei
Yu-Ming Tang
Wei-Shi Zheng
CLL
54
0
0
10 Mar 2025
PointVLA: Injecting the 3D World into Vision-Language-Action Models
Chengmeng Li
Junjie Wen
Yan Peng
Yaxin Peng
Feifei Feng
Yichen Zhu
3DPC
73
3
0
10 Mar 2025
SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Safe Reinforcement Learning
Borong Zhang
Yuhao Zhang
Yalan Qin
Yingshan Lei
Josef Dai
Yuanpei Chen
Yaodong Yang
66
4
0
05 Mar 2025
Subtask-Aware Visual Reward Learning from Segmented Demonstrations
Changyeon Kim
Minho Heo
Doohyun Lee
Jinwoo Shin
Honglak Lee
Joseph J. Lim
Kimin Lee
44
1
0
28 Feb 2025
ObjectVLA: End-to-End Open-World Object Manipulation Without Demonstration
Minjie Zhu
Yichen Zhu
Jinming Li
Zhongyi Zhou
Junjie Wen
Xiaoyu Liu
Chaomin Shen
Yaxin Peng
Feifei Feng
LM&Ro
89
4
0
26 Feb 2025
Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models
Lucy Xiaoyang Shi
Brian Ichter
Michael Equi
Liyiming Ke
Karl Pertsch
...
Adrian Li-Bell
Danny Driess
Lachy Groom
Sergey Levine
Chelsea Finn
LM&Ro
LRM
95
8
0
26 Feb 2025
A Real-to-Sim-to-Real Approach to Robotic Manipulation with VLM-Generated Iterative Keypoint Rewards
Shivansh Patel
Xinchen Yin
Wenlong Huang
Shubham Garg
H. Nayyeri
Li Fei-Fei
Svetlana Lazebnik
Yong Li
92
0
0
12 Feb 2025
Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models
Xinghang Li
Peiyan Li
Minghuan Liu
Dong Wang
Jirong Liu
Bingyi Kang
Xiao Ma
Tao Kong
Hanbo Zhang
Huaping Liu
LM&Ro
99
18
0
18 Dec 2024
Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression
Junjie Wen
Minjie Zhu
Yichen Zhu
Zhibin Tang
Jinming Li
...
Chengmeng Li
Xiaoyu Liu
Yaxin Peng
Chaomin Shen
Feifei Feng
88
15
0
04 Dec 2024
CLIP-RT: Learning Language-Conditioned Robotic Policies from Natural Language Supervision
Gi-Cheon Kang
Junghyun Kim
Kyuhwan Shim
Jun Ki Lee
Byoung-Tak Zhang
LM&Ro
107
1
1
01 Nov 2024
GHIL-Glue: Hierarchical Control with Filtered Subgoal Images
Kyle Hatch
Ashwin Balakrishna
Oier Mees
Suraj Nair
Seohong Park
...
Masha Itkina
Benjamin Eysenbach
Sergey Levine
Thomas Kollar
Benjamin Burchfiel
65
2
0
26 Oct 2024
Scaling Robot Policy Learning via Zero-Shot Labeling with Foundation Models
Nils Blank
Moritz Reuss
Marcel Rühle
Ömer Erdinç Yagmurlu
Fabian Wenzel
Oier Mees
Rudolf Lioutikov
LM&Ro
OffRL
29
4
0
23 Oct 2024
Just Add Force for Contact-Rich Robot Policies
William Xie
Stefan Caldararu
N. Correll
OffRL
32
0
0
17 Oct 2024
Latent Action Pretraining from Videos
Seonghyeon Ye
Joel Jang
Byeongguk Jeon
Sejune Joo
Jianwei Yang
...
Kimin Lee
J. Gao
Luke Zettlemoyer
Dieter Fox
Minjoon Seo
35
28
0
15 Oct 2024
TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation
Junjie Wen
Yichen Zhu
Jinming Li
Minjie Zhu
Kun Wu
...
Ran Cheng
Chaomin Shen
Yaxin Peng
Feifei Feng
Jian Tang
LM&Ro
74
48
0
19 Sep 2024
AlignBot: Aligning VLM-powered Customized Task Planning with User Reminders Through Fine-Tuning for Household Robots
Zhaxizhuoma
Pengan Chen
Ziniu Wu
Jiawei Sun
Dong Wang
Peng Zhou
Nieqing Cao
Yan Ding
Bin Zhao
Xuelong Li
46
4
0
18 Sep 2024
MotIF: Motion Instruction Fine-tuning
Minyoung Hwang
Joey Hejna
Dorsa Sadigh
Yonatan Bisk
57
1
0
16 Sep 2024
Seeing Through Their Eyes: Evaluating Visual Perspective Taking in Vision Language Models
Gracjan Góral
Alicja Ziarko
Michal Nauman
Maciej Wołczyk
LRM
33
1
0
02 Sep 2024
Autonomous Improvement of Instruction Following Skills via Foundation Models
Zhiyuan Zhou
P. Atreya
Abraham Lee
Homer Walke
Oier Mees
Sergey Levine
32
11
0
30 Jul 2024
OpenVLA: An Open-Source Vision-Language-Action Model
Moo Jin Kim
Karl Pertsch
Siddharth Karamcheti
Ted Xiao
Ashwin Balakrishna
...
Russ Tedrake
Dorsa Sadigh
Sergey Levine
Percy Liang
Chelsea Finn
LM&Ro
VLM
51
367
0
13 Jun 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
82
43
0
23 May 2024
Closed Loop Interactive Embodied Reasoning for Robot Manipulation
Michal Nazarczuk
Jan Kristof Behrens
Karla Stepanova
Matej Hoffmann
K. Mikolajczyk
LM&Ro
LRM
55
1
0
23 Apr 2024
Yell At Your Robot: Improving On-the-Fly from Language Corrections
Lucy Xiaoyang Shi
Zheyuan Hu
Tony Zhao
Archit Sharma
Karl Pertsch
Jianlan Luo
Sergey Levine
Chelsea Finn
LM&Ro
87
63
0
19 Mar 2024
Vision-Language Models Provide Promptable Representations for Reinforcement Learning
William Chen
Oier Mees
Aviral Kumar
Sergey Levine
VLM
LM&Ro
44
23
0
05 Feb 2024
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation
Zipeng Fu
Tony Zhao
Chelsea Finn
116
289
0
04 Jan 2024
Audio Visual Language Maps for Robot Navigation
Chen Huang
Oier Mees
Andy Zeng
Wolfram Burgard
VGen
73
33
0
13 Mar 2023
Open-World Object Manipulation using Pre-trained Vision-Language Models
Austin Stone
Ted Xiao
Yao Lu
K. Gopalakrishnan
Kuang-Huei Lee
...
Sean Kirmani
Brianna Zitkovich
F. Xia
Chelsea Finn
Karol Hausman
LM&Ro
156
145
0
02 Mar 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
290
4,261
0
30 Jan 2023
Visual Language Maps for Robot Navigation
Chen Huang
Oier Mees
Andy Zeng
Wolfram Burgard
LM&Ro
162
345
0
11 Oct 2022
Grounding Language with Visual Affordances over Unstructured Data
Oier Mees
Jessica Borja-Diaz
Wolfram Burgard
LM&Ro
121
108
0
04 Oct 2022
Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation
Mohit Shridhar
Lucas Manuelli
Dieter Fox
LM&Ro
166
460
0
12 Sep 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
398
8,559
0
28 Jan 2022
Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets
F. Ebert
Yanlai Yang
Karl Schmeckpeper
Bernadette Bucher
G. Georgakis
Kostas Daniilidis
Chelsea Finn
Sergey Levine
169
220
0
27 Sep 2021
1