Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2407.05996
Cited By
Multimodal Diffusion Transformer: Learning Versatile Behavior from Multimodal Goals
8 July 2024
Moritz Reuss
Ömer Erdinç Yagmurlu
Fabian Wenzel
Rudolf Lioutikov
OffRL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Multimodal Diffusion Transformer: Learning Versatile Behavior from Multimodal Goals"
22 / 22 papers shown
Title
MTIL: Encoding Full History with Mamba for Temporal Imitation Learning
Yulin Zhou
Yuankai Lin
Fanzhe Peng
Jiahui Chen
Zhuang Zhou
Kaiji Huang
Hua Yang
Zhouping Yin
Mamba
11
0
0
18 May 2025
3D CAVLA: Leveraging Depth and 3D Context to Generalize Vision Language Action Models for Unseen Tasks
V. Bhat
Yu-Hsiang Lan
Prashanth Krishnamurthy
Ramesh Karri
Farshad Khorrami
57
0
0
09 May 2025
UniVLA: Learning to Act Anywhere with Task-centric Latent Actions
Qingwen Bu
Yanting Yang
Jisong Cai
Shenyuan Gao
Guanghui Ren
Maoqing Yao
Ping Luo
Hongyang Li
191
2
0
09 May 2025
ViSA-Flow: Accelerating Robot Skill Learning via Large-Scale Video Semantic Action Flow
Changhe Chen
Quantao Yang
Xiaohao Xu
Nima Fazeli
Olov Andersson
31
0
0
02 May 2025
CIVIL: Causal and Intuitive Visual Imitation Learning
Yinlong Dai
Robert Ramirez Sanchez
Ryan Jeronimus
Shahabedin Sagheb
Cara M. Nunez
Heramb Nemlekar
Dylan P. Losey
76
1
0
24 Apr 2025
PANDORA: Diffusion Policy Learning for Dexterous Robotic Piano Playing
Yanjia Huang
Renjie Li
Zhengzhong Tu
VGen
65
0
0
17 Mar 2025
X-IL: Exploring the Design Space of Imitation Learning Policies
Xiaogang Jia
Atalay Donat
Xi Huang
Xuan Zhao
Denis Blessing
...
Han A. Wang
Hanyi Zhang
Qian Wang
Rudolf Lioutikov
Gerhard Neumann
94
1
0
20 Feb 2025
Towards Fusing Point Cloud and Visual Representations for Imitation Learning
Atalay Donat
Xiaogang Jia
Xi Huang
Aleksandar Taranovic
Denis Blessing
Ge Li
Hongyi Zhou
Hanyi Zhang
Rudolf Lioutikov
Gerhard Neumann
3DPC
SSL
83
1
0
20 Feb 2025
DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control
Junjie Wen
Bo Li
Jinming Li
Zhibin Tang
Chaomin Shen
Feifei Feng
VLM
63
14
0
09 Feb 2025
Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression
Junjie Wen
Minjie Zhu
Bo Li
Zhibin Tang
Jinming Li
...
Chengmeng Li
Xiaoyu Liu
Yaxin Peng
Chaomin Shen
Feifei Feng
98
16
0
04 Dec 2024
TOP-ERL: Transformer-based Off-Policy Episodic Reinforcement Learning
Ge Li
Dong Tian
Hongyi Zhou
Xinkai Jiang
Rudolf Lioutikov
Gerhard Neumann
OffRL
250
3
0
12 Oct 2024
Towards Synergistic, Generalized, and Efficient Dual-System for Robotic Manipulation
Qingwen Bu
Hongyang Li
Li Chen
Jisong Cai
Jia Zeng
Heming Cui
Maoqing Yao
Yu Qiao
60
4
0
10 Oct 2024
Discrete Policy: Learning Disentangled Action Space for Multi-Task Robotic Manipulation
Kun Wu
Yichen Zhu
Jinming Li
Junjie Wen
Ning Liu
Zhiyuan Xu
Qinru Qiu
48
4
0
27 Sep 2024
TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation
Junjie Wen
Bo Li
Jinming Li
Minjie Zhu
Kun Wu
...
Ran Cheng
Chaomin Shen
Yaxin Peng
Feifei Feng
Jian Tang
LM&Ro
76
50
0
19 Sep 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
82
45
0
23 May 2024
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation
Zipeng Fu
Tony Zhao
Chelsea Finn
116
292
0
04 Jan 2024
Energy-based Models are Zero-Shot Planners for Compositional Scene Rearrangement
N. Gkanatsios
Ayush Jain
Zhou Xian
Yunchu Zhang
C. Atkeson
Katerina Fragkiadaki
LM&Ro
98
31
0
27 Apr 2023
Open-World Object Manipulation using Pre-trained Vision-Language Models
Austin Stone
Ted Xiao
Yao Lu
K. Gopalakrishnan
Kuang-Huei Lee
...
Sean Kirmani
Brianna Zitkovich
F. Xia
Chelsea Finn
Karol Hausman
LM&Ro
156
145
0
02 Mar 2023
Real-World Robot Learning with Masked Visual Pre-training
Ilija Radosavovic
Tete Xiao
Stephen James
Pieter Abbeel
Jitendra Malik
Trevor Darrell
SSL
156
241
0
06 Oct 2022
Grounding Language with Visual Affordances over Unstructured Data
Oier Mees
Jessica Borja-Diaz
Wolfram Burgard
LM&Ro
121
108
0
04 Oct 2022
Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation
Mohit Shridhar
Lucas Manuelli
Dieter Fox
LM&Ro
169
465
0
12 Sep 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
322
7,481
0
11 Nov 2021
1