arXiv: 2403.12943
Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers
19 March 2024
Vidhi Jain
Maria Attarian
Nikhil J. Joshi
Ayzaan Wahid
Danny Driess
Quan Vuong
Pannag R Sanketi
P. Sermanet
Stefan Welker
Christine Chan
Igor Gilitschenski
Yonatan Bisk
Debidatta Dwibedi
Papers citing "Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers"
29 / 29 papers shown
Learning Generalizable Robot Policy with Human Demonstration Video as a Prompt
Xiang Zhu
Yichen Liu
Hezhong Li
Jianyu Chen
88
0
0
27 May 2025
X-Sim: Cross-Embodiment Learning via Real-to-Sim-to-Real
Prithwish Dan
Kushal Kedia
Angela Chao
Edward Weiyi Duan
Maximus Adrian Pace
Wei-Chiu Ma
Sanjiban Choudhury
77
0
0
11 May 2025
ViSA-Flow: Accelerating Robot Skill Learning via Large-Scale Video Semantic Action Flow
Changhe Chen
Quantao Yang
Xiaohao Xu
Nima Fazeli
Olov Andersson
70
0
0
02 May 2025
What Changed and What Could Have Changed? State-Change Counterfactuals for Procedure-Aware Video Representation Learning
Chi-Hsi Kung
Frangil Ramirez
Juhyung Ha
Yi-Ting Chen
David J. Crandall
Yi-Hsuan Tsai
97
1
0
27 Mar 2025
Human2Robot: Learning Robot Actions from Paired Human-Robot Videos
Sicheng Xie
Haidong Cao
Zejia Weng
Zhen Xing
Shiwei Shen
Jiaqi Leng
Xipeng Qiu
Yanwei Fu
Zuxuan Wu
Yu Jiang
103
0
0
23 Feb 2025
Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning
Jiange Yang
Haoyi Zhu
Yanjie Wang
Gangshan Wu
Tong He
Limin Wang
143
3
0
21 Nov 2024
One-Shot Imitation under Mismatched Execution
Kushal Kedia
Prithwish Dan
Angela Chao
Maximus Adrian Pace
Sanjiban Choudhury
82
5
0
10 Sep 2024
R+X: Retrieval and Execution from Everyday Human Videos
Georgios Papagiannis
Norman Di Palo
Pietro Vitiello
Edward Johns
102
16
0
17 Jul 2024
Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation
Jiaming Zhou
Teli Ma
Kun-Yu Lin
Ronghe Qiu
Zifan Wang
Junwei Liang
94
7
0
20 Jun 2024
Synthesizing Programmatic Reinforcement Learning Policies with Large Language Model Guided Search
Max Liu
Chan-Hung Yu
Wei-Hsu Lee
Cheng-Wei Hung
Yen-Chun Chen
Shao-Hua Sun
79
4
0
26 May 2024
VIEW: Visual Imitation Learning with Waypoints
Ananth Jonnavittula
Sagar Parekh
Dylan P. Losey
SSL
126
10
0
27 Apr 2024
HomeRobot: Open-Vocabulary Mobile Manipulation
Sriram Yenamandra
A. Ramachandran
Karmesh Yadav
Austin S. Wang
Mukul Khanna
...
Devendra Singh Chaplot
Dhruv Batra
Roozbeh Mottaghi
Yonatan Bisk
Chris Paxton
LM&Ro
88
82
0
20 Jun 2023
Sigmoid Loss for Language Image Pre-Training
Xiaohua Zhai
Basil Mustafa
Alexander Kolesnikov
Lucas Beyer
CLIP
VLM
131
1,131
0
27 Mar 2023
Open-World Object Manipulation using Pre-trained Vision-Language Models
Austin Stone
Ted Xiao
Yao Lu
K. Gopalakrishnan
Kuang-Huei Lee
...
Sean Kirmani
Brianna Zitkovich
F. Xia
Chelsea Finn
Karol Hausman
LM&Ro
219
149
0
02 Mar 2023
Human-to-Robot Imitation in the Wild
Shikhar Bahl
Abhi Gupta
Deepak Pathak
73
168
0
19 Jul 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
344
3,515
0
29 Apr 2022
R3M: A Universal Visual Representation for Robot Manipulation
Suraj Nair
Aravind Rajeswaran
Vikash Kumar
Chelsea Finn
Abhi Gupta
LM&Ro
67
566
0
23 Mar 2022
Masked Visual Pre-training for Motor Control
Tete Xiao
Ilija Radosavovic
Trevor Darrell
Jitendra Malik
SSL
77
246
0
11 Mar 2022
BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning
Eric Jang
A. Irpan
Mohi Khansari
Daniel Kappler
F. Ebert
Corey Lynch
Sergey Levine
Chelsea Finn
LM&Ro
231
534
0
04 Feb 2022
Towards More Generalizable One-shot Visual Imitation Learning
Zhao Mandi
Fangchen Liu
Kimin Lee
Pieter Abbeel
57
61
0
26 Oct 2021
Perceiver IO: A General Architecture for Structured Inputs & Outputs
Andrew Jaegle
Sebastian Borgeaud
Jean-Baptiste Alayrac
Carl Doersch
Catalin Ionescu
...
Olivier J. Hénaff
M. Botvinick
Andrew Zisserman
Oriol Vinyals
João Carreira
MLLM
VLM
GNN
52
574
0
30 Jul 2021
Goal-Conditioned Reinforcement Learning with Imagined Subgoals
Elliot Chane-Sane
Cordelia Schmid
Ivan Laptev
64
144
0
01 Jul 2021
Reinforcement Learning with Videos: Combining Offline Observations with Interaction
Karl Schmeckpeper
Oleh Rybkin
Kostas Daniilidis
Sergey Levine
Chelsea Finn
OffRL
65
105
0
12 Nov 2020
Transformers for One-Shot Visual Imitation
Sudeep Dasari
Abhinav Gupta
LM&Ro
70
93
0
11 Nov 2020
Goal-Conditioned End-to-End Visuomotor Control for Versatile Skill Primitives
Oliver Groth
Chia-Man Hung
Andrea Vedaldi
Ingmar Posner
OCL
52
10
0
19 Mar 2020
AVID: Learning Multi-Stage Tasks via Pixel-Level Translation of Human Videos
Laura M. Smith
Nikita Dhawan
Marvin Zhang
Pieter Abbeel
Sergey Levine
125
158
0
10 Dec 2019
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
Antoine Miech
Dimitri Zhukov
Jean-Baptiste Alayrac
Makarand Tapaswi
Ivan Laptev
Josef Sivic
VGen
105
1,199
0
07 Jun 2019
Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation
YuXuan Liu
Abhishek Gupta
Pieter Abbeel
Sergey Levine
98
380
0
11 Jul 2017
One-Shot Imitation Learning
Yan Duan
Marcin Andrychowicz
Bradly C. Stadie
Jonathan Ho
Jonas Schneider
Ilya Sutskever
Pieter Abbeel
Wojciech Zaremba
OffRL
77
685
0
21 Mar 2017