Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2505.01288
Cited By
v1
v2 (latest)
ViSA-Flow: Accelerating Robot Skill Learning via Large-Scale Video Semantic Action Flow
2 May 2025
Changhe Chen
Quantao Yang
Xiaohao Xu
Nima Fazeli
Olov Andersson
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"ViSA-Flow: Accelerating Robot Skill Learning via Large-Scale Video Semantic Action Flow"
26 / 26 papers shown
Title
Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation
Yang Tian
Sizhe Yang
Jia Zeng
P. Wang
Dahua Lin
Hao Dong
Jiangmiao Pang
148
21
0
19 Dec 2024
Stem-OB: Generalizable Visual Imitation Learning with Stem-Like Convergent Observation through Diffusion Inversion
Kaizhe Hu
Zihang Rui
Yao He
Yuyao Liu
Pu Hua
Huazhe Xu
90
1
0
07 Nov 2024
CoTracker3: Simpler and Better Point Tracking by Pseudo-Labelling Real Videos
Nikita Karaev
Iurii Makarov
Jianyuan Wang
Natalia Neverova
Andrea Vedaldi
Christian Rupprecht
59
68
0
15 Oct 2024
GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal Conditioned Policy
Peiyan Li
Hongtao Wu
Yan Huang
Chilam Cheang
Liang Wang
Tao Kong
VGen
86
13
0
26 Aug 2024
Multimodal Diffusion Transformer: Learning Versatile Behavior from Multimodal Goals
Moritz Reuss
Ömer Erdinç Yagmurlu
Fabian Wenzel
Rudolf Lioutikov
OffRL
98
51
0
08 Jul 2024
Learning Manipulation by Predicting Interaction
Jia Zeng
Qingwen Bu
Bangjun Wang
Wenke Xia
Li Chen
...
Heming Cui
Bin Zhao
Xuelong Li
Yu Qiao
Hongyang Li
107
26
0
01 Jun 2024
Imitation Learning: A Survey of Learning Methods, Environments and Metrics
Nathan Gavenski
Odinaldo Rodrigues
Michael Luck
69
134
0
30 Apr 2024
VIEW: Visual Imitation Learning with Waypoints
Ananth Jonnavittula
Sagar Parekh
Dylan P. Losey
SSL
150
11
0
27 Apr 2024
Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers
Vidhi Jain
Maria Attarian
Nikhil J. Joshi
Ayzaan Wahid
Danny Driess
...
Stefan Welker
Christine Chan
Igor Gilitschenski
Yonatan Bisk
Debidatta Dwibedi
122
32
0
19 Mar 2024
PRIME: Scaffolding Manipulation Tasks with Behavior Primitives for Data-Efficient Imitation Learning
Tian Gao
Soroush Nasiriany
Huihan Liu
Quantao Yang
Yuke Zhu
99
8
0
01 Mar 2024
Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation
Hongtao Wu
Ya Jing
Chi-Hou Cheang
Guangzeng Chen
Jiafeng Xu
Xinghang Li
Minghuan Liu
Hang Li
Tao Kong
123
111
0
20 Dec 2023
Visual Hindsight Self-Imitation Learning for Interactive Navigation
Kibeom Kim
Kisung Shin
Min Whoo Lee
Moonhoen Lee
Minsu Lee
Byoung-Tak Zhang
73
2
0
05 Dec 2023
Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models
Kevin Black
Mitsuhiko Nakamoto
P. Atreya
Homer Walke
Chelsea Finn
Aviral Kumar
Sergey Levine
DiffM
LM&Ro
112
142
0
16 Oct 2023
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
Anthony Brohan
Noah Brown
Justice Carbajal
Yevgen Chebotar
Xi Chen
...
Ted Xiao
Peng Xu
Sichun Xu
Tianhe Yu
Brianna Zitkovich
LM&Ro
LRM
179
1,291
0
28 Jul 2023
Language-Conditioned Imitation Learning with Base Skill Priors under Unstructured Data
Hongkuan Zhou
Zhenshan Bing
Xiangtong Yao
Xiaojie Su
Chenguang Yang
Kai-Qi Huang
Alois C. Knoll
LM&Ro
71
20
0
30 May 2023
Segment Anything
A. Kirillov
Eric Mintun
Nikhila Ravi
Hanzi Mao
Chloe Rolland
...
Spencer Whitehead
Alexander C. Berg
Wan-Yen Lo
Piotr Dollár
Ross B. Girshick
MLLM
VLM
371
7,405
0
05 Apr 2023
Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
Cheng Chi
Zhenjia Xu
S. Feng
Eric A. Cousineau
Yilun Du
Benjamin Burchfiel
Russ Tedrake
Shuran Song
349
1,231
0
07 Mar 2023
K-VIL: Keypoints-based Visual Imitation Learning
Jianfeng Gao
Z. Tao
Noémie Jaquier
Tamim Asfour
VGen
SSL
81
26
0
07 Sep 2022
Human-to-Robot Imitation in the Wild
Shikhar Bahl
Abhi Gupta
Deepak Pathak
97
173
0
19 Jul 2022
Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos
Bowen Baker
Ilge Akkaya
Peter Zhokhov
Joost Huizinga
Jie Tang
Adrien Ecoffet
Brandon Houghton
Raul Sampedro
Jeff Clune
OffRL
132
303
0
23 Jun 2022
What Matters in Language Conditioned Robotic Imitation Learning over Unstructured Data
Oier Mees
Lukás Hermann
Wolfram Burgard
LM&Ro
107
155
0
13 Apr 2022
CALVIN: A Benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
Oier Mees
Lukás Hermann
Erick Rosete-Beas
Wolfram Burgard
LM&Ro
113
263
0
06 Dec 2021
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
477
7,827
0
11 Nov 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
978
29,871
0
26 Feb 2021
The "something something" video database for learning and evaluating visual common sense
Raghav Goyal
Samira Ebrahimi Kahou
Vincent Michalski
Joanna Materzynska
S. Westphal
...
Moritz Mueller-Freitag
F. Hoppe
Christian Thurau
Ingo Bax
Roland Memisevic
VLM
101
1,542
0
13 Jun 2017
Time-Contrastive Networks: Self-Supervised Learning from Video
P. Sermanet
Corey Lynch
Yevgen Chebotar
Jasmine Hsu
Eric Jang
S. Schaal
Sergey Levine
SSL
107
830
0
23 Apr 2017
1