ViSA-Flow: Accelerating Robot Skill Learning via Large-Scale Video Semantic Action Flow

2 May 2025
Changhe Chen
Quantao Yang
Xiaohao Xu
Nima Fazeli
Olov Andersson

Papers citing "ViSA-Flow: Accelerating Robot Skill Learning via Large-Scale Video Semantic Action Flow"

26 / 26 papers shown
Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation
Yang Tian
Sizhe Yang
Jia Zeng
P. Wang
Dahua Lin
Hao Dong
Jiangmiao Pang
148
21
0
19 Dec 2024
Stem-OB: Generalizable Visual Imitation Learning with Stem-Like Convergent Observation through Diffusion Inversion
Kaizhe Hu
Zihang Rui
Yao He
Yuyao Liu
Pu Hua
Huazhe Xu
90
1
0
07 Nov 2024
CoTracker3: Simpler and Better Point Tracking by Pseudo-Labelling Real Videos
Nikita Karaev
Iurii Makarov
Jianyuan Wang
Natalia Neverova
Andrea Vedaldi
Christian Rupprecht
59
68
0
15 Oct 2024
GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal Conditioned Policy
Peiyan Li
Hongtao Wu
Yan Huang
Chilam Cheang
Liang Wang
Tao Kong
VGen
86
13
0
26 Aug 2024
Multimodal Diffusion Transformer: Learning Versatile Behavior from Multimodal Goals
Moritz Reuss
Ömer Erdinç Yagmurlu
Fabian Wenzel
Rudolf Lioutikov
OffRL
98
51
0
08 Jul 2024
Learning Manipulation by Predicting Interaction
Jia Zeng
Qingwen Bu
Bangjun Wang
Wenke Xia
Li Chen
...
Heming Cui
Bin Zhao
Xuelong Li
Yu Qiao
Hongyang Li
107
26
0
01 Jun 2024
Imitation Learning: A Survey of Learning Methods, Environments and Metrics
Nathan Gavenski
Odinaldo Rodrigues
Michael Luck
69
134
0
30 Apr 2024
VIEW: Visual Imitation Learning with Waypoints
Ananth Jonnavittula
Sagar Parekh
Dylan P. Losey
SSL
150
11
0
27 Apr 2024
Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers
Vidhi Jain
Maria Attarian
Nikhil J. Joshi
Ayzaan Wahid
Danny Driess
...
Stefan Welker
Christine Chan
Igor Gilitschenski
Yonatan Bisk
Debidatta Dwibedi
122
32
0
19 Mar 2024
PRIME: Scaffolding Manipulation Tasks with Behavior Primitives for Data-Efficient Imitation Learning
Tian Gao
Soroush Nasiriany
Huihan Liu
Quantao Yang
Yuke Zhu
99
8
0
01 Mar 2024
Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation
Hongtao Wu
Ya Jing
Chi-Hou Cheang
Guangzeng Chen
Jiafeng Xu
Xinghang Li
Minghuan Liu
Hang Li
Tao Kong
123
111
0
20 Dec 2023
Visual Hindsight Self-Imitation Learning for Interactive Navigation
Kibeom Kim
Kisung Shin
Min Whoo Lee
Moonhoen Lee
Minsu Lee
Byoung-Tak Zhang
73
2
0
05 Dec 2023
Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models
Kevin Black
Mitsuhiko Nakamoto
P. Atreya
Homer Walke
Chelsea Finn
Aviral Kumar
Sergey Levine
DiffM, LM&Ro
112
142
0
16 Oct 2023
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
Anthony Brohan
Noah Brown
Justice Carbajal
Yevgen Chebotar
Xi Chen
...
Ted Xiao
Peng Xu
Sichun Xu
Tianhe Yu
Brianna Zitkovich
LM&Ro, LRM
179
1,291
0
28 Jul 2023
Language-Conditioned Imitation Learning with Base Skill Priors under Unstructured Data
Hongkuan Zhou
Zhenshan Bing
Xiangtong Yao
Xiaojie Su
Chenguang Yang
Kai-Qi Huang
Alois C. Knoll
LM&Ro
71
20
0
30 May 2023
Segment Anything
A. Kirillov
Eric Mintun
Nikhila Ravi
Hanzi Mao
Chloe Rolland
...
Spencer Whitehead
Alexander C. Berg
Wan-Yen Lo
Piotr Dollár
Ross B. Girshick
MLLM, VLM
371
7,405
0
05 Apr 2023
Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
Cheng Chi
Zhenjia Xu
S. Feng
Eric A. Cousineau
Yilun Du
Benjamin Burchfiel
Russ Tedrake
Shuran Song
349
1,231
0
07 Mar 2023
K-VIL: Keypoints-based Visual Imitation Learning
Jianfeng Gao
Z. Tao
Noémie Jaquier
Tamim Asfour
VGen, SSL
81
26
0
07 Sep 2022
Human-to-Robot Imitation in the Wild
Shikhar Bahl
Abhi Gupta
Deepak Pathak
97
173
0
19 Jul 2022
Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos
Bowen Baker
Ilge Akkaya
Peter Zhokhov
Joost Huizinga
Jie Tang
Adrien Ecoffet
Brandon Houghton
Raul Sampedro
Jeff Clune
OffRL
132
303
0
23 Jun 2022
What Matters in Language Conditioned Robotic Imitation Learning over Unstructured Data
Oier Mees
Lukás Hermann
Wolfram Burgard
LM&Ro
107
155
0
13 Apr 2022
CALVIN: A Benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
Oier Mees
Lukás Hermann
Erick Rosete-Beas
Wolfram Burgard
LM&Ro
113
263
0
06 Dec 2021
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT, TPM
477
7,827
0
11 Nov 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP, VLM
978
29,871
0
26 Feb 2021
The "something something" video database for learning and evaluating visual common sense
Raghav Goyal
Samira Ebrahimi Kahou
Vincent Michalski
Joanna Materzynska
S. Westphal
...
Moritz Mueller-Freitag
F. Hoppe
Christian Thurau
Ingo Bax
Roland Memisevic
VLM
101
1,542
0
13 Jun 2017
Time-Contrastive Networks: Self-Supervised Learning from Video
P. Sermanet
Corey Lynch
Yevgen Chebotar
Jasmine Hsu
Eric Jang
S. Schaal
Sergey Levine
SSL
107
830
0
23 Apr 2017