ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2310.02044
  4. Cited By
How Physics and Background Attributes Impact Video Transformers in
  Robotic Manipulation: A Case Study on Planar Pushing
v1v2v3 (latest)

How Physics and Background Attributes Impact Video Transformers in Robotic Manipulation: A Case Study on Planar Pushing

3 October 2023
Shutong Jin
Ruiyu Wang
Muhammad Zahid
Florian T. Pokorny
ArXiv (abs)PDFHTML

Papers citing "How Physics and Background Attributes Impact Video Transformers in Robotic Manipulation: A Case Study on Planar Pushing"

50 / 55 papers shown
Title
CloudGripper: An Open Source Cloud Robotics Testbed for Robotic
  Manipulation Research, Benchmarking and Data Collection at Scale
CloudGripper: An Open Source Cloud Robotics Testbed for Robotic Manipulation Research, Benchmarking and Data Collection at Scale
Muhammad Zahid
Florian T. Pokorny
39
7
0
22 Sep 2023
Event-Guided Procedure Planning from Instructional Videos with Text
  Supervision
Event-Guided Procedure Planning from Instructional Videos with Text Supervision
Ante Wang
Kun-Li Channing Lin
Jiachen Du
Jingke Meng
Wei-Shi Zheng
61
16
0
17 Aug 2023
Exploring Visual Pre-training for Robot Manipulation: Datasets, Models
  and Methods
Exploring Visual Pre-training for Robot Manipulation: Datasets, Models and Methods
Ya Jing
Xuelin Zhu
Xingbin Liu
Qie Sima
Taozheng Yang
Yunhai Feng
Tao Kong
LM&Ro
69
16
0
07 Aug 2023
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic
  Control
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
Anthony Brohan
Noah Brown
Justice Carbajal
Yevgen Chebotar
Xi Chen
...
Ted Xiao
Peng Xu
Sichun Xu
Tianhe Yu
Brianna Zitkovich
LM&RoLRM
192
1,291
0
28 Jul 2023
Decomposing the Generalization Gap in Imitation Learning for Visual
  Robotic Manipulation
Decomposing the Generalization Gap in Imitation Learning for Visual Robotic Manipulation
Annie Xie
Lisa Lee
Ted Xiao
Chelsea Finn
76
62
0
07 Jul 2023
Visual Affordance Prediction for Guiding Robot Exploration
Visual Affordance Prediction for Guiding Robot Exploration
Homanga Bharadhwaj
Abhi Gupta
Shubham Tulsiani
94
15
0
28 May 2023
MimicPlay: Long-Horizon Imitation Learning by Watching Human Play
MimicPlay: Long-Horizon Imitation Learning by Watching Human Play
Chen Wang
Linxi Fan
Jiankai Sun
Ruohan Zhang
Li Fei-Fei
Danfei Xu
Yuke Zhu
Anima Anandkumar
126
198
0
24 Feb 2023
RT-1: Robotics Transformer for Real-World Control at Scale
RT-1: Robotics Transformer for Real-World Control at Scale
Anthony Brohan
Noah Brown
Justice Carbajal
Yevgen Chebotar
Joseph Dabis
...
Ted Xiao
Peng Xu
Sichun Xu
Tianhe Yu
Brianna Zitkovich
LM&Ro
138
1,159
0
13 Dec 2022
Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation
Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation
Mohit Shridhar
Lucas Manuelli
Dieter Fox
LM&Ro
263
501
0
12 Sep 2022
Scaling Laws vs Model Architectures: How does Inductive Bias Influence
  Scaling?
Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?
Yi Tay
Mostafa Dehghani
Samira Abnar
Hyung Won Chung
W. Fedus
J. Rao
Sharan Narang
Vinh Q. Tran
Dani Yogatama
Donald Metzler
AI4CE
106
106
0
21 Jul 2022
MaskViT: Masked Visual Pre-Training for Video Prediction
MaskViT: Masked Visual Pre-Training for Video Prediction
Agrim Gupta
Stephen Tian
Yunzhi Zhang
Jiajun Wu
Roberto Martín-Martín
Li Fei-Fei
171
120
0
23 Jun 2022
P3IV: Probabilistic Procedure Planning from Instructional Videos with
  Weak Supervision
P3IV: Probabilistic Procedure Planning from Instructional Videos with Weak Supervision
Henghui Zhao
Isma Hadji
Nikita Dvornik
Konstantinos G. Derpanis
Richard P. Wildes
Allan D. Jepson
78
46
0
04 May 2022
MaxViT: Multi-Axis Vision Transformer
MaxViT: Multi-Axis Vision Transformer
Zhengzhong Tu
Hossein Talebi
Han Zhang
Feng Yang
P. Milanfar
A. Bovik
Yinxiao Li
ViT
136
670
0
04 Apr 2022
VideoMAE: Masked Autoencoders are Data-Efficient Learners for
  Self-Supervised Video Pre-Training
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
ViT
230
1,206
0
23 Mar 2022
Transframer: Arbitrary Frame Prediction with Generative Models
Transframer: Arbitrary Frame Prediction with Generative Models
C. Nash
João Carreira
Jacob Walker
Iain Barr
Andrew Jaegle
Mateusz Malinowski
Peter W. Battaglia
ViT
90
38
0
17 Mar 2022
6-DoF Pose Estimation of Household Objects for Robotic Manipulation: An
  Accessible Dataset and Benchmark
6-DoF Pose Estimation of Household Objects for Robotic Manipulation: An Accessible Dataset and Benchmark
Stephen Tyree
Jonathan Tremblay
Thang To
Jia Cheng
Terry Mosier
Jeffrey Smith
Stan Birchfield
85
98
0
11 Mar 2022
Online Decision Transformer
Online Decision Transformer
Qinqing Zheng
Amy Zhang
Aditya Grover
OffRL
78
209
0
11 Feb 2022
BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning
BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning
Eric Jang
A. Irpan
Mohi Khansari
Daniel Kappler
F. Ebert
Corey Lynch
Sergey Levine
Chelsea Finn
LM&Ro
263
550
0
04 Feb 2022
Error-Aware Imitation Learning from Teleoperation Data for Mobile
  Manipulation
Error-Aware Imitation Learning from Teleoperation Data for Mobile Manipulation
J. Wong
Albert Tung
Andrey Kurenkov
Ajay Mandlekar
Li Fei-Fei
Silvio Savarese
Roberto Martín-Martín
70
58
0
09 Dec 2021
SwinTrack: A Simple and Strong Baseline for Transformer Tracking
SwinTrack: A Simple and Strong Baseline for Transformer Tracking
Liting Lin
Heng Fan
Zhipeng Zhang
Yong-mei Xu
Haibin Ling
ViT
90
314
0
02 Dec 2021
CLIPort: What and Where Pathways for Robotic Manipulation
CLIPort: What and Where Pathways for Robotic Manipulation
Mohit Shridhar
Lucas Manuelli
Dieter Fox
LM&Ro
123
661
0
24 Sep 2021
Scale Efficiently: Insights from Pre-training and Fine-tuning
  Transformers
Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers
Yi Tay
Mostafa Dehghani
J. Rao
W. Fedus
Samira Abnar
Hyung Won Chung
Sharan Narang
Dani Yogatama
Ashish Vaswani
Donald Metzler
265
115
0
22 Sep 2021
PlaTe: Visually-Grounded Planning with Transformers in Procedural Tasks
PlaTe: Visually-Grounded Planning with Transformers in Procedural Tasks
Jiankai Sun
De-An Huang
Bo Lu
Yunhui Liu
Bolei Zhou
Animesh Garg
56
56
0
10 Sep 2021
What Matters in Learning from Offline Human Demonstrations for Robot
  Manipulation
What Matters in Learning from Offline Human Demonstrations for Robot Manipulation
Ajay Mandlekar
Danfei Xu
J. Wong
Soroush Nasiriany
Chen Wang
Rohun Kulkarni
Li Fei-Fei
Silvio Savarese
Yuke Zhu
Roberto Martín-Martín
OffRL
289
519
0
06 Aug 2021
Video Swin Transformer
Video Swin Transformer
Ze Liu
Jia Ning
Yue Cao
Yixuan Wei
Zheng Zhang
Stephen Lin
Han Hu
ViT
113
1,489
0
24 Jun 2021
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
Mandela Patrick
Dylan Campbell
Yuki M. Asano
Ishan Misra
Ishan Misra Florian Metze
Christoph Feichtenhofer
Andrea Vedaldi
João F. Henriques
86
280
0
09 Jun 2021
Decision Transformer: Reinforcement Learning via Sequence Modeling
Decision Transformer: Reinforcement Learning via Sequence Modeling
Lili Chen
Kevin Lu
Aravind Rajeswaran
Kimin Lee
Aditya Grover
Michael Laskin
Pieter Abbeel
A. Srinivas
Igor Mordatch
OffRL
151
1,660
0
02 Jun 2021
ViViT: A Video Vision Transformer
ViViT: A Video Vision Transformer
Anurag Arnab
Mostafa Dehghani
G. Heigold
Chen Sun
Mario Lucic
Cordelia Schmid
ViT
225
2,168
0
29 Mar 2021
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image
  Classification
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification
Chun-Fu Chen
Quanfu Fan
Yikang Shen
ViT
71
1,484
0
27 Mar 2021
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng Zhang
Stephen Lin
B. Guo
ViT
467
21,603
0
25 Mar 2021
Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual
  Tracking
Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking
Ning Wang
Wen-gang Zhou
Jie Wang
Houqiang Li
ViT
78
534
0
22 Mar 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction
  without Convolutions
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
535
3,740
0
24 Feb 2021
Video Transformer Network
Video Transformer Network
Daniel Neimark
Omri Bar
Maya Zohar
Dotan Asselmann
ViT
276
433
0
01 Feb 2021
Switch Transformers: Scaling to Trillion Parameter Models with Simple
  and Efficient Sparsity
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
W. Fedus
Barret Zoph
Noam M. Shazeer
MoE
88
2,226
0
11 Jan 2021
Transformers in Vision: A Survey
Transformers in Vision: A Survey
Salman Khan
Muzammal Naseer
Munawar Hayat
Syed Waqas Zamir
Fahad Shahbaz Khan
M. Shah
ViT
307
2,532
0
04 Jan 2021
Transformers for One-Shot Visual Imitation
Transformers for One-Shot Visual Imitation
Sudeep Dasari
Abhinav Gupta
LM&Ro
90
95
0
11 Nov 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at
  Scale
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
679
41,483
0
22 Oct 2020
AViD Dataset: Anonymized Videos from Diverse Countries
AViD Dataset: Anonymized Videos from Diverse Countries
A. Piergiovanni
Michael S. Ryoo
116
36
0
10 Jul 2020
Rescaling Egocentric Vision
Rescaling Egocentric Vision
Dima Damen
Hazel Doughty
G. Farinella
Antonino Furnari
Evangelos Kazakos
...
Davide Moltisanti
Jonathan Munro
Toby Perrett
Will Price
Michael Wray
EgoV
112
466
0
23 Jun 2020
Latent Video Transformer
Latent Video Transformer
Ruslan Rakhimov
Denis Volkhonskiy
Alexey Artemov
Denis Zorin
Evgeny Burnaev
VGen
100
121
0
18 Jun 2020
D4RL: Datasets for Deep Data-Driven Reinforcement Learning
D4RL: Datasets for Deep Data-Driven Reinforcement Learning
Justin Fu
Aviral Kumar
Ofir Nachum
George Tucker
Sergey Levine
GPOffRL
237
1,384
0
15 Apr 2020
Longformer: The Long-Document Transformer
Longformer: The Long-Document Transformer
Iz Beltagy
Matthew E. Peters
Arman Cohan
RALMVLM
185
4,100
0
10 Apr 2020
Scaling Laws for Neural Language Models
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
638
4,921
0
23 Jan 2020
Leveraging Procedural Generation to Benchmark Reinforcement Learning
Leveraging Procedural Generation to Benchmark Reinforcement Learning
K. Cobbe
Christopher Hesse
Jacob Hilton
John Schulman
87
557
0
03 Dec 2019
RLBench: The Robot Learning Benchmark & Learning Environment
RLBench: The Robot Learning Benchmark & Learning Environment
Stephen James
Z. Ma
David Rovick Arrojo
Andrew J. Davison
SSLVLMOffRL
109
563
0
26 Sep 2019
RoboTurk: A Crowdsourcing Platform for Robotic Skill Learning through
  Imitation
RoboTurk: A Crowdsourcing Platform for Robotic Skill Learning through Imitation
Mehdi Letafati
Yuke Zhu
Animesh Garg
Jonathan Booher
Max Spero
...
John Emmons
Anchit Gupta
Emre Orbay
Silvio Savarese
Li Fei-Fei
OffRL
95
293
0
07 Nov 2018
GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in
  the Wild
GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild
Lianghua Huang
Xin Zhao
Kaiqi Huang
118
1,349
0
29 Oct 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language
  Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLMSSLSSeg
1.8K
95,229
0
11 Oct 2018
Charades-Ego: A Large-Scale Dataset of Paired Third and First Person
  Videos
Charades-Ego: A Large-Scale Dataset of Paired Third and First Person Videos
Gunnar Sigurdsson
Abhinav Gupta
Cordelia Schmid
Ali Farhadi
Alahari Karteek
SLREgoV
70
171
0
25 Apr 2018
Moments in Time Dataset: one million videos for event understanding
Moments in Time Dataset: one million videos for event understanding
Mathew Monfort
A. Andonian
Bolei Zhou
K. Ramakrishnan
Sarah Adel Bargal
...
L. Brown
Quanfu Fan
Dan Gutfreund
Carl Vondrick
A. Oliva
98
551
0
09 Jan 2018
12
Next