ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

arXiv: 2502.13142
Pre-training Auto-regressive Robotic Models with 4D Representations

18 February 2025
Dantong Niu
Yuvan Sharma
Haoru Xue
Giscard Biamby
Junyi Zhang
Ziteng Ji
Trevor Darrell
Roei Herzig

Papers citing "Pre-training Auto-regressive Robotic Models with 4D Representations"

39 / 39 papers shown
FAST: Efficient Action Tokenization for Vision-Language-Action Models
Karl Pertsch
Kyle Stachowicz
Brian Ichter
Danny Driess
Suraj Nair
Q. Vuong
Oier Mees
Chelsea Finn
Sergey Levine
59
43
0
17 Jan 2025
In-Context Learning Enables Robot Action Prediction in LLMs
Yida Yin
Zekai Wang
Yuvan Sharma
Dantong Niu
Trevor Darrell
Roei Herzig
LM&Ro
154
4
0
16 Oct 2024
Latent Action Pretraining from Videos
Seonghyeon Ye
Joel Jang
Byeongguk Jeon
Sejune Joo
Jianwei Yang
...
Kimin Lee
J. Gao
Luke Zettlemoyer
Dieter Fox
Minjoon Seo
56
34
0
15 Oct 2024
MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion
Junyi Zhang
Charles Herrmann
Junhwa Hur
Varun Jampani
Trevor Darrell
Forrester Cole
Deqing Sun
Ming-Hsuan Yang
VGen
120
75
0
04 Oct 2024
Flow as the Cross-Domain Manipulation Interface
Mengda Xu
Zhenjia Xu
Yinghao Xu
Cheng Chi
Gordon Wetzstein
Manuela Veloso
Shuran Song
AI4CE
54
40
0
21 Jul 2024
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy
Xiang Li
Cristina Mata
J. Park
Kumara Kahatapitiya
Yoo Sung Jang
...
Kanchana Ranasinghe
R. Burgert
Mu Cai
Yong Jae Lee
Michael S. Ryoo
LM&Ro
91
26
0
28 Jun 2024
RVT-2: Learning Precise Manipulation from Few Demonstrations
Ankit Goyal
Valts Blukis
Jie Xu
Yijie Guo
Yu-Wei Chao
Dieter Fox
42
44
0
12 Jun 2024
Octo: An Open-Source Generalist Robot Policy
Octo Model Team
Dibya Ghosh
Homer Walke
Karl Pertsch
Kevin Black
...
Quan Vuong
Ted Xiao
Dorsa Sadigh
Chelsea Finn
Sergey Levine
122
392
0
20 May 2024
SpatialTracker: Tracking Any 2D Pixels in 3D Space
Yuxi Xiao
Qianqian Wang
Shangzhan Zhang
Nan Xue
Sida Peng
Yujun Shen
Xiaowei Zhou
69
54
0
05 Apr 2024
3D-VLA: A 3D Vision-Language-Action Generative World Model
Haoyu Zhen
Xiaowen Qiu
Peihao Chen
Jincheng Yang
Xin Yan
Yilun Du
Yining Hong
Chuang Gan
LM&Ro
VGen
PINN
57
94
0
14 Mar 2024
ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation
Guanxing Lu
Shiyi Zhang
Ziwei Wang
Changliu Liu
Jiwen Lu
Yansong Tang
73
54
0
13 Mar 2024
RT-Trajectory: Robotic Task Generalization via Hindsight Trajectory Sketches
Jiayuan Gu
Sean Kirmani
Paul Wohlhart
Yao Lu
Montse Gonzalez Arenas
...
Hao Su
Karol Hausman
Chelsea Finn
Q. Vuong
Ted Xiao
40
68
0
03 Nov 2023
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
Anthony Brohan
Noah Brown
Justice Carbajal
Yevgen Chebotar
Xi Chen
...
Ted Xiao
Peng Xu
Sichun Xu
Tianhe Yu
Brianna Zitkovich
LM&Ro
LRM
77
1,172
0
28 Jul 2023
CoTracker: It is Better to Track Together
Nikita Karaev
Ignacio Rocco
Benjamin Graham
Natalia Neverova
Andrea Vedaldi
Christian Rupprecht
VOT
ViT
68
252
0
14 Jul 2023
RVT: Robotic View Transformer for 3D Object Manipulation
Ankit Goyal
Jie Xu
Yijie Guo
Valts Blukis
Yu-Wei Chao
Dieter Fox
LM&Ro
64
128
0
26 Jun 2023
Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa
VLM
MLLM
329
4,506
0
17 Apr 2023
Segment Anything
A. Kirillov
Eric Mintun
Nikhila Ravi
Hanzi Mao
Chloe Rolland
...
Spencer Whitehead
Alexander C. Berg
Wan-Yen Lo
Piotr Dollár
Ross B. Girshick
MLLM
VLM
238
7,047
0
05 Apr 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
410
13,788
0
15 Mar 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM
PILM
593
12,840
0
27 Feb 2023
ToolFlowNet: Robotic Manipulation with Tools via Predicting Tool Flow from Point Clouds
Daniel Seita
Yufei Wang
Sarthak J. Shetty
Edward Li
Zackory M. Erickson
David Held
3DPC
51
49
0
16 Nov 2022
Scaling Instruction-Finetuned Language Models
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLM
LRM
139
3,072
0
20 Oct 2022
PaLI: A Jointly-Scaled Multilingual Language-Image Model
Xi Chen
Tianlin Li
Soravit Changpinyo
A. Piergiovanni
Piotr Padlewski
...
Andreas Steiner
A. Angelova
Xiaohua Zhai
N. Houlsby
Radu Soricut
MLLM
VLM
58
694
0
14 Sep 2022
Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation
Mohit Shridhar
Lucas Manuelli
Dieter Fox
LM&Ro
218
477
0
12 Sep 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
273
3,458
0
29 Apr 2022
Particle Video Revisited: Tracking Through Occlusions Using Point Trajectories
Adam W. Harley
Zhaoyuan Fang
Katerina Fragkiadaki
49
161
0
08 Apr 2022
Masked Visual Pre-training for Motor Control
Tete Xiao
Ilija Radosavovic
Trevor Darrell
Jitendra Malik
SSL
64
243
0
11 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
672
12,525
0
04 Mar 2022
IFOR: Iterative Flow Minimization for Robotic Object Rearrangement
Ankit Goyal
Arsalan Mousavian
Chris Paxton
Yu-Wei Chao
Brian Okorn
Jia Deng
Dieter Fox
74
56
0
01 Feb 2022
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
Christoph Schuhmann
Richard Vencu
Romain Beaumont
R. Kaczmarczyk
Clayton Mullis
Aarush Katta
Theo Coombes
J. Jitsev
Aran Komatsuzaki
VLM
MLLM
CLIP
173
1,398
0
03 Nov 2021
Finetuned Language Models Are Zero-Shot Learners
Jason W. Wei
Maarten Bosma
Vincent Zhao
Kelvin Guu
Adams Wei Yu
Brian Lester
Nan Du
Andrew M. Dai
Quoc V. Le
ALM
UQCV
59
3,678
0
03 Sep 2021
Coarse-to-Fine Q-attention: Efficient Learning for Visual Robotic Manipulation via Discretisation
Stephen James
Kentaro Wada
Tristan Laidlow
Andrew J. Davison
48
125
0
23 Jun 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
656
28,659
0
26 Feb 2021
RAFT-3D: Scene Flow using Rigid-Motion Embeddings
Zachary Teed
Jia Deng
VGen
3DPC
41
130
0
01 Dec 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
307
40,217
0
22 Oct 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
453
41,106
0
28 May 2020
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke
Sam Gross
Francisco Massa
Adam Lerer
James Bradbury
...
Sasank Chilamkurthy
Benoit Steiner
Lu Fang
Junjie Bai
Soumith Chintala
ODL
211
42,038
0
03 Dec 2019
RLBench: The Robot Learning Benchmark & Learning Environment
Stephen James
Z. Ma
David Rovick Arrojo
Andrew J. Davison
SSL
VLM
OffRL
87
537
0
26 Sep 2019
Scaling Egocentric Vision: The EPIC-KITCHENS Dataset
Dima Damen
Hazel Doughty
G. Farinella
Sanja Fidler
Antonino Furnari
...
Davide Moltisanti
Jonathan Munro
Toby Perrett
Will Price
Michael Wray
EgoV
65
1,011
0
08 Apr 2018
The "something something" video database for learning and evaluating visual common sense
Raghav Goyal
Samira Ebrahimi Kahou
Vincent Michalski
Joanna Materzynska
S. Westphal
...
Moritz Mueller-Freitag
F. Hoppe
Christian Thurau
Ingo Bax
Roland Memisevic
VLM
66
1,507
0
13 Jun 2017