ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2505.11917
  4. Cited By
OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning

OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning

17 May 2025
Fanqi Lin
Ruiqian Nai
Yingdong Hu
Jiacheng You
Junming Zhao
Yang Gao
    LRM
ArXivPDFHTML

Papers citing "OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning"

50 / 59 papers shown
Title
$π_{0.5}$: a Vision-Language-Action Model with Open-World Generalization
π0.5π_{0.5}π0.5​: a Vision-Language-Action Model with Open-World Generalization
Physical Intelligence
Kevin Black
Noah Brown
James Darpinian
Karan Dhabalia
...
Homer Walke
Anna Walling
Haohuan Wang
Lili Yu
Ury Zhilinsky
LM&Ro
VLM
95
32
0
22 Apr 2025
Sim-and-Real Co-Training: A Simple Recipe for Vision-Based Robotic Manipulation
Sim-and-Real Co-Training: A Simple Recipe for Vision-Based Robotic Manipulation
Abhiram Maddukuri
Z. L. Jiang
Lawrence Yunliang Chen
Soroush Nasiriany
Yuqi Xie
...
Scott Reed
Ken Goldberg
Ajay Mandlekar
Linxi Fan
Yuke Zhu
104
6
0
31 Mar 2025
CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models
CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models
Qingqing Zhao
Yao Lu
Moo Jin Kim
Zipeng Fu
Zhuoyang Zhang
...
Ankur Handa
Xuan Li
Donglai Xiang
Gordon Wetzstein
Nayeon Lee
LM&Ro
LRM
73
23
0
27 Mar 2025
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
Nvidia
Johan Bjorck
Fernando Castañeda
Nikita Cherniadev
Xingye Da
...
Ao Zhang
Hao Zhang
Yizhou Zhao
Ruijie Zheng
Yuke Zhu
VLM
124
48
0
18 Mar 2025
OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction
OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction
Huang Huang
Fangchen Liu
Letian Fu
Tingfan Wu
Mustafa Mukadam
Jitendra Malik
Ken Goldberg
Pieter Abbeel
LM&Ro
VLM
127
9
0
05 Mar 2025
Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models
Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models
Lucy Xiaoyang Shi
Brian Ichter
Michael Equi
Liyiming Ke
Karl Pertsch
...
Adrian Li-Bell
Danny Driess
Lachy Groom
Sergey Levine
Chelsea Finn
LM&Ro
LRM
115
16
0
26 Feb 2025
ObjectVLA: End-to-End Open-World Object Manipulation Without Demonstration
ObjectVLA: End-to-End Open-World Object Manipulation Without Demonstration
Minjie Zhu
Yinlin Zhu
Jinming Li
Zhongyi Zhou
Junjie Wen
Xiaoyu Liu
Yaxin Peng
Chaomin Shen
Feifei Feng
LM&Ro
128
5
0
26 Feb 2025
DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control
DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control
Junjie Wen
Yinlin Zhu
Jinming Li
Zhibin Tang
Yaxin Peng
Feifei Feng
VLM
93
21
0
09 Feb 2025
HAMSTER: Hierarchical Action Models For Open-World Robot Manipulation
HAMSTER: Hierarchical Action Models For Open-World Robot Manipulation
Yi Li
Yuquan Deng
Jing Zhang
Joel Jang
Marius Memme
...
Fabio Ramos
Dieter Fox
Anqi Li
Abhishek Gupta
Ankit Goyal
LM&Ro
133
14
0
08 Feb 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
318
1,611
0
22 Jan 2025
FAST: Efficient Action Tokenization for Vision-Language-Action Models
FAST: Efficient Action Tokenization for Vision-Language-Action Models
Karl Pertsch
Kyle Stachowicz
Brian Ichter
Danny Driess
Suraj Nair
Q. Vuong
Oier Mees
Chelsea Finn
Sergey Levine
114
55
0
17 Jan 2025
HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models
HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models
V. Bhat
Prashanth Krishnamurthy
Ramesh Karri
Farshad Khorrami
90
5
0
16 Sep 2024
Automating Robot Failure Recovery Using Vision-Language Models With
  Optimized Prompts
Automating Robot Failure Recovery Using Vision-Language Models With Optimized Prompts
Hongyi Chen
Yunchao Yao
Ruixuan Liu
Changliu Liu
Jeffrey Ichnowski
53
10
0
06 Sep 2024
Re-Mix: Optimizing Data Mixtures for Large Scale Imitation Learning
Re-Mix: Optimizing Data Mixtures for Large Scale Imitation Learning
Joey Hejna
Chethan Bhateja
Yichen Jian
Karl Pertsch
Dorsa Sadigh
75
19
0
26 Aug 2024
Scaling Cross-Embodied Learning: One Policy for Manipulation,
  Navigation, Locomotion and Aviation
Scaling Cross-Embodied Learning: One Policy for Manipulation, Navigation, Locomotion and Aviation
Ria Doshi
Homer Walke
Oier Mees
Sudeep Dasari
Sergey Levine
99
54
0
21 Aug 2024
Transfusion: Predict the Next Token and Diffuse Images with One
  Multi-Modal Model
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Chunting Zhou
Lili Yu
Arun Babu
Kushal Tirumala
Michihiro Yasunaga
Leonid Shamis
Jacob Kahn
Xuezhe Ma
Luke Zettlemoyer
Omer Levy
DiffM
89
176
0
20 Aug 2024
Robotic Control via Embodied Chain-of-Thought Reasoning
Robotic Control via Embodied Chain-of-Thought Reasoning
Michał Zawalski
William Chen
Karl Pertsch
Oier Mees
Chelsea Finn
Sergey Levine
LRM
LM&Ro
110
75
0
11 Jul 2024
RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for
  Robotics
RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics
Wentao Yuan
Jiafei Duan
Valts Blukis
Wilbert Pumacay
Ranjay Krishna
Adithyavairavan Murali
Arsalan Mousavian
Dieter Fox
LM&Ro
76
61
0
15 Jun 2024
OpenVLA: An Open-Source Vision-Language-Action Model
OpenVLA: An Open-Source Vision-Language-Action Model
Moo Jin Kim
Karl Pertsch
Siddharth Karamcheti
Ted Xiao
Ashwin Balakrishna
...
Russ Tedrake
Dorsa Sadigh
Sergey Levine
Percy Liang
Chelsea Finn
LM&Ro
VLM
205
464
0
13 Jun 2024
Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V
Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V
Peiyuan Zhi
Zhiyuan Zhang
Muzhi Han
Zeyu Zhang
Zhitian Li
Ziyuan Jiao
Ziyuan Jiao
Siyuan Huang
Siyuan Huang
LRM
LM&Ro
73
32
0
16 Apr 2024
Yell At Your Robot: Improving On-the-Fly from Language Corrections
Yell At Your Robot: Improving On-the-Fly from Language Corrections
Lucy Xiaoyang Shi
Zheyuan Hu
Tony Zhao
Archit Sharma
Karl Pertsch
Jianlan Luo
Sergey Levine
Chelsea Finn
LM&Ro
105
68
0
19 Mar 2024
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
Alexander Khazatsky
Karl Pertsch
Suraj Nair
Ashwin Balakrishna
Sudeep Dasari
...
Thomas Kollar
Sergey Levine
Chelsea Finn
Sergey Levine
Chelsea Finn
184
203
0
19 Mar 2024
CoPa: General Robotic Manipulation through Spatial Constraints of Parts
  with Foundation Models
CoPa: General Robotic Manipulation through Spatial Constraints of Parts with Foundation Models
Haoxu Huang
Fanqi Lin
Yingdong Hu
Shengjie Wang
Yang Gao
84
57
0
13 Mar 2024
DeepSeek-VL: Towards Real-World Vision-Language Understanding
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Haoyu Lu
Wen Liu
Bo Zhang
Bing-Li Wang
Kai Dong
...
Yaofeng Sun
Chengqi Deng
Hanwei Xu
Zhenda Xie
Chong Ruan
VLM
73
348
0
08 Mar 2024
RT-H: Action Hierarchies Using Language
RT-H: Action Hierarchies Using Language
Suneel Belkhale
Tianli Ding
Ted Xiao
P. Sermanet
Quon Vuong
Jonathan Tompson
Yevgen Chebotar
Debidatta Dwibedi
Dorsa Sadigh
LM&Ro
78
85
0
04 Mar 2024
Pushing the Limits of Cross-Embodiment Learning for Manipulation and
  Navigation
Pushing the Limits of Cross-Embodiment Learning for Manipulation and Navigation
Jonathan Yang
Catherine Glossop
Arjun Bhorkar
Dhruv Shah
Quan Vuong
Chelsea Finn
Dorsa Sadigh
Sergey Levine
73
45
0
29 Feb 2024
Universal Manipulation Interface: In-The-Wild Robot Teaching Without
  In-The-Wild Robots
Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots
Cheng Chi
Zhenjia Xu
Chuer Pan
Eric A. Cousineau
Benjamin Burchfiel
Siyuan Feng
Russ Tedrake
Shuran Song
66
213
0
15 Feb 2024
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned
  Language Models
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models
Siddharth Karamcheti
Suraj Nair
Ashwin Balakrishna
Percy Liang
Thomas Kollar
Dorsa Sadigh
MLLM
VLM
89
119
0
12 Feb 2024
OK-Robot: What Really Matters in Integrating Open-Knowledge Models for
  Robotics
OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics
Peiqi Liu
Yaswanth Orru
Jay Vakil
Chris Paxton
Nur Muhammad (Mahi) Shafiullah
Lerrel Pinto
LM&Ro
VLM
112
27
0
22 Jan 2024
Look Before You Leap: Unveiling the Power of GPT-4V in Robotic
  Vision-Language Planning
Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning
Yingdong Hu
Fanqi Lin
Tong Zhang
Li Yi
Yang Gao
LM&Ro
113
115
0
29 Nov 2023
Vision-Language Foundation Models as Effective Robot Imitators
Vision-Language Foundation Models as Effective Robot Imitators
Xinghang Li
Minghuan Liu
Hanbo Zhang
Cunjun Yu
Jie Xu
...
Ya Jing
Weinan Zhang
Huaping Liu
Hang Li
Tao Kong
LM&Ro
63
153
0
02 Nov 2023
Interactive Task Planning with Language Models
Interactive Task Planning with Language Models
Boyi Li
Philipp Wu
Pieter Abbeel
Jitendra Malik
LM&Ro
77
38
0
16 Oct 2023
PaLI-3 Vision Language Models: Smaller, Faster, Stronger
PaLI-3 Vision Language Models: Smaller, Faster, Stronger
Xi Chen
Xiao Wang
Lucas Beyer
Alexander Kolesnikov
Jialin Wu
...
Keran Rong
Tianli Yu
Daniel Keysers
Xiao-Qi Zhai
Radu Soricut
MLLM
VLM
90
97
0
13 Oct 2023
Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Open X-Embodiment Collaboration
Abby OÑeill
Abdul Rehman
Abhinav Gupta
Abhiram Maddukuri
...
Zhuo Xu
Zichen Jeff Cui
Zichen Zhang
Zipeng Fu
Zipeng Lin
LM&Ro
138
499
0
13 Oct 2023
Improved Baselines with Visual Instruction Tuning
Improved Baselines with Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Yuheng Li
Yong Jae Lee
VLM
MLLM
108
2,672
0
05 Oct 2023
BridgeData V2: A Dataset for Robot Learning at Scale
BridgeData V2: A Dataset for Robot Learning at Scale
Homer Walke
Kevin Black
Abraham Lee
Moo Jin Kim
Maximilian Du
...
Andre Wang He
Vivek Myers
Kuan Fang
Chelsea Finn
Sergey Levine
64
227
0
24 Aug 2023
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic
  Control
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
Anthony Brohan
Noah Brown
Justice Carbajal
Yevgen Chebotar
Xi Chen
...
Ted Xiao
Peng Xu
Sichun Xu
Tianhe Yu
Brianna Zitkovich
LM&Ro
LRM
110
1,217
0
28 Jul 2023
GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic
  Manipulation
GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation
Junghyun Kim
Gi-Cheon Kang
Jaein Kim
Suyeon Shin
Byoung-Tak Zhang
LM&Ro
53
7
0
12 Jul 2023
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought
Yao Mu
Qinglong Zhang
Mengkang Hu
Wen Wang
Mingyu Ding
Jun Jin
Bin Wang
Jifeng Dai
Yu Qiao
Ping Luo
LM&Ro
LRM
55
234
0
24 May 2023
Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware
Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware
Tony Zhao
Vikash Kumar
Sergey Levine
Chelsea Finn
65
612
0
23 Apr 2023
Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
Cheng Chi
Zhenjia Xu
S. Feng
Eric A. Cousineau
Yilun Du
Benjamin Burchfiel
Russ Tedrake
Shuran Song
324
1,170
0
07 Mar 2023
PaLM-E: An Embodied Multimodal Language Model
PaLM-E: An Embodied Multimodal Language Model
Danny Driess
F. Xia
Mehdi S. M. Sajjadi
Corey Lynch
Aakanksha Chowdhery
...
Marc Toussaint
Klaus Greff
Andy Zeng
Igor Mordatch
Peter R. Florence
LM&Ro
83
1,629
0
06 Mar 2023
Open-World Object Manipulation using Pre-trained Vision-Language Models
Open-World Object Manipulation using Pre-trained Vision-Language Models
Austin Stone
Ted Xiao
Yao Lu
K. Gopalakrishnan
Kuang-Huei Lee
...
Sean Kirmani
Brianna Zitkovich
F. Xia
Chelsea Finn
Karol Hausman
LM&Ro
219
149
0
02 Mar 2023
Grounded Decoding: Guiding Text Generation with Grounded Models for
  Embodied Agents
Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents
Wenlong Huang
Fei Xia
Dhruv Shah
Danny Driess
Andy Zeng
...
Pete Florence
Igor Mordatch
Sergey Levine
Karol Hausman
Brian Ichter
LM&Ro
67
48
0
01 Mar 2023
RT-1: Robotics Transformer for Real-World Control at Scale
RT-1: Robotics Transformer for Real-World Control at Scale
Anthony Brohan
Noah Brown
Justice Carbajal
Yevgen Chebotar
Joseph Dabis
...
Ted Xiao
Peng Xu
Sichun Xu
Tianhe Yu
Brianna Zitkovich
LM&Ro
76
1,099
0
13 Dec 2022
Flow Matching for Generative Modeling
Flow Matching for Generative Modeling
Y. Lipman
Ricky T. Q. Chen
Heli Ben-Hamu
Maximilian Nickel
Matt Le
OOD
177
1,274
0
06 Oct 2022
ReAct: Synergizing Reasoning and Acting in Language Models
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
393
2,812
0
06 Oct 2022
Rectified Flow: A Marginal Preserving Approach to Optimal Transport
Rectified Flow: A Marginal Preserving Approach to Optimal Transport
Qiang Liu
OT
160
101
0
29 Sep 2022
GLSO: Grammar-guided Latent Space Optimization for Sample-efficient
  Robot Design Automation
GLSO: Grammar-guided Latent Space Optimization for Sample-efficient Robot Design Automation
Jiaheng Hu
Julian Whiman
Howie Choset
74
16
0
23 Sep 2022
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
Michael Ahn
Anthony Brohan
Noah Brown
Yevgen Chebotar
Omar Cortes
...
Ted Xiao
Peng Xu
Sichun Xu
Mengyuan Yan
Andy Zeng
LM&Ro
159
1,946
0
04 Apr 2022
12
Next