ResearchTrend.AI
Open-World Object Manipulation using Pre-trained Vision-Language Models

2 March 2023
Austin Stone
Ted Xiao
Yao Lu
K. Gopalakrishnan
Kuang-Huei Lee
Q. Vuong
Paul Wohlhart
Sean Kirmani
Brianna Zitkovich
F. Xia
Chelsea Finn
Karol Hausman
    LM&Ro

Papers citing "Open-World Object Manipulation using Pre-trained Vision-Language Models"

50 / 108 papers shown
Interleave-VLA: Enhancing Robot Manipulation with Interleaved Image-Text Instructions
Cunxin Fan
Xiaosong Jia
Yihang Sun
Yixiao Wang
Jianglan Wei
...
Xiangyu Zhao
M. Tomizuka
Xue Yang
Junchi Yan
Mingyu Ding
LM&Ro
VLM
69
3
0
04 May 2025
CrayonRobo: Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation
Xiaoqi Li
Lingyun Xu
M. Zhang
Jiaming Liu
Yan Shen
...
Jiahui Xu
Liang Heng
Siyuan Huang
S. Zhang
Hao Dong
LM&Ro
51
0
0
04 May 2025
RoboGround: Robotic Manipulation with Grounded Vision-Language Priors
Haifeng Huang
Xinyi Chen
Y. Chen
H. Li
Xiaoshen Han
Z. Wang
Tai Wang
Jiangmiao Pang
Zhou Zhao
LM&Ro
80
0
0
30 Apr 2025
$π_{0.5}$: a Vision-Language-Action Model with Open-World Generalization
Physical Intelligence
Kevin Black
Noah Brown
James Darpinian
Karan Dhabalia
...
Homer Walke
Anna Walling
Haohuan Wang
Lili Yu
Ury Zhilinsky
LM&Ro
VLM
39
10
0
22 Apr 2025
FLAM: Foundation Model-Based Body Stabilization for Humanoid Locomotion and Manipulation
Xianqi Zhang
Hongliang Wei
Wenrui Wang
Xingtao Wang
Xiaopeng Fan
Debin Zhao
34
0
0
28 Mar 2025
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning
Xin Wen
Bingchen Zhao
Yilun Chen
Jiangmiao Pang
Xiaojuan Qi
LM&Ro
46
0
0
10 Mar 2025
BEHAVIOR Robot Suite: Streamlining Real-World Whole-Body Manipulation for Everyday Household Activities
Yunfan Jiang
Ruohan Zhang
J. Wong
Chen Wang
Yanjie Ze
Hang Yin
Cem Gokmen
Shuran Song
Jiajun Wu
L. Fei-Fei
67
5
0
07 Mar 2025
ObjectVLA: End-to-End Open-World Object Manipulation Without Demonstration
Minjie Zhu
Y. X. Zhu
Jinming Li
Zhongyi Zhou
Junjie Wen
Xiaoyu Liu
Chaomin Shen
Yaxin Peng
Feifei Feng
LM&Ro
86
3
0
26 Feb 2025
Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models
Lucy Xiaoyang Shi
Brian Ichter
Michael Equi
Liyiming Ke
Karl Pertsch
...
Adrian Li-Bell
Danny Driess
Lachy Groom
Sergey Levine
Chelsea Finn
LM&Ro
LRM
95
7
0
26 Feb 2025
Enhancing Reusability of Learned Skills for Robot Manipulation via Gaze and Bottleneck
Ryo Takizawa
Izumi Karino
Koki Nakagawa
Y. Ohmura
Y. Kuniyoshi
77
1
0
25 Feb 2025
SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
Zekun Qi
Wenyao Zhang
Yufei Ding
Runpei Dong
Xinqiang Yu
...
Xin Jin
Kaisheng Ma
Zhizheng Zhang
He Wang
Li Yi
LM&Ro
131
4
0
18 Feb 2025
HAMSTER: Hierarchical Action Models For Open-World Robot Manipulation
Yi Li
Yuquan Deng
J. Zhang
Joel Jang
Marius Memme
...
Fabio Ramos
Dieter Fox
Anqi Li
Abhishek Gupta
Ankit Goyal
LM&Ro
99
9
0
08 Feb 2025
Motion Tracks: A Unified Representation for Human-Robot Transfer in Few-Shot Imitation Learning
Juntao Ren
Priya Sundaresan
Dorsa Sadigh
Sanjiban Choudhury
Jeannette Bohg
37
14
0
13 Jan 2025
Efficient Policy Adaptation with Contrastive Prompt Ensemble for Embodied Agents
Wonje Choi
Woo Kyung Kim
SeungHyun Kim
Honguk Woo
74
8
0
16 Dec 2024
MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation
Harsh Singh
Rocktim Jyoti Das
Mingfei Han
Preslav Nakov
Ivan Laptev
LM&Ro
LLMAG
76
2
0
26 Nov 2024
Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning
Jiange Yang
Haoyi Zhu
Y. Wang
Gangshan Wu
Tong He
Limin Wang
100
2
0
21 Nov 2024
Bridging the Resource Gap: Deploying Advanced Imitation Learning Models onto Affordable Embedded Platforms
Haizhou Ge
Ruixiang Wang
Zhu-ang Xu
Hongrui Zhu
Ruichen Deng
Yuhang Dong
Zeyu Pang
Guyue Zhou
Junyu Zhang
Lu Shi
78
1
0
18 Nov 2024
STEER: Flexible Robotic Manipulation via Dense Language Grounding
Laura Smith
A. Irpan
Montserrat Gonzalez Arenas
Sean Kirmani
Dmitry Kalashnikov
Dhruv Shah
Ted Xiao
LLMSV
37
1
0
05 Nov 2024
GHIL-Glue: Hierarchical Control with Filtered Subgoal Images
Kyle Hatch
Ashwin Balakrishna
Oier Mees
Suraj Nair
Seohong Park
...
Masha Itkina
Benjamin Eysenbach
Sergey Levine
Thomas Kollar
Benjamin Burchfiel
62
2
0
26 Oct 2024
OKAMI: Teaching Humanoid Robots Manipulation Skills through Single Video Imitation
Jinhan Li
Yifeng Zhu
Yuqi Xie
Zhenyu Jiang
Mingyo Seo
Georgios Pavlakos
Yuke Zhu
LM&Ro
26
31
0
15 Oct 2024
PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation
K. Zhang
Pengzhen Ren
Bingqian Lin
Junfan Lin
Shikui Ma
Hang Xu
Xiaodan Liang
18
1
0
14 Oct 2024
BUMBLE: Unifying Reasoning and Acting with Vision-Language Models for Building-wide Mobile Manipulation
Rutav Shah
Albert Yu
Yifeng Zhu
Yuke Zhu
Roberto Martín-Martín
LM&Ro
37
6
0
08 Oct 2024
LADEV: A Language-Driven Testing and Evaluation Platform for Vision-Language-Action Models in Robotic Manipulation
Zhijie Wang
Zhehua Zhou
Jiayang Song
Yuheng Huang
Zhan Shu
Lei Ma
26
0
0
07 Oct 2024
MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents
Junpeng Yue
Xinru Xu
Börje F. Karlsson
Zongqing Lu
36
0
0
04 Oct 2024
Run-time Observation Interventions Make Vision-Language-Action Models More Visually Robust
Asher Hancock
Allen Z. Ren
Anirudha Majumdar
VLM
28
2
0
02 Oct 2024
Helpful DoggyBot: Open-World Object Fetching using Legged Robots and Vision-Language Models
Qi Wu
Zipeng Fu
Xuxin Cheng
Xiaolong Wang
Chelsea Finn
LM&Ro
28
8
0
30 Sep 2024
RoboNurse-VLA: Robotic Scrub Nurse System based on Vision-Language-Action Model
Shunlei Li
Jin Wang
Rui Dai
Wanyu Ma
Wing Yin Ng
Yingbai Hu
Zheng Li
26
2
0
29 Sep 2024
HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models
V. Bhat
P. Krishnamurthy
Ramesh Karri
Farshad Khorrami
46
3
0
16 Sep 2024
HiRT: Enhancing Robotic Control with Hierarchical Robot Transformers
Jianke Zhang
Yanjiang Guo
Xiaoyu Chen
Yen-Jen Wang
Yucheng Hu
Chengming Shi
Jianyu Chen
29
5
0
12 Sep 2024
Visual Grounding for Object-Level Generalization in Reinforcement Learning
Haobin Jiang
Zongqing Lu
LM&Ro
34
2
0
04 Aug 2024
Actra: Optimized Transformer Architecture for Vision-Language-Action Models in Robot Learning
Yueen Ma
Dafeng Chi
Shiguang Wu
Yuecheng Liu
Yuzheng Zhuang
Jianye Hao
Irwin King
36
5
0
02 Aug 2024
GET-Zero: Graph Embodiment Transformer for Zero-shot Embodiment Generalization
Austin Patel
Shuran Song
LM&Ro
32
3
0
20 Jul 2024
Adapt2Reward: Adapting Video-Language Models to Generalizable Robotic Rewards via Failure Prompts
Yanting Yang
Minghao Chen
Qibo Qiu
Jiahao Wu
Wenxiao Wang
Binbin Lin
Ziyu Guan
Xiaofei He
LM&Ro
42
2
0
20 Jul 2024
Foundation Models for Autonomous Robots in Unstructured Environments
Hossein Naderi
Alireza Shojaei
Lifu Huang
LM&Ro
47
0
0
19 Jul 2024
BadRobot: Jailbreaking Embodied LLMs in the Physical World
Hangtao Zhang
Chenyu Zhu
Xianlong Wang
Ziqi Zhou
Yichen Wang
...
Shengshan Hu
Aishan Liu
Peijin Guo
Leo Yu Zhang
LM&Ro
50
7
0
16 Jul 2024
Robotic Control via Embodied Chain-of-Thought Reasoning
Michał Zawalski
William Chen
Karl Pertsch
Oier Mees
Chelsea Finn
Sergey Levine
LRM
LM&Ro
34
54
0
11 Jul 2024
Multimodal Diffusion Transformer: Learning Versatile Behavior from Multimodal Goals
Moritz Reuss
Ömer Erdinç Yagmurlu
Fabian Wenzel
Rudolf Lioutikov
OffRL
37
41
0
08 Jul 2024
Language-Guided Object-Centric Diffusion Policy for Generalizable and Collision-Aware Robotic Manipulation
Hang Li
Qian Feng
Zhi Zheng
Jianxiang Feng
Zhaopeng Chen
Alois Knoll
26
1
0
29 Jun 2024
MMRo: Are Multimodal LLMs Eligible as the Brain for In-Home Robotics?
Jinming Li
Yichen Zhu
Zhiyuan Xu
Jindong Gu
Minjie Zhu
Xin Liu
Ning Liu
Yaxin Peng
Feifei Feng
Jian Tang
LRM
LM&Ro
36
6
0
28 Jun 2024
RoboUniView: Visual-Language Model with Unified View Representation for Robotic Manipulaiton
Fanfan Liu
Feng Yan
Liming Zheng
Chengjian Feng
Yiyang Huang
Lin Ma
LM&Ro
35
11
0
27 Jun 2024
Towards Open-World Grasping with Large Vision-Language Models
Georgios Tziafas
H. Kasaei
LM&Ro
LRM
29
12
0
26 Jun 2024
Open-vocabulary Mobile Manipulation in Unseen Dynamic Environments with 3D Semantic Maps
Dicong Qiu
Wenzong Ma
Zhenfu Pan
Hui Xiong
Junwei Liang
LM&Ro
34
7
0
26 Jun 2024
DKPROMPT: Domain Knowledge Prompting Vision-Language Models for Open-World Planning
Xiaohan Zhang
Zainab Altaweel
Yohei Hayamizu
Yan Ding
S. Amiri
Hao Yang
Andy Kaminski
Chad Esselink
Shiqi Zhang
VLM
LM&Ro
41
6
0
25 Jun 2024
Open-vocabulary Pick and Place via Patch-level Semantic Maps
Mingxi Jia
Haojie Huang
Zhewen Zhang
Chenghao Wang
Linfeng Zhao
Dian Wang
J. Liu
Robin Walters
Robert Platt
Stefanie Tellex
LM&Ro
44
5
0
21 Jun 2024
HYPERmotion: Learning Hybrid Behavior Planning for Autonomous Loco-manipulation
Jin Wang
Rui Dai
Weijie Wang
Luca Rossini
Francesco Ruscelli
Nikos Tsagarakis
39
4
0
20 Jun 2024
OpenVLA: An Open-Source Vision-Language-Action Model
Moo Jin Kim
Karl Pertsch
Siddharth Karamcheti
Ted Xiao
Ashwin Balakrishna
...
Russ Tedrake
Dorsa Sadigh
Sergey Levine
Percy Liang
Chelsea Finn
LM&Ro
VLM
45
363
0
13 Jun 2024
Adapting Pretrained ViTs with Convolution Injector for Visuo-Motor Control
Dongyoon Hwang
ByungKun Lee
Hojoon Lee
Hyunseung Kim
Jaegul Choo
53
0
0
10 Jun 2024
Vision-based Manipulation from Single Human Video with Open-World Object Graphs
Yifeng Zhu
Arisrei Lim
Peter Stone
Yuke Zhu
29
33
0
30 May 2024
Recasting Generic Pretrained Vision Transformers As Object-Centric Scene Encoders For Manipulation Policies
Jianing Qian
Anastasios Panagopoulos
Dinesh Jayaraman
LM&Ro
ViT
38
5
0
24 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
80
42
0
23 May 2024