Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2303.03378
Cited By
PaLM-E: An Embodied Multimodal Language Model
6 March 2023
Danny Driess
F. Xia
Mehdi S. M. Sajjadi
Corey Lynch
Aakanksha Chowdhery
Brian Ichter
Ayzaan Wahid
Jonathan Tompson
Q. Vuong
Tianhe Yu
Wenlong Huang
Yevgen Chebotar
P. Sermanet
Daniel Duckworth
Sergey Levine
Vincent Vanhoucke
Karol Hausman
Marc Toussaint
Klaus Greff
Andy Zeng
Igor Mordatch
Peter R. Florence
LM&Ro
Re-assign community
ArXiv
PDF
HTML
Papers citing
"PaLM-E: An Embodied Multimodal Language Model"
50 / 139 papers shown
Title
One Demo Is All It Takes: Planning Domain Derivation with LLMs from A Single Demonstration
Jinbang Huang
Yixin Xiao
Zhanguang Zhang
Mark Coates
Jianye Hao
Yingxue Zhang
LM&Ro
LRM
41
0
0
23 May 2025
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
Yang Liu
Ming Ma
Xiaomin Yu
Pengxiang Ding
Han Zhao
Mingyang Sun
Siteng Huang
Donglin Wang
LRM
116
0
0
18 May 2025
Real-Time Verification of Embodied Reasoning for Generative Skill Acquisition
Bo Yue
Shuqi Guo
Kaiyu Hu
Chujiao Wang
Benyou Wang
Kui Jia
Guiliang Liu
LRM
73
0
0
16 May 2025
Adaptive Wiping: Adaptive contact-rich manipulation through few-shot imitation learning with Force-Torque feedback and pre-trained object representations
Chikaha Tsuji
Enrique Coronado
Pablo Osorio
G. Venture
109
0
0
09 May 2025
Uncertainty-Weighted Image-Event Multimodal Fusion for Video Anomaly Detection
SungHeon Jeong
Jihong Park
Mohsen Imani
102
0
0
05 May 2025
Robotic Visual Instruction
Yuchen Li
Ziyang Gong
Haoyang Li
Xiaoqi Huang
Haolan Kang
Guangping Bai
Xianzheng Ma
LM&Ro
108
0
0
01 May 2025
Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions
Yiming Du
Wenyu Huang
Danna Zheng
Zhaowei Wang
Sébastien Montella
Mirella Lapata
Kam-Fai Wong
Jeff Z. Pan
KELM
MU
154
3
0
01 May 2025
CTRL-O: Language-Controllable Object-Centric Visual Representation Learning
Aniket Didolkar
Andrii Zadaianchuk
Rabiul Awal
Maximilian Seitzer
E. Gavves
Aishwarya Agrawal
OCL
VLM
143
3
0
27 Mar 2025
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
Weinan Zhang
Mengna Wang
Gangao Liu
Xu Huixin
Yiwei Jiang
...
Hang Zhang
Xin Li
Weiming Lu
Peng Li
Yueting Zhuang
LM&Ro
LRM
123
7
0
27 Mar 2025
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Nvidia
A. Azzolini
Junjie Bai
Prithvijit Chattopadhyay
Huayu Chen
...
Xiaodong Yang
Zhuolin Yang
Jing Zhang
Xiaohui Zeng
Zhe Zhang
AI4CE
LM&Ro
LRM
120
10
0
18 Mar 2025
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
Nvidia
Johan Bjorck
Fernando Castañeda
Nikita Cherniadev
Xingye Da
...
Ao Zhang
Hao Zhang
Yizhou Zhao
Ruijie Zheng
Yuke Zhu
VLM
122
37
0
18 Mar 2025
EscapeCraft: A 3D Room Escape Environment for Benchmarking Complex Multimodal Reasoning Ability
Zehua Wang
Yurui Dong
Ziyue Wang
Minyuan Ruan
Zhili Cheng
Chong Chen
Ziwei Sun
Yang Liu
LRM
108
1
0
13 Mar 2025
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Ju He
Qihang Yu
Qihao Liu
Liang-Chieh Chen
100
1
0
13 Mar 2025
HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model
Jiaming Liu
Hao Chen
Pengju An
Zhuoyang Liu
Renrui Zhang
...
Chengkai Hou
Mengdi Zhao
KC alex Zhou
Pheng-Ann Heng
Shanghang Zhang
118
14
0
13 Mar 2025
Attention Hijackers: Detect and Disentangle Attention Hijacking in LVLMs for Hallucination Mitigation
Beitao Chen
Xinyu Lyu
Lianli Gao
Jingkuan Song
Jikang Cheng
115
1
0
11 Mar 2025
EMMOE: A Comprehensive Benchmark for Embodied Mobile Manipulation in Open Environments
Dongping Li
Tielong Cai
Tianci Tang
Wenhao Chai
Katherine Rose Driggs-Campbell
Gaoang Wang
LM&Ro
143
0
0
11 Mar 2025
Unleashing the Potential of Large Language Models for Text-to-Image Generation through Autoregressive Representation Alignment
Xing Xie
Jiawei Liu
Ziyue Lin
Huijie Fan
Zhi Han
Yandong Tang
Liangqiong Qu
82
0
0
10 Mar 2025
Generative Artificial Intelligence in Robotic Manipulation: A Survey
Kun Zhang
Peng Yun
Jun Cen
Junhao Cai
DiDi Zhu
...
Qifeng Chen
Jia Pan
Wei Zhang
Bo Yang
Hua Chen
124
1
0
05 Mar 2025
Knowledge Bridger: Towards Training-free Missing Multi-modality Completion
Guanzhou Ke
Shengfeng He
Xinyu Wang
Bo Wang
Guoqing Chao
Yize Zhang
Yi Xie
HeXing Su
125
0
0
27 Feb 2025
Stealthy Backdoor Attack in Self-Supervised Learning Vision Encoders for Large Vision Language Models
Zhaoyi Liu
Huan Zhang
AAML
119
1
0
25 Feb 2025
DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents
Taiyi Wang
Zhihao Wu
Jianheng Liu
Jianye Hao
Jun Wang
Kun Shao
OffRL
74
20
0
24 Feb 2025
Token Pruning in Multimodal Large Language Models: Are We Solving the Right Problem?
Zichen Wen
Yifeng Gao
Weijia Li
Conghui He
Linfeng Zhang
LRM
99
0
0
17 Feb 2025
SELP: Generating Safe and Efficient Task Plans for Robot Agents with Large Language Models
Yi Wu
Z. Xiong
Yiran Hu
Shreyash S. Iyengar
Nan Jiang
Aniket Bera
Lin Tan
Suresh Jagannathan
LM&Ro
LLMAG
117
4
0
17 Feb 2025
AgentStudio: A Toolkit for Building General Virtual Agents
Longtao Zheng
Zhiyuan Huang
Zhenghai Xue
Xinrun Wang
Bo An
Shuicheng Yan
112
15
0
17 Feb 2025
HAMSTER: Hierarchical Action Models For Open-World Robot Manipulation
Yi Li
Yuquan Deng
Jing Zhang
Joel Jang
Marius Memme
...
Fabio Ramos
Dieter Fox
Anqi Li
Abhishek Gupta
Ankit Goyal
LM&Ro
127
12
0
08 Feb 2025
Importance Sampling via Score-based Generative Models
Heasung Kim
Taekyun Lee
Hyeji Kim
Gustavo de Veciana
MedIm
DiffM
166
2
0
07 Feb 2025
AutoGUI: Scaling GUI Grounding with Automatic Functionality Annotations from LLMs
Hongxin Li
Jingfan Chen
Jingran Su
Yuntao Chen
Qing Li
Zhaoxiang Zhang
381
1
0
04 Feb 2025
Boosting Multimodal Reasoning with Automated Structured Thinking
Jinyang Wu
Mingkuan Feng
Shuai Zhang
Ruihan Jin
Feihu Che
Zengqi Wen
J. Tao
Jianhua Tao
LRM
145
11
0
04 Feb 2025
Hypo3D: Exploring Hypothetical Reasoning in 3D
Ye Mao
Weixun Luo
Junpeng Jing
Anlan Qiu
K. Mikolajczyk
129
0
0
02 Feb 2025
PixelWorld: Towards Perceiving Everything as Pixels
Zhiheng Lyu
Xueguang Ma
Wenhu Chen
184
1
0
31 Jan 2025
Learn from the Past: Language-conditioned Object Rearrangement with Large Language Models
Guanqun Cao
Ryan Mckenna
Erich Graf
John Oyekan
LM&Ro
149
0
0
30 Jan 2025
Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement
Kei Katsumata
Motonari Kambara
Daichi Yashima
Ryosuke Korekata
Komei Sugiura
145
0
0
28 Jan 2025
DriveLM: Driving with Graph Visual Question Answering
Chonghao Sima
Katrin Renz
Kashyap Chitta
Lawrence Yunliang Chen
Hanxue Zhang
Chengen Xie
Jens Beißwenger
Ping Luo
Andreas Geiger
Hongyang Li
167
187
0
17 Jan 2025
H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving
Tian Jin
Yuxiao Luo
Yue Ma
Yu Qiao
Yali Wang
Mamba
92
1
0
08 Jan 2025
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Pinelopi Papalampidi
Skanda Koppula
Shreya Pathak
Justin T Chiu
Joseph Heyward
Viorica Patraucean
Jiajun Shen
Antoine Miech
Andrew Zisserman
Aida Nematzdeh
VLM
93
26
0
31 Dec 2024
RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation
Kun Wu
Chengkai Hou
Jiaming Liu
Zhengping Che
Xiaozhu Ju
...
Zhenyu Wang
Pengju An
Siyuan Qian
Shanghang Zhang
Jian Tang
LM&Ro
171
19
0
18 Dec 2024
Robust Contact-rich Manipulation through Implicit Motor Adaptation
Teng Xue
Amirreza Razmjoo
Suhan Shetty
Sylvain Calinon
132
1
0
16 Dec 2024
EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios
Lu Qiu
Yuying Ge
Yi Chen
Yixiao Ge
Ying Shan
Xihui Liu
LLMAG
LRM
143
5
0
05 Dec 2024
RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World
Weixin Mao
Weiheng Zhong
Zhou Jiang
Dong Fang
Zhongyue Zhang
...
Fan Jia
Tiancai Wang
Haoqiang Fan
Osamu Yoshie
Osamu Yoshie
152
6
0
29 Nov 2024
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
Di Zhang
Jingdi Lei
Junxian Li
Xunzhi Wang
Yong Liu
...
Steve Yang
Jianbo Wu
Peng Ye
Wanli Ouyang
Dongzhan Zhou
OffRL
LRM
133
7
0
27 Nov 2024
Evaluating Vision-Language Models as Evaluators in Path Planning
Mohamed Aghzal
Xiang Yue
Erion Plaku
Ziyu Yao
LRM
129
1
0
27 Nov 2024
CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos
Xinhao Liu
Jiajian Li
Yichen Jiang
Niranjan Sujay
Zhiyong Yang
Juexiao Zhang
John Abanes
Jing Zhang
Chen Feng
138
2
0
26 Nov 2024
COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection
Jinqi Xiao
S. Sang
Tiancheng Zhi
Jing Liu
Qing Yan
Linjie Luo
Bo Yuan
Bo Yuan
VLM
137
2
0
26 Nov 2024
Object-centric proto-symbolic behavioural reasoning from pixels
R. S. V. Bergen
Justus F. Hübotter
Pablo Lanillos
LM&Ro
OCL
135
1
0
26 Nov 2024
SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE
Yongwei Chen
Yushi Lan
Shangchen Zhou
Tengfei Wang
Xingang Pan
151
5
0
25 Nov 2024
Is 'Right' Right? Enhancing Object Orientation Understanding in Multimodal Large Language Models through Egocentric Instruction Tuning
Ji Hyeok Jung
Eun Tae Kim
S. Kim
Joo Ho Lee
Bumsoo Kim
Buru Chang
VLM
413
1
0
24 Nov 2024
DART-LLM: Dependency-Aware Multi-Robot Task Decomposition and Execution using Large Language Models
Yongdong Wang
Runze Xiao
Jun Younes Louhi Kasahara
Ryosuke Yajima
Keiji Nagatani
Atsushi Yamashita
Hajime Asama
61
4
0
13 Nov 2024
CLIP-RT: Learning Language-Conditioned Robotic Policies from Natural Language Supervision
Gi-Cheon Kang
Junghyun Kim
Kyuhwan Shim
Jun Ki Lee
Byoung-Tak Zhang
LM&Ro
163
1
1
01 Nov 2024
Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities
Zheyuan Zhang
Fengyuan Hu
Jayjun Lee
Freda Shi
Parisa Kordjamshidi
Joyce Chai
Ziqiao Ma
108
12
0
22 Oct 2024
Task-oriented Robotic Manipulation with Vision Language Models
Nurhan Bulus Guran
Hanchi Ren
Jingjing Deng
Xianghua Xie
73
4
0
21 Oct 2024
1
2
3
Next