ResearchTrend.AI

Physically Grounded Vision-Language Models for Robotic Manipulation
arXiv:2309.02561 (v4, latest)
5 September 2023
Jensen Gao
Bidipta Sarkar
F. Xia
Ted Xiao
Jiajun Wu
Brian Ichter
Anirudha Majumdar
Dorsa Sadigh
    LM&Ro

Papers citing "Physically Grounded Vision-Language Models for Robotic Manipulation"

36 / 36 papers shown
CodeDiffuser: Attention-Enhanced Diffusion Policy via VLM-Generated Code for Instruction Ambiguity
Guang Yin
Yitong Li
Yixuan Wang
D. Mcconachie
Paarth Shah
Kunimatsu Hashimoto
Huan Zhang
Katherine Liu
Yunzhu Li
LM&Ro
10
0
0
19 Jun 2025
Prompting with the Future: Open-World Model Predictive Control with Interactive Digital Twins
Chuanruo Ning
Kuan Fang
Wei-Chiu Ma
LM&Ro, AI4CE
29
0
0
16 Jun 2025
UAD: Unsupervised Affordance Distillation for Generalization in Robotic Manipulation
Yihe Tang
Wenlong Huang
Yingke Wang
Chengshu Li
Roy Yuan
Ruohan Zhang
Jiajun Wu
Li Fei-Fei
48
0
0
10 Jun 2025
Refer to Anything with Vision-Language Prompts
Shengcao Cao
Zijun Wei
Jason Kuen
Kangning Liu
Lingzhi Zhang
Jiuxiang Gu
HyunJoon Jung
Liang-Yan Gui
Yu Wang
VLM
117
0
0
05 Jun 2025
Understanding Physical Properties of Unseen Deformable Objects by Leveraging Large Language Models and Robot Actions
Changmin Park
Beomjoon Lee
Haechan Jung
Haejin Jung
Changjoo Nam
LM&Ro
101
0
0
04 Jun 2025
mRAG: Elucidating the Design Space of Multi-modal Retrieval-Augmented Generation
Chan-wei Hu
Yueqi Wang
Shuo Xing
Chia-Ju Chen
Zhengzhong Tu
3DV
17
1
0
29 May 2025
Right Side Up? Disentangling Orientation Understanding in MLLMs with Fine-grained Multi-axis Perception Tasks
Keanu Nichols
Nazia Tasnim
Yuting Yan
Nicholas Ikechukwu
Elva Zou
Deepti Ghadiyaram
Bryan A. Plummer
55
0
0
27 May 2025
SpikeStereoNet: A Brain-Inspired Framework for Stereo Depth Estimation from Spike Streams
Zhuoheng Gao
Yihao Li
Jiyao Zhang
Rui Zhao
Tong Wu
Hao Tang
Zhaofei Yu
Hao Dong
Guozhang Chen
Tiejun Huang
46
0
0
26 May 2025
EMAC+: Embodied Multimodal Agent for Collaborative Planning with VLM+LLM
Shuang Ao
Flora D. Salim
Simon Khan
LLMAG, LM&Ro
34
0
0
26 May 2025
On the Dual-Use Dilemma in Physical Reasoning and Force
William Xie
Enora Rice
N. Correll
42
0
0
24 May 2025
Policy Contrastive Decoding for Robotic Foundation Models
Shihan Wu
Ji Zhang
Xu Luo
Junlin Xie
Jingkuan Song
Heng Tao Shen
Lianli Gao
OffRL
266
0
0
19 May 2025
Unfettered Forceful Skill Acquisition with Physical Reasoning and Coordinate Frame Labeling
William Xie
Max Conway
Yutong Zhang
N. Correll
LM&Ro, LRM
77
1
0
14 May 2025
Symbolically-Guided Visual Plan Inference from Uncurated Video Data
Wenyan Yang
Ahmet Tikna
Yi Zhao
Yuying Zhang
Luigi Palopoli
Marco Roveri
Joni Pajarinen
VGen
55
0
0
13 May 2025
ReLI: A Language-Agnostic Approach to Human-Robot Interaction
Linus Nwankwo
Bjoern Ellensohn
Ozan Özdenizci
Elmar Rueckert
LM&Ro
241
0
0
03 May 2025
PhysicsGen: Can Generative Models Learn from Images to Predict Complex Physical Relations?
M. Spitznagel
Jan Vaillant
J. Keuper
AI4CE, VGen, PINN
102
0
0
07 Mar 2025
Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation
Yunhai Feng
Jiaming Han
Zhiyong Yang
Xiangyu Yue
Sergey Levine
Jianlan Luo
LM&Ro
125
7
0
23 Feb 2025
PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning
Xinyu Zhang
Yuxuan Dong
Yongpeng Wu
Jiaxing Huang
Chengyou Jia
Basura Fernando
Mike Zheng Shou
Lingling Zhang
Jun Liu
AIMat, ReLM, LRM
112
13
0
17 Feb 2025
RLS3: RL-Based Synthetic Sample Selection to Enhance Spatial Reasoning in Vision-Language Models for Indoor Autonomous Perception
Joshua R. Waite
Md Zahid Hasan
Qisai Liu
Zhanhong Jiang
Chinmay Hegde
Soumik Sarkar
OffRL, SyDa
274
1
0
31 Jan 2025
Robust Contact-rich Manipulation through Implicit Motor Adaptation
Teng Xue
Amirreza Razmjoo
Suhan Shetty
Sylvain Calinon
188
1
0
16 Dec 2024
InterDyn: Controllable Interactive Dynamics with Video Diffusion Models
Rick Akkerman
Haiwen Feng
M. Black
Dimitrios Tzionas
Victoria Fernandez-Abrevaya
VGen, AI4CE
199
3
0
16 Dec 2024
Is 'Right' Right? Enhancing Object Orientation Understanding in Multimodal Large Language Models through Egocentric Instruction Tuning
Ji Hyeok Jung
Eun Tae Kim
S. Kim
Joo Ho Lee
Bumsoo Kim
Buru Chang
VLM
513
2
0
24 Nov 2024
CLIP-RT: Learning Language-Conditioned Robotic Policies from Natural Language Supervision
Gi-Cheon Kang
Junghyun Kim
Kyuhwan Shim
Jun Ki Lee
Byoung-Tak Zhang
LM&Ro
315
2
1
01 Nov 2024
Task-oriented Robotic Manipulation with Vision Language Models
Nurhan Bulus Guran
Hanchi Ren
Jingjing Deng
Xianghua Xie
119
4
0
21 Oct 2024
LADEV: A Language-Driven Testing and Evaluation Platform for Vision-Language-Action Models in Robotic Manipulation
Zhijie Wang
Zhehua Zhou
Jiayang Song
Yuheng Huang
Zhan Shu
Lei Ma
90
1
0
07 Oct 2024
UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models
Qiaojun Yu
Siyuan Huang
Xibin Yuan
Zhengkai Jiang
Ce Hao
...
Junbo Wang
Liu Liu
Hongsheng Li
Peng Gao
Cewu Lu
130
3
0
30 Sep 2024
Fast and Accurate Task Planning using Neuro-Symbolic Language Models and Multi-level Goal Decomposition
Minseo Kwon
Yaesol Kim
Young J. Kim
99
4
0
28 Sep 2024
Word2Wave: Language Driven Mission Programming for Efficient Subsea Deployments of Marine Robots
Ruo Chen
David Blow
Adnan Abdullah
Md Jahidul Islam
123
1
0
27 Sep 2024
HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models
V. Bhat
Prashanth Krishnamurthy
Ramesh Karri
Farshad Khorrami
142
6
0
16 Sep 2024
ReplanVLM: Replanning Robotic Tasks with Visual Language Models
Aoran Mei
Guo-Niu Zhu
Huaxiang Zhang
Zhongxue Gan
93
15
0
31 Jul 2024
Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models
Lynn Chua
Badih Ghazi
Yangsibo Huang
Pritish Kamath
Ravi Kumar
Pasin Manurangsi
Amer Sinha
Chulin Xie
Chiyuan Zhang
156
2
0
23 Jun 2024
Policy Learning with a Language Bottleneck
Megha Srivastava
Cédric Colas
Dorsa Sadigh
Jacob Andreas
115
3
0
07 May 2024
What Foundation Models can Bring for Robot Learning in Manipulation: A Survey
Dingzhe Li
Yixiang Jin
A. Yong
Yong A
Hongze Yu
...
Huaping Liu
Gang Hua
F. Sun
Jianwei Zhang
Bin Fang
AI4CE, LM&Ro
219
15
0
28 Apr 2024
ZeroCAP: Zero-Shot Multi-Robot Context Aware Pattern Formation via Large Language Models
Vishnunandan L. N. Venkatesh
Byung-Cheol Min
LM&Ro
193
2
0
02 Apr 2024
Few-Shot Image Classification and Segmentation as Visual Question Answering Using Vision-Language Models
Tian Meng
Yang Tao
Ruilin Lyu
Wuliang Yin
VLM
81
1
0
15 Mar 2024
PhyGrasp: Generalizing Robotic Grasping with Physics-informed Large Multimodal Models
Dingkun Guo
Yuqi Xiang
Shuqi Zhao
Xinghao Zhu
Masayoshi Tomizuka
Mingyu Ding
Wei Zhan
83
11
0
26 Feb 2024
PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs
Michael Dorkenwald
Nimrod Barazani
Cees G. M. Snoek
Yuki M. Asano
VLM, MLLM
59
12
0
13 Feb 2024