ResearchTrend.AI


Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning

29 November 2023
Yingdong Hu, Fanqi Lin, Tong Zhang, Li Yi, Yang Gao
LM&Ro

Papers citing "Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning"

34 papers shown
Symbolically-Guided Visual Plan Inference from Uncurated Video Data
Wenyan Yang, Ahmet Tikna, Yi Zhao, Yuying Zhang, Luigi Palopoli, Marco Roveri, J. Pajarinen
VGen · 13 May 2025

Imagine, Verify, Execute: Memory-Guided Agentic Exploration with Vision-Language Models
Seungjae Lee, Daniel Ekpo, Haowen Liu, Furong Huang, Abhinav Shrivastava, Jia-Bin Huang
LM&Ro · 12 May 2025

Multi-agent Embodied AI: Advances and Future Directions
Zhaohan Feng, Ruiqi Xue, Lei Yuan, Yang Yu, Ning Ding, M. Liu, Bingzhao Gao, Jian-jun Sun, Gang Wang
AI4CE · 08 May 2025

Efficient Alignment of Unconditioned Action Prior for Language-conditioned Pick and Place in Clutter
Kechun Xu, Xunlong Xia, Kaixuan Wang, Yifei Yang, Yunxuan Mao, Bing Deng, R. Xiong, Y. Wang
OffRL · 12 Mar 2025

A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning
Xin Wen, Bingchen Zhao, Yilun Chen, Jiangmiao Pang, Xiaojuan Qi
LM&Ro · 10 Mar 2025

Bilevel Learning for Bilevel Planning
Bowen Li, Tom Silver, Sebastian A. Scherer, Alexander G. Gray
12 Feb 2025

DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control
Junjie Wen, Y. X. Zhu, Jinming Li, Zhibin Tang, Chaomin Shen, Feifei Feng
VLM · 09 Feb 2025

RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World
Weixin Mao, Weiheng Zhong, Zhou Jiang, Dong Fang, Zhongyue Zhang, ..., Fan Jia, Tiancai Wang, Haoqiang Fan, Osamu Yoshie
29 Nov 2024

CLIP-RT: Learning Language-Conditioned Robotic Policies from Natural Language Supervision
Gi-Cheon Kang, Junghyun Kim, Kyuhwan Shim, Jun Ki Lee, Byoung-Tak Zhang
LM&Ro · 01 Nov 2024

VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning
Yichao Liang, Nishanth Kumar, Hao Tang, Adrian Weller, J. Tenenbaum, Tom Silver, Joao Henriques, Kevin Ellis
30 Oct 2024

In-Context Learning Enables Robot Action Prediction in LLMs
Yida Yin, Zekai Wang, Yuvan Sharma, Dantong Niu, Trevor Darrell, Roei Herzig
LM&Ro · 16 Oct 2024

Towards Generalizable Vision-Language Robotic Manipulation: A Benchmark and LLM-guided 3D Policy
Ricardo Garcia, Shizhe Chen, Cordelia Schmid
LM&Ro · 02 Oct 2024

GravMAD: Grounded Spatial Value Maps Guided Action Diffusion for Generalized 3D Manipulation
Yangtao Chen, Zixuan Chen, Junhui Yin, Jing Huo, Pinzhuo Tian, Jieqi Shi, Yang Gao
LM&Ro · 30 Sep 2024

AlignBot: Aligning VLM-powered Customized Task Planning with User Reminders Through Fine-Tuning for Household Robots
Zhaxizhuoma, Pengan Chen, Ziniu Wu, Jiawei Sun, Dong Wang, Peng Zhou, Nieqing Cao, Yan Ding, Bin Zhao, Xuelong Li
18 Sep 2024

VLMPC: Vision-Language Model Predictive Control for Robotic Manipulation
Wentao Zhao, Jiaming Chen, Ziyu Meng, Donghui Mao, Ran Song, Wei Zhang
13 Jul 2024

FreeMotion: MoCap-Free Human Motion Synthesis with Multimodal Large Language Models
Zhikai Zhang, Yitang Li, Haofeng Huang, Mingxian Lin, Li Yi
15 Jun 2024

Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration
Yang Zhang, Shixin Yang, Chenjia Bai, Fei Wu, Xiu Li, Zhen Wang, Xuelong Li
LLMAG · 23 May 2024

Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V
Peiyuan Zhi, Zhiyuan Zhang, Muzhi Han, Zeyu Zhang, Zhitian Li, Ziyuan Jiao, Siyuan Huang
LRM, LM&Ro · 16 Apr 2024

RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents
Zeren Chen, Zhelun Shi, Xiaoya Lu, Lehan He, Sucheng Qian, ..., Zhen-fei Yin, Jing Shao, Cewu Lu
28 Mar 2024

BBA: Bi-Modal Behavioral Alignment for Reasoning with Large Vision-Language Models
Xueliang Zhao, Xinting Huang, Tingchen Fu, Qintong Li, Shansan Gong, Lemao Liu, Wei Bi, Lingpeng Kong
LRM · 21 Feb 2024

VIALM: A Survey and Benchmark of Visually Impaired Assistance with Large Models
Yi Zhao, Yilin Zhang, Rong Xiang, Jing Li, Hillming Li
29 Jan 2024

A Universal Semantic-Geometric Representation for Robotic Manipulation
Tong Zhang, Yingdong Hu, Hanchen Cui, Hang Zhao, Yang Gao
18 Jun 2023

mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Qinghao Ye, Haiyang Xu, Guohai Xu, Jiabo Ye, Ming Yan, ..., Junfeng Tian, Qiang Qi, Ji Zhang, Feiyan Huang, Jingren Zhou
VLM, MLLM · 27 Apr 2023

Vision-Language Models as Success Detectors
Yuqing Du, Ksenia Konyushkova, Misha Denil, A. Raju, Jessica Landon, Felix Hill, Nando de Freitas, Serkan Cabi
MLLM, LRM · 13 Mar 2023

BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li, Dongxu Li, Silvio Savarese, Steven C. H. Hoi
VLM, MLLM · 30 Jan 2023

Real-World Robot Learning with Masked Visual Pre-training
Ilija Radosavovic, Tete Xiao, Stephen James, Pieter Abbeel, Jitendra Malik, Trevor Darrell
SSL · 06 Oct 2022

ProgPrompt: Generating Situated Robot Task Plans using Large Language Models
Ishika Singh, Valts Blukis, Arsalan Mousavian, Ankit Goyal, Danfei Xu, Jonathan Tremblay, D. Fox, Jesse Thomason, Animesh Garg
LM&Ro, LLMAG · 22 Sep 2022

LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action
Dhruv Shah, B. Osinski, Brian Ichter, Sergey Levine
LM&Ro · 10 Jul 2022

Large Language Models are Zero-Shot Reasoners
Takeshi Kojima, S. Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa
ReLM, LRM · 24 May 2022

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
OSLM, ALM · 04 Mar 2022

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou
LM&Ro, LRM, AI4CE, ReLM · 28 Jan 2022

Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, ..., Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik
EgoV · 13 Oct 2021

TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models
Minghao Li, Tengchao Lv, Jingye Chen, Lei Cui, Yijuan Lu, D. Florêncio, Cha Zhang, Zhoujun Li, Furu Wei
ViT · 21 Sep 2021

Language Models as Knowledge Bases?
Fabio Petroni, Tim Rocktaschel, Patrick Lewis, A. Bakhtin, Yuxiang Wu, Alexander H. Miller, Sebastian Riedel
KELM, AI4MH · 03 Sep 2019