arXiv: 2409.10419
HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models

16 September 2024
V. Bhat
Prashanth Krishnamurthy
Ramesh Karri
Farshad Khorrami

Papers citing "HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models"

50 / 70 papers shown
OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning
Fanqi Lin
Ruiqian Nai
Yingdong Hu
Jiacheng You
Junming Zhao
Yang Gao
LRM
97
0
0
17 May 2025
AffordGrasp: In-Context Affordance Reasoning for Open-Vocabulary Task-Oriented Grasping in Clutter
Yingbo Tang
Shanghang Zhang
Xiaoshuai Hao
Pengwei Wang
Jianlong Wu
Zihan Wang
Shanghang Zhang
118
7
0
02 Mar 2025
Neuro-LIFT: A Neuromorphic, LLM-based Interactive Framework for Autonomous Drone FlighT at the Edge
Amogh Joshi
Sourav Sanyal
Kaushik Roy
156
2
0
31 Jan 2025
Towards Visual Grounding: A Survey
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
255
5
0
31 Dec 2024
Language-driven Grasp Detection
An Dinh Vuong
Minh Nhat Vu
Baoru Huang
Nghia Nguyen
Hieu Le
T. Vo
Anh Nguyen
VLM
92
19
0
13 Jun 2024
Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V
Peiyuan Zhi
Zhiyuan Zhang
Muzhi Han
Zeyu Zhang
Zhitian Li
Ziyuan Jiao
Siyuan Huang
LRM, LM&Ro
94
33
0
16 Apr 2024
Grounding LLMs For Robot Task Planning Using Closed-loop State Feedback
V. Bhat
Ali Umut Kaypak
Prashanth Krishnamurthy
Ramesh Karri
Farshad Khorrami
LM&Ro
94
18
0
13 Feb 2024
Reasoning Grasping via Multimodal Large Language Model
Shiyu Jin
Jinxuan Xu
Yutian Lei
Liangjun Zhang
LRM
79
21
0
09 Feb 2024
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces
Jiannan Wu
Yi Jiang
Bin Yan
Huchuan Lu
Zehuan Yuan
Ping Luo
VOS
101
18
0
25 Dec 2023
UniTeam: Open Vocabulary Mobile Manipulation Challenge
Andrew Melnik
Michael Büttner
Leon Harz
Lyon Brown
G. C. Nandi
PS Arjun
Gaurav Kumar Yadav
Rahul Kala
R. Haschke
LM&Ro
57
13
0
14 Dec 2023
GPT-4V(ision) for Robotics: Multimodal Task Planning from Human Demonstration
Naoki Wake
Atsushi Kanehira
Kazuhiro Sasabuchi
Jun Takamatsu
Katsushi Ikeuchi
LM&Ro
72
68
0
20 Nov 2023
GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer
Urchade Zaratiana
Nadi Tomeh
Pierre Holat
Thierry Charnois
48
38
0
14 Nov 2023
Language-guided Robot Grasping: CLIP-based Referring Grasp Synthesis in Clutter
Georgios Tziafas
Yucheng Xu
Arushi Goel
Mohammadreza Kasaei
Zhibin Li
Hamidreza Kasaei
80
26
0
09 Nov 2023
LAN-grasp: Using Large Language Models for Semantic Object Grasping
Reihaneh Mirjalili
Michael Krawez
Simone Silenzi
Yannik Blei
Wolfram Burgard
VLM
104
29
0
08 Oct 2023
The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)
Zhengyuan Yang
Linjie Li
Kevin Qinghong Lin
Jianfeng Wang
Chung-Ching Lin
Nasim Shakouri Mahmoudabadi
Lijuan Wang
LM&MA
75
644
0
29 Sep 2023
Language Embedded Radiance Fields for Zero-Shot Task-Oriented Grasping
Adam Rashid
Satvik Sharma
Chung Min Kim
Justin Kerr
Lawrence Yunliang Chen
Angjoo Kanazawa
Ken Goldberg
123
93
0
14 Sep 2023
Physically Grounded Vision-Language Models for Robotic Manipulation
Jensen Gao
Bidipta Sarkar
F. Xia
Ted Xiao
Jiajun Wu
Brian Ichter
Anirudha Majumdar
Dorsa Sadigh
LM&Ro
89
133
0
05 Sep 2023
Referring Image Segmentation Using Text Supervision
Fang Liu
Yuhao Liu
Yuqiu Kong
Ke Xu
Lulu Zhang
Baocai Yin
Gerhard Hancke
Rynson W. H. Lau
82
31
0
28 Aug 2023
VL-Grasp: a 6-Dof Interactive Grasp Policy for Language-Oriented Objects in Cluttered Indoor Scenes
Yuhao Lu
Yixuan Fan
Beixing Deng
Fan Liu
Yali Li
Shengjin Wang
74
31
0
01 Aug 2023
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
Anthony Brohan
Noah Brown
Justice Carbajal
Yevgen Chebotar
Xi Chen
...
Ted Xiao
Peng Xu
Sichun Xu
Tianhe Yu
Brianna Zitkovich
LM&Ro, LRM
179
1,291
0
28 Jul 2023
Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation
Bokui (William) Shen
Ge Yang
Alan Yu
J. Wong
L. Kaelbling
Phillip Isola
VLM
92
112
0
27 Jul 2023
Multimodal Diffusion Segmentation Model for Object Segmentation from Manipulation Instructions
Yui Iioka
Y. Yoshida
Yuiga Wada
Shumpei Hatanaka
K. Sugiura
DiffM
112
5
0
17 Jul 2023
Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners
Allen Z. Ren
Anushri Dixit
Alexandra Bodrova
Sumeet Singh
Stephen Tu
...
Jacob Varley
Zhenjia Xu
Dorsa Sadigh
Andy Zeng
Anirudha Majumdar
LM&Ro
261
239
0
04 Jul 2023
Segment Anything
A. Kirillov
Eric Mintun
Nikhila Ravi
Hanzi Mao
Chloe Rolland
...
Spencer Whitehead
Alexander C. Berg
Wan-Yen Lo
Piotr Dollár
Ross B. Girshick
MLLM, VLM
371
7,405
0
05 Apr 2023
Universal Instance Perception as Object Discovery and Retrieval
B. Yan
Yi Jiang
Jiannan Wu
D. Wang
Ping Luo
Zehuan Yuan
Huchuan Lu
VOS, VLM, LRM
114
174
0
12 Mar 2023
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
...
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
ObjD
191
2,023
0
09 Mar 2023
PaLM-E: An Embodied Multimodal Language Model
Danny Driess
F. Xia
Mehdi S. M. Sajjadi
Corey Lynch
Aakanksha Chowdhery
...
Marc Toussaint
Klaus Greff
Andy Zeng
Igor Mordatch
Peter R. Florence
LM&Ro
120
1,673
0
06 Mar 2023
Open-World Object Manipulation using Pre-trained Vision-Language Models
Austin Stone
Ted Xiao
Yao Lu
K. Gopalakrishnan
Kuang-Huei Lee
...
Sean Kirmani
Brianna Zitkovich
F. Xia
Chelsea Finn
Karol Hausman
LM&Ro
259
155
0
02 Mar 2023
Task-Oriented Grasp Prediction with Visual-Language Inputs
Chao Tang
Dehao Huang
Lingxiao Meng
Weiyu Liu
Hong Zhang
45
36
0
28 Feb 2023
A Joint Modeling of Vision-Language-Action for Target-oriented Grasping in Clutter
Kechun Xu
Shuqing Zhao
Zhongxiang Zhou
Zizhang Li
Huaijin Pi
Yifeng Zhu
Yue Wang
R. Xiong
66
49
0
24 Feb 2023
PolyFormer: Referring Image Segmentation as Sequential Polygon Generation
Jiang Liu
Hui Ding
Zhaowei Cai
Yuting Zhang
R. Satzoda
Vijay Mahadevan
R. Manmatha
ObjD
93
131
0
14 Feb 2023
AnyGrasp: Robust and Efficient Grasp Perception in Spatial and Temporal Domains
Haoshu Fang
Chenxi Wang
Hongjie Fang
Minghao Gou
Jirong Liu
Hengxu Yan
Wenhai Liu
Yichen Xie
Cewu Lu
133
210
0
16 Dec 2022
RT-1: Robotics Transformer for Real-World Control at Scale
Anthony Brohan
Noah Brown
Justice Carbajal
Yevgen Chebotar
Joseph Dabis
...
Ted Xiao
Peng Xu
Sichun Xu
Tianhe Yu
Brianna Zitkovich
LM&Ro
135
1,159
0
13 Dec 2022
LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models
Chan Hee Song
Jiaman Wu
Clay Washington
Brian M Sadler
Wei-Lun Chao
Yu-Chuan Su
LLMAG, LM&Ro
135
422
0
08 Dec 2022
Visual Language Maps for Robot Navigation
Chen Huang
Oier Mees
Andy Zeng
Wolfram Burgard
LM&Ro
243
369
0
11 Oct 2022
ProgPrompt: Generating Situated Robot Task Plans using Large Language Models
Ishika Singh
Valts Blukis
Arsalan Mousavian
Ankit Goyal
Danfei Xu
Jonathan Tremblay
Dieter Fox
Jesse Thomason
Animesh Garg
LM&Ro, LLMAG
175
657
0
22 Sep 2022
Inner Monologue: Embodied Reasoning through Planning with Language Models
Wenlong Huang
F. Xia
Ted Xiao
Harris Chan
Jacky Liang
...
Tomas Jackson
Linda Luu
Sergey Levine
Karol Hausman
Brian Ichter
LLMAG, LM&Ro, LRM
134
920
0
12 Jul 2022
Hybrid Physical Metric For 6-DoF Grasp Pose Detection
Yuhao Lu
Beixing Deng
Zhenyu Wang
Peiyuan Zhi
Yali Li
Shengjin Wang
41
17
0
22 Jun 2022
Simple Open-Vocabulary Object Detection with Vision Transformers
Matthias Minderer
A. Gritsenko
Austin Stone
Maxim Neumann
Dirk Weissenborn
...
Zhuoran Shen
Tianlin Li
Xiaohua Zhai
Thomas Kipf
N. Houlsby
ObjD, CLIP, VLM, ViT, OCL
94
314
0
12 May 2022
Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning
Li Yang
Yan Xu
Chunfen Yuan
Wei Liu
Bing Li
Weiming Hu
ObjD
68
117
0
30 Apr 2022
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
Michael Ahn
Anthony Brohan
Noah Brown
Yevgen Chebotar
Omar Cortes
...
Ted Xiao
Peng Xu
Sichun Xu
Mengyuan Yan
Andy Zeng
LM&Ro
192
1,988
0
04 Apr 2022
Text2Pos: Text-to-Point-Cloud Cross-Modal Localization
Manuel Kolmet
Qunjie Zhou
Aljosa Osep
Laura Leal-Taixe
75
23
0
28 Mar 2022
Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents
Wenlong Huang
Pieter Abbeel
Deepak Pathak
Igor Mordatch
LM&Ro
99
1,125
0
18 Jan 2022
Image Segmentation Using Text and Image Prompts
Timo Lüddecke
Alexander S. Ecker
CLIP, VLM
146
474
0
18 Dec 2021
CLIPort: What and Where Pathways for Robotic Manipulation
Mohit Shridhar
Lucas Manuelli
Dieter Fox
LM&Ro
123
661
0
24 Sep 2021
INVIGORATE: Interactive Visual Grounding and Grasping in Clutter
Hanbo Zhang
Yunfan Lu
Cunjun Yu
David Hsu
Xuguang Lan
Nanning Zheng
LM&Ro
82
66
0
25 Aug 2021
TransRefer3D: Entity-and-Relation Aware Transformer for Fine-Grained 3D Visual Grounding
Dailan He
Yusheng Zhao
Junyu Luo
Tianrui Hui
Shaofei Huang
Aixi Zhang
Si Liu
ViT
51
95
0
05 Aug 2021
End-to-end Trainable Deep Neural Network for Robotic Grasp Detection and Semantic Segmentation from RGB
Stefan Ainetter
Friedrich Fraundorfer
72
124
0
12 Jul 2021
Cross-Modal Progressive Comprehension for Referring Segmentation
Si Liu
Tianrui Hui
Shaofei Huang
Yunchao Wei
Yue Liu
Guanbin Li
EgoV, VOS
69
129
0
15 May 2021
Encoder Fusion Network with Co-Attention Embedding for Referring Image Segmentation
Guang Feng
Zhiwei Hu
Lihe Zhang
Huchuan Lu
EgoV
69
172
0
05 May 2021