ResearchTrend.AI
Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks


25 January 2024
Tianhe Ren
Shilong Liu
Ailing Zeng
Jing Lin
Kunchang Li
He Cao
Jiayu Chen
Xinyu Huang
Yukang Chen
Feng Yan
Zhaoyang Zeng
Hao Zhang
Feng Li
Jie-jin Yang
Hongyang Li
Qing Jiang
Lei Zhang
    VLM

Papers citing "Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks"

50 / 100 papers shown
Deploying Foundation Model-Enabled Air and Ground Robots in the Field: Challenges and Opportunities
Zachary Ravichandran
Fernando Cladera
Jason Hughes
Varun Murali
M. Hsieh
George J. Pappas
Camillo J Taylor
Vijay R. Kumar
LM&Ro
40
0
0
14 May 2025
From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation
Yifu Yuan
Haiqin Cui
Yibin Chen
Zibin Dong
Fei Ni
Longxin Kou
Jinyi Liu
Pengyi Li
Yan Zheng
Jianye Hao
31
0
0
13 May 2025
Leveraging Multi-Modal Information to Enhance Dataset Distillation
Zhe Li
Hadrien Reynaud
Bernhard Kainz
DD
45
0
0
13 May 2025
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving
Zongchuang Zhao
Haoyu Fu
Dingkang Liang
Xin Zhou
Dingyuan Zhang
Hongwei Xie
Bing Wang
Xiang Bai
MLLM
VLM
49
0
0
13 May 2025
UniDiffGrasp: A Unified Framework Integrating VLM Reasoning and VLM-Guided Part Diffusion for Open-Vocabulary Constrained Grasping with Dual Arms
Xueyang Guo
Hongwei Hu
Chengye Song
J. Chen
Zilin Zhao
Yu Fu
Bowen Guan
Zhenze Liu
31
0
0
11 May 2025
FoodTrack: Estimating Handheld Food Portions with Egocentric Video
Ervin Wang
Yuhao Chen
EgoV
67
0
0
07 May 2025
CountDiffusion: Text-to-Image Synthesis with Training-Free Counting-Guidance Diffusion
Yongqian Li
Pencheng Wan
Liang Han
Yaowei Wang
Liqiang Nie
Min Zhang
43
0
0
07 May 2025
Estimating the Diameter at Breast Height of Trees in a Forest With a Single 360 Camera
Siming He
Zachary Osman
Fernando Cladera
Dexter Ong
Nitant Rai
Patrick Corey Green
Vijay R. Kumar
Pratik Chaudhari
38
0
0
06 May 2025
Scenethesis: A Language and Vision Agentic Framework for 3D Scene Generation
Lu Ling
C. Lin
Nayeon Lee
Yin Cui
Y. Zeng
Yichen Sheng
Yunhao Ge
Ming-Yu Liu
Aniket Bera
Zhaoshuo Li
VGen
3DV
56
0
0
05 May 2025
RESAnything: Attribute Prompting for Arbitrary Referring Segmentation
Ruiqi Wang
Hao Zhang
VLM
56
0
0
03 May 2025
V3LMA: Visual 3D-enhanced Language Model for Autonomous Driving
Jannik Lübberstedt
Esteban Rivera
Nico Uhlemann
Markus Lienkamp
MLLM
63
0
0
30 Apr 2025
Calibrating Uncertainty Quantification of Multi-Modal LLMs using Grounding
Trilok Padhi
R. Kaur
Adam D. Cobb
Manoj Acharya
Anirban Roy
Colin Samplawski
Brian Matejek
Alexander M. Berenbeim
Nathaniel D. Bastian
Susmit Jha
28
0
0
30 Apr 2025
MoSAM: Motion-Guided Segment Anything Model with Spatial-Temporal Memory Selection
Q. Yang
Yuan Yao
Miaomiao Cui
Liefeng Bo
VLM
61
0
0
30 Apr 2025
LLM-Empowered Embodied Agent for Memory-Augmented Task Planning in Household Robotics
Marc Glocker
Peter Honig
Matthias Hirschmanner
Markus Vincze
LM&Ro
83
1
0
30 Apr 2025
ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction
Qihao Liu
Ju He
Qihang Yu
Liang-Chieh Chen
Alan Yuille
DiffM
VGen
83
0
0
30 Apr 2025
Anyprefer: An Agentic Framework for Preference Data Synthesis
Yiyang Zhou
Zekun Wang
Tianle Wang
Shangyu Xing
Peng Xia
...
Chetan Bansal
Weitong Zhang
Ying Wei
Joey Tianyi Zhou
Huaxiu Yao
63
1
0
27 Apr 2025
FreeGraftor: Training-Free Cross-Image Feature Grafting for Subject-Driven Text-to-Image Generation
Zebin Yao
Lei Ren
Huixing Jiang
Chen Wei
Xiaojie Wang
Ruifan Li
Fangxiang Feng
DiffM
76
0
0
22 Apr 2025
AffordanceSAM: Segment Anything Once More in Affordance Grounding
D. Jiang
Mengmeng Wang
Teli Ma
Hao Li
Yong-Jin Liu
Guang Dai
L. Zhang
32
0
0
22 Apr 2025
ODHSR: Online Dense 3D Reconstruction of Humans and Scenes from Monocular Videos
Zetong Zhang
Manuel Kaufmann
Lixin Xue
Jie Song
Martin R. Oswald
3DH
71
0
0
17 Apr 2025
Post-Hurricane Debris Segmentation Using Fine-Tuned Foundational Vision Models
Kooshan Amini
Yuhao Liu
Jamie Ellen Padgett
Guha Balakrishnan
Ashok Veeraraghavan
33
0
0
17 Apr 2025
SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding
Qianqian Sun
Jixiang Luo
Dell Zhang
Xuelong Li
DiffM
54
0
0
17 Apr 2025
Continuous Locomotive Crowd Behavior Generation
Inhwan Bae
Junoh Lee
Hae-Gon Jeon
31
0
0
07 Apr 2025
MonoInstance: Enhancing Monocular Priors via Multi-view Instance Alignment for Neural Rendering and Reconstruction
Wenyuan Zhang
Yixiao Yang
Han Huang
Liang Han
Kanle Shi
Yu-Shen Liu
Zhizhong Han
MDE
60
3
0
24 Mar 2025
How to Train Your Dragon: Automatic Diffusion-Based Rigging for Characters with Diverse Topologies
Zeqi Gu
Difan Liu
Timothy Langlois
Matthew Fisher
Abe Davis
DiffM
3DH
62
0
0
19 Mar 2025
WideRange4D: Enabling High-Quality 4D Reconstruction with Wide-Range Movements and Scenes
L. Yang
Kaixin Zhu
Juanxi Tian
Bohan Zeng
Matthieu Lin
Hongjuan Pei
Wentao Zhang
Shuicheng Yan
VGen
75
0
0
17 Mar 2025
Motion Blender Gaussian Splatting for Dynamic Scene Reconstruction
Xinyu Zhang
Haonan Chang
Yuhan Liu
Abdeslam Boularias
3DGS
39
0
0
12 Mar 2025
GraphGarment: Learning Garment Dynamics for Bimanual Cloth Manipulation Tasks
Wei Chen
Kelin Li
Dongmyoung Lee
Xiaoshuai Chen
Rui Zong
Petar Kormushev
AI4CE
41
0
0
04 Mar 2025
ArticuBot: Learning Universal Articulated Object Manipulation Policy via Large Scale Simulation
Yufei Wang
Ziyu Wang
Mino Nakura
Pratik Bhowal
Chia-Liang Kuo
Yi-Ting Chen
Zackory M. Erickson
David Held
66
0
0
04 Mar 2025
LangGas: Introducing Language in Selective Zero-Shot Background Subtraction for Semi-Transparent Gas Leak Detection with a New Dataset
Wenqi Guo
Yiyang Du
Shan Du
75
1
0
04 Mar 2025
Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond
Guanyao Wu
Haoyu Liu
Hongming Fu
Yichuan Peng
Jinyuan Liu
Xin-Yue Fan
Risheng Liu
71
0
0
03 Mar 2025
Solving Instance Detection from an Open-World Perspective
Qianqian Shen
Yunhan Zhao
Nahyun Kwon
Jeeeun Kim
Yanan Li
Shu Kong
43
0
0
01 Mar 2025
Attention-Guided Integration of CLIP and SAM for Precise Object Masking in Robotic Manipulation
Muhammad A. Muttaqien
Tomohiro Motoda
Ryo Hanai
Domae Yukiyasu
46
0
0
26 Feb 2025
FUNCTO: Function-Centric One-Shot Imitation Learning for Tool Manipulation
Chao Tang
Anxing Xiao
Yuhong Deng
Tianrun Hu
Wenlong Dong
Hanbo Zhang
David Hsu
Hong Zhang
73
2
0
24 Feb 2025
SMITE: Segment Me In TimE
Amirhossein Alimohammadi
Sauradip Nag
Saeid Asgari Taghanaki
Andrea Tagliasacchi
Ghassan Hamarneh
Ali Mahdavi-Amiri
VLM
VOS
137
2
0
20 Feb 2025
Personalized Instance-based Navigation Toward User-Specific Objects in Realistic Environments
Luca Barsellotti
Roberto Bigazzi
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
98
1
0
20 Feb 2025
CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image
Kaixin Yao
Longwen Zhang
Xinhao Yan
Yan Zeng
Qixuan Zhang
Wei Yang
Lan Xu
Jiayuan Gu
Jingyi Yu
29
3
0
18 Feb 2025
SAM-LAD: Segment Anything Model Meets Zero-Shot Logic Anomaly Detection
Yun Peng
Xiao Lin
Nachuan Ma
Jiayuan Du
Chuangwei Liu
Chengju Liu
Qi Chen
44
3
0
17 Feb 2025
Deciphering Functions of Neurons in Vision-Language Models
Jiaqi Xu
Cuiling Lan
Xuejin Chen
Yan Lu
VLM
100
0
0
10 Feb 2025
Mobile Robot Navigation Using Hand-Drawn Maps: A Vision Language Model Approach
A. H. Tan
Angus Fung
Haitong Wang
G. Nejat
91
2
0
31 Jan 2025
MetaOcc: Surround-View 4D Radar and Camera Fusion Framework for 3D Occupancy Prediction with Dual Training Strategies
Long Yang
Lianqing Zheng
W. Ai
Minghao Liu
Sen Li
Qunshu Lin
Shengyu Yan
Jie Bai
Zhixiong Ma
Xichan Zhu
140
0
0
28 Jan 2025
Are Open-Vocabulary Models Ready for Detection of MEP Elements on Construction Sites
Abdalwhab Abdalwhab
A. Imran
Sina Heydarian
I. Iordanova
David St-Onge
49
0
0
16 Jan 2025
Guided SAM: Label-Efficient Part Segmentation
S.B. van Rooij
G.J. Burghouts
VLM
43
0
0
13 Jan 2025
Fine-grained Image-to-LiDAR Contrastive Distillation with Visual Foundation Models
Yifan Zhang
Junhui Hou
66
1
0
03 Jan 2025
GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding
Haoyi Jiang
Liu Liu
Tianheng Cheng
Xinjie Wang
Tianwei Lin
Zhizhong Su
Wei Liu
Xinyu Wang
3DGS
ViT
116
5
0
17 Dec 2024
ViPOcc: Leveraging Visual Priors from Vision Foundation Models for Single-View 3D Occupancy Prediction
Yi Feng
Yu Han
Xijing Zhang
Tanghui Li
Yanting Zhang
Rui Fan
114
3
0
15 Dec 2024
PaintScene4D: Consistent 4D Scene Generation from Text Prompts
Vinayak Gupta
Yunze Man
Yu-Xiong Wang
VGen
83
0
0
05 Dec 2024
AdvDreamer Unveils: Are Vision-Language Models Truly Ready for Real-World 3D Variations?
Shouwei Ruan
Hanqin Liu
Yao Huang
Xiaoqi Wang
Caixin Kang
Hang Su
Yinpeng Dong
Xingxing Wei
VGen
93
0
0
04 Dec 2024
G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation
Tianxing Chen
Yao Mu
Zhixuan Liang
Z. Chen
Shijia Peng
...
Mingkun Xu
R. Hu
H. Zhang
Xuelong Li
Ping Luo
AI4CE
102
8
0
27 Nov 2024
SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation
Claudia Cuttano
Gabriele Trivigno
Gabriele Rosi
Carlo Masone
Giuseppe Averta
VOS
106
2
0
26 Nov 2024
MC-LLaVA: Multi-Concept Personalized Vision-Language Model
Ruichuan An
Sihan Yang
Ming Lu
Kai Zeng
Yulin Luo
...
Hao Liang
Qi She
Shanghang Zhang
W. Zhang
Wentao Zhang
90
5
0
18 Nov 2024