ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2312.10103
  4. Cited By
GSVA: Generalized Segmentation via Multimodal Large Language Models

GSVA: Generalized Segmentation via Multimodal Large Language Models

15 December 2023
Zhuofan Xia
Dongchen Han
Yizeng Han
Xuran Pan
Shiji Song
Gao Huang
    VLM
ArXivPDFHTML

Papers citing "GSVA: Generalized Segmentation via Multimodal Large Language Models"

50 / 55 papers shown
Title
LISAT: Language-Instructed Segmentation Assistant for Satellite Imagery
LISAT: Language-Instructed Segmentation Assistant for Satellite Imagery
Jerome Quenum
Wen-Han Hsieh
Tsung-Han Wu
Ritwik Gupta
Trevor Darrell
David M. Chan
MLLM
VLM
54
0
0
05 May 2025
RESAnything: Attribute Prompting for Arbitrary Referring Segmentation
RESAnything: Attribute Prompting for Arbitrary Referring Segmentation
Ruiqi Wang
Hao Zhang
VLM
56
0
0
03 May 2025
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation
Ziqiao Ma
Jing Ding
Xuejun Zhang
Dezhi Luo
Jiahe Ding
Sihan Xu
Yuchen Huang
Run Peng
Joyce Chai
51
0
0
22 Apr 2025
LGD: Leveraging Generative Descriptions for Zero-Shot Referring Image Segmentation
LGD: Leveraging Generative Descriptions for Zero-Shot Referring Image Segmentation
Jiachen Li
Qing Xie
Xiaohan Yu
Hongyun Wang
Jinyu Xu
Yongjian Liu
ObjD
78
0
0
20 Apr 2025
Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding
Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding
Tao Zhang
X. Li
Zilong Huang
Y. Li
Weixian Lei
XueQing Deng
Shihao Chen
S. Ji
Jiashi Feng
MLLM
LRM
60
2
0
14 Apr 2025
SegEarth-R1: Geospatial Pixel Reasoning via Large Language Model
SegEarth-R1: Geospatial Pixel Reasoning via Large Language Model
Kaiyu Li
Zepeng Xin
Li Pang
Chao Pang
Yupeng Deng
Jing Yao
Guisong Xia
Deyu Meng
Zhi Wang
Xiangyong Cao
VLM
LRM
37
0
0
13 Apr 2025
Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities
Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities
Jing Liu
Wenxuan Wang
Yisi Zhang
Yepeng Tang
Xingjian He
Longteng Guo
Tongtian Yue
Xinlong Wang
ObjD
53
0
0
02 Apr 2025
ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction
ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction
Yuejiao Su
Yi Wang
Qiongyang Hu
Chuang Yang
Lap-Pui Chau
47
0
0
02 Apr 2025
POPEN: Preference-Based Optimization and Ensemble for LVLM-Based Reasoning Segmentation
POPEN: Preference-Based Optimization and Ensemble for LVLM-Based Reasoning Segmentation
Lanyun Zhu
Tianrun Chen
Qianxiong Xu
Xuanyi Liu
Deyi Ji
Haiyang Wu
De Wen Soh
Xiaozhong Liu
VLM
LRM
50
0
0
01 Apr 2025
Online Reasoning Video Segmentation with Just-in-Time Digital Twins
Online Reasoning Video Segmentation with Just-in-Time Digital Twins
Yiqing Shen
Bohan Liu
Chenjia Li
Lalithkumar Seenivasan
Mathias Unberath
VOS
83
2
0
27 Mar 2025
Operating Room Workflow Analysis via Reasoning Segmentation over Digital Twins
Operating Room Workflow Analysis via Reasoning Segmentation over Digital Twins
Yiqing Shen
Chenjia Li
Bohan Liu
Cheng-Yi Li
Tito Porras
Mathias Unberath
62
2
0
26 Mar 2025
MMR: A Large-scale Benchmark Dataset for Multi-target and Multi-granularity Reasoning Segmentation
MMR: A Large-scale Benchmark Dataset for Multi-target and Multi-granularity Reasoning Segmentation
Donggon Jang
Yucheol Cho
Suin Lee
Taehyeon Kim
Dae-Shik Kim
VLM
65
1
0
18 Mar 2025
EIAD: Explainable Industrial Anomaly Detection Via Multi-Modal Large Language Models
EIAD: Explainable Industrial Anomaly Detection Via Multi-Modal Large Language Models
Zongyun Zhang
Jiacheng Ruan
Xian Gao
Ting Liu
Yuzhuo Fu
70
2
0
18 Mar 2025
HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model
HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model
Tao Wang
Changxu Cheng
Lingfeng Wang
Senda Chen
Wuyue Zhao
VLM
72
0
0
17 Mar 2025
GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding
GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding
R. Hu
Lianghui Zhu
Yuxuan Zhang
Tianheng Cheng
Lei Liu
Heng Liu
Longjin Ran
Xiaoxin Chen
Wenyu Liu
Xinggang Wang
ObjD
61
0
0
13 Mar 2025
Unveiling the Invisible: Reasoning Complex Occlusions Amodally with AURA
Zhixuan Li
Hyunse Yoon
Sanghoon Lee
Weisi Lin
52
0
0
13 Mar 2025
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories
Muzhi Zhu
Yuzhuo Tian
Hao Chen
Chunluan Zhou
Qingpei Guo
Y. Liu
M. Yang
Chunhua Shen
MLLM
VLM
72
0
0
11 Mar 2025
Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts
Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts
Shiu-hong Kao
Yu-Wing Tai
Chi-Keung Tang
LRM
MLLM
56
0
0
10 Mar 2025
Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding
Seil Kang
Jinyeong Kim
Junhyeok Kim
Seong Jae Hwang
VLM
90
2
0
08 Mar 2025
Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation
Suhwan Cho
Seunghoon Lee
Minhyeok Lee
Jungho Lee
Sangyoun Lee
VOS
77
0
0
05 Mar 2025
UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface
Hao Tang
Chenwei Xie
Haiyang Wang
Xiaoyi Bao
Tingyu Weng
Pandeng Li
Yun Zheng
Liwei Wang
ObjD
VLM
62
0
0
03 Mar 2025
The Devil is in Temporal Token: High Quality Video Reasoning Segmentation
The Devil is in Temporal Token: High Quality Video Reasoning Segmentation
Sitong Gong
Yunzhi Zhuge
Lu Zhang
Z. Yang
Pingping Zhang
Huchuan Lu
41
0
0
15 Jan 2025
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Haobo Yuan
Xianrui Li
Tao Zhang
Zilong Huang
Shilin Xu
S. Ji
Yunhai Tong
Lu Qi
Jiashi Feng
Ming Yang
VLM
96
11
0
07 Jan 2025
Visual Large Language Models for Generalized and Specialized Applications
Yifan Li
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
88
11
0
06 Jan 2025
Towards Visual Grounding: A Survey
Towards Visual Grounding: A Survey
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
55
3
0
31 Dec 2024
InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal
  Large Language Models
InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models
Cong Wei
Yujie Zhong
Haoxian Tan
Yingsen Zeng
Y. Liu
Zheng Zhao
Yujiu Yang
MLLM
VLM
VOS
101
2
0
18 Dec 2024
Bridging the Divide: Reconsidering Softmax and Linear Attention
Bridging the Divide: Reconsidering Softmax and Linear Attention
Dongchen Han
Yifan Pu
Zhuofan Xia
Yizeng Han
Xuran Pan
Xiu Li
Jiwen Lu
Shiji Song
Gao Huang
73
8
0
09 Dec 2024
HyperSeg: Towards Universal Visual Segmentation with Large Language
  Model
HyperSeg: Towards Universal Visual Segmentation with Large Language Model
Cong Wei
Yujie Zhong
Haoxian Tan
Y. Liu
Zheng Zhao
Jie Hu
Yujiu Yang
VOS
MLLM
VLM
LRM
88
1
0
26 Nov 2024
Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities
Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities
Zheyuan Zhang
Fengyuan Hu
Jayjun Lee
Freda Shi
Parisa Kordjamshidi
Joyce Chai
Ziqiao Ma
56
11
0
22 Oct 2024
Boosting Weakly-Supervised Referring Image Segmentation via Progressive
  Comprehension
Boosting Weakly-Supervised Referring Image Segmentation via Progressive Comprehension
Zaiquan Yang
Yuhao Liu
Jiaying Lin
Gerhard Hancke
Rynson W. H. Lau
31
1
0
02 Oct 2024
Enhancing Explainability in Multimodal Large Language Models Using
  Ontological Context
Enhancing Explainability in Multimodal Large Language Models Using Ontological Context
Jihen Amara
B. König-Ries
Sheeba Samuel
24
1
0
27 Sep 2024
SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring
  Expression Segmentation
SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation
Yi-Chia Chen
Wei-Hua Li
Cheng Sun
Yu-Chiang Frank Wang
Chu-Song Chen
VLM
39
11
0
01 Sep 2024
Depth-Weighted Detection of Behaviours of Risk in People with Dementia using Cameras
Depth-Weighted Detection of Behaviours of Risk in People with Dementia using Cameras
Pratik K. Mishra
Irene Ballester
Andrea Iaboni
B. Ye
Kristine Newman
Alex Mihailidis
Shehroz S. Khan
45
0
0
28 Aug 2024
Image Segmentation in Foundation Model Era: A Survey
Image Segmentation in Foundation Model Era: A Survey
Tianfei Zhou
Fei Zhang
Boyu Chang
Wenguan Wang
Ye Yuan
E. Konukoglu
Daniel Cremers
VLM
42
4
0
23 Aug 2024
Efficient Diffusion Transformer with Step-wise Dynamic Attention
  Mediators
Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators
Yifan Pu
Zhuofan Xia
Jiayi Guo
Dongchen Han
Qixiu Li
...
Ji Li
Yizeng Han
Shiji Song
Gao Huang
Xiu Li
58
12
0
11 Aug 2024
Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring
  Image Segmentation
Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation
Seonghoon Yu
Paul Hongsuck Seo
Jeany Son
DiffM
57
4
0
10 Jul 2024
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model
Yuxuan Zhang
Tianheng Cheng
Lianghui Zhu
Lei Liu
Heng Liu
Longjin Ran
Xiaoxin Chen
Xiaoxin Chen
Wenyu Liu
Xinggang Wang
VLM
58
25
0
28 Jun 2024
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and
  Understanding
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding
Tao Zhang
Xiangtai Li
Hao Fei
Haobo Yuan
Shengqiong Wu
Shunping Ji
Chen Change Loy
Shuicheng Yan
LRM
MLLM
VLM
49
48
0
27 Jun 2024
F-LMM: Grounding Frozen Large Multimodal Models
F-LMM: Grounding Frozen Large Multimodal Models
Size Wu
Sheng Jin
Wenwei Zhang
Lumin Xu
Wentao Liu
Wei Li
Chen Change Loy
MLLM
80
12
0
09 Jun 2024
HDC: Hierarchical Semantic Decoding with Counting Assistance for
  Generalized Referring Expression Segmentation
HDC: Hierarchical Semantic Decoding with Counting Assistance for Generalized Referring Expression Segmentation
Zhuoyan Luo
Yinghao Wu
Yong-Jin Liu
Yicheng Xiao
Xiao-Ping Zhang
Yujiu Yang
35
0
0
24 May 2024
Bring Adaptive Binding Prototypes to Generalized Referring Expression Segmentation
Bring Adaptive Binding Prototypes to Generalized Referring Expression Segmentation
Weize Li
Zhicheng Zhao
Haochen Bai
Fei Su
40
0
0
24 May 2024
LaSagnA: Language-based Segmentation Assistant for Complex Queries
LaSagnA: Language-based Segmentation Assistant for Complex Queries
Cong Wei
Haoxian Tan
Yujie Zhong
Yujiu Yang
Lin Ma
43
14
0
12 Apr 2024
PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model
PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model
Zheng-Wei Zhang
Yeyao Ma
Enming Zhang
Xiang Bai
VLM
MLLM
34
30
0
21 Mar 2024
V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs
V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs
Penghao Wu
Saining Xie
LRM
60
122
0
21 Dec 2023
Agent Attention: On the Integration of Softmax and Linear Attention
Agent Attention: On the Integration of Softmax and Linear Attention
Dongchen Han
Tianzhu Ye
Yizeng Han
Zhuofan Xia
Siyuan Pan
Pengfei Wan
Shiji Song
Gao Huang
32
74
0
14 Dec 2023
mPLUG-Owl: Modularization Empowers Large Language Models with
  Multimodality
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Qinghao Ye
Haiyang Xu
Guohai Xu
Jiabo Ye
Ming Yan
...
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
Jingren Zhou
VLM
MLLM
208
900
0
27 Apr 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
270
4,244
0
30 Jan 2023
Embodied Referring Expression for Manipulation Question Answering in
  Interactive Environment
Embodied Referring Expression for Manipulation Question Answering in Interactive Environment
Qie Sima
Sinan Tan
Huaping Liu
LM&Ro
54
7
0
06 Oct 2022
Learning to Weight Samples for Dynamic Early-exiting Networks
Learning to Weight Samples for Dynamic Early-exiting Networks
Yizeng Han
Yifan Pu
Zihang Lai
Chaofei Wang
S. Song
Junfen Cao
Wenhui Huang
Chao Deng
Gao Huang
59
54
0
17 Sep 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
313
11,953
0
04 Mar 2022
12
Next