Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2306.00968
Cited By
GRES: Generalized Referring Expression Segmentation
1 June 2023
Chang Liu
Henghui Ding
Xudong Jiang
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"GRES: Generalized Referring Expression Segmentation"
50 / 128 papers shown
Title
Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement
Yuqi Liu
Bohao Peng
Zhisheng Zhong
Zihao Yue
Fanbin Lu
Bei Yu
Jiaya Jia
LRM
VLM
123
46
0
01 Jul 2025
Multi-encoder nnU-Net outperforms transformer models with self-supervised pretraining
Seyedeh Sahar Taheri Otaghsara
Reza Rahmanzadeh
ViT
73
0
0
01 Jul 2025
Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts
Shiu-hong Kao
Yu-Wing Tai
Chi-Keung Tang
MLLM
LRM
283
1
0
01 Jul 2025
FOCUS: Unified Vision-Language Modeling for Interactive Editing Driven by Referential Segmentation
Fan Yang
Yousong Zhu
Xin Li
Yufei Zhan
Hongyin Zhao
Shurong Zheng
Yaowei Wang
Ming Tang
Jinqiao Wang
MLLM
VLM
50
0
0
20 Jun 2025
MBA: Multimodal Bidirectional Attack for Referring Expression Segmentation Models
Xingbai Chen
Tingchao Fu
Renyang Liu
Wei Zhou
Chao Yi
AAML
31
0
0
19 Jun 2025
Refer to Anything with Vision-Language Prompts
Shengcao Cao
Zijun Wei
Jason Kuen
Kangning Liu
Lingzhi Zhang
Jiuxiang Gu
HyunJoon Jung
Liang-Yan Gui
Yu Wang
VLM
117
0
0
05 Jun 2025
A Large-Scale Referring Remote Sensing Image Segmentation Dataset and Benchmark
Zhigang Yang
Huiguang Yao
Linmao Tian
Xuezhi Zhao
Qiang Li
Qi. Wang
98
0
0
04 Jun 2025
R2SM: Referring and Reasoning for Selective Masks
Yu-Lin Shih
Wei-En Tai
Cheng Sun
Y. Wang
Hwann-Tzong Chen
ISeg
81
0
0
02 Jun 2025
SAM2-LOVE: Segment Anything Model 2 in Language-aided Audio-Visual Scenes
Yuji Wang
Haoran Xu
Yong-Jin Liu
Jiaze Li
Yansong Tang
101
1
0
02 Jun 2025
PixelThink: Towards Efficient Chain-of-Pixel Reasoning
Song Wang
Gongfan Fang
Lingdong Kong
Xiangtai Li
Jianyun Xu
Sheng Yang
Qiang Li
Jianke Zhu
Xinchao Wang
LRM
121
0
0
29 May 2025
SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning
Jiaqi Huang
Zunnan Xu
Jun Zhou
Ting Liu
Yicheng Xiao
Mingwen Ou
Bowen Ji
Xiu Li
Kehong Yuan
VLM
91
0
0
28 May 2025
LlamaSeg: Image Segmentation via Autoregressive Mask Generation
Jiru Deng
Tengjin Weng
Tianyu Yang
Wenhan Luo
Zhiheng Li
Wenhao Jiang
VLM
149
0
0
26 May 2025
Deformable Attentive Visual Enhancement for Referring Segmentation Using Vision-Language Model
Alaa Dalaq
Muzammil Behzad
VLM
198
0
0
25 May 2025
SynRES: Towards Referring Expression Segmentation in the Wild via Synthetic Data
Dong-Hee Kim
Hyunjee Song
Donghyun Kim
290
0
0
23 May 2025
RemoteSAM: Towards Segment Anything for Earth Observation
Liang Yao
Fan Liu
Delong Chen
Chuanyi Zhang
Yijun Wang
Ziyun Chen
Wei Xu
Shimin Di
Yuhui Zheng
236
0
0
23 May 2025
Ground-V: Teaching VLMs to Ground Complex Instructions in Pixels
Yongshuo Zong
Qin Zhang
Dongsheng An
Zhihua Li
Xiang Xu
Linghan Xu
Zhuowen Tu
Yifan Xing
Onkar Dabeer
ObjD
96
0
0
20 May 2025
VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning
Yuqi Liu
Tianyuan Qu
Zhisheng Zhong
Bohao Peng
Shu Liu
Bei Yu
Jiaya Jia
VLM
LRM
132
3
0
17 May 2025
Adversarial Robustness Analysis of Vision-Language Models in Medical Image Segmentation
Anjila Budathoki
Manish Dhakal
AAML
109
1
0
05 May 2025
RESAnything: Attribute Prompting for Arbitrary Referring Segmentation
Ruiqi Wang
Hao Zhang
VLM
109
1
0
03 May 2025
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation
Ziqiao Ma
Jing Ding
Xuejun Zhang
Dezhi Luo
Jiahe Ding
Sihan Xu
Yuchen Huang
Run Peng
Joyce Chai
243
0
0
22 Apr 2025
RefComp: A Reference-guided Unified Framework for Unpaired Point Cloud Completion
Yixuan Yang
Jinyu Yang
Zixiang Zhao
Victor Sanchez
Feng Zheng
74
0
0
18 Apr 2025
Learning What NOT to Count
Adriano DÁlessandro
Ali Mahdavi-Amiri
Ghassan Hamarneh
82
0
0
16 Apr 2025
Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception
Ziqi Pang
Xin Xu
Yu-Xiong Wang
DiffM
199
0
0
15 Apr 2025
PVUW 2025 Challenge Report: Advances in Pixel-level Understanding of Complex Videos in the Wild
Henghui Ding
Chang Liu
Nikhila Ravi
Shuting He
Y. Wei
...
Haobo Yuan
Xuelong Li
Tao Zhang
Lu Qi
Ming-Hsuan Yang
92
1
0
15 Apr 2025
Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding
Tao Zhang
Xuelong Li
Zilong Huang
Yuchen Li
Weixian Lei
XueQing Deng
Shihao Chen
S. Ji
Jiashi Feng
MLLM
LRM
143
5
0
14 Apr 2025
SegEarth-R1: Geospatial Pixel Reasoning via Large Language Model
Kaiyu Li
Zepeng Xin
Li Pang
Chao Pang
Yupeng Deng
Jing Yao
Guisong Xia
Deyu Meng
Zhi Wang
Xiangyong Cao
VLM
LRM
105
4
0
13 Apr 2025
ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction
Yuejiao Su
Yi Wang
Qiongyang Hu
Chuang Yang
Lap-Pui Chau
102
0
0
02 Apr 2025
Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities
Jing Liu
Wenxuan Wang
Yisi Zhang
Yepeng Tang
Xingjian He
Longteng Guo
Tongtian Yue
Xinlong Wang
ObjD
104
1
0
02 Apr 2025
POPEN: Preference-Based Optimization and Ensemble for LVLM-Based Reasoning Segmentation
Lanyun Zhu
Tianrun Chen
Qianxiong Xu
Xuanyi Liu
Deyi Ji
Haiyang Wu
De Wen Soh
Jing Liu
VLM
LRM
86
1
0
01 Apr 2025
CADFormer: Fine-Grained Cross-modal Alignment and Decoding Transformer for Referring Remote Sensing Image Segmentation
Maofu Liu
Xin Jiang
Xiaokang Zhang
101
0
0
30 Mar 2025
FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model
Zhiqiang Zhang
Jia-Nan Li
Zunnan Xu
Hanhui Li
Yiji Cheng
Fa-Ting Hong
Qin Lin
Qinglin Lu
Xiaodan Liang
DiffM
140
2
0
25 Mar 2025
Beyond Object Categories: Multi-Attribute Reference Understanding for Visual Grounding
Hao Guo
Jianfei Zhu
Wei Fan
Chunzhi Yi
Feng Jiang
ObjD
90
0
0
25 Mar 2025
MMR: A Large-scale Benchmark Dataset for Multi-target and Multi-granularity Reasoning Segmentation
Donggon Jang
Yucheol Cho
Suin Lee
Taehyeon Kim
Dae-Shik Kim
VLM
93
3
0
18 Mar 2025
HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model
Tao Wang
Changxu Cheng
Lingfeng Wang
Senda Chen
Wuyue Zhao
VLM
100
1
0
17 Mar 2025
GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding
R. Hu
Lianghui Zhu
Yuxuan Zhang
Tianheng Cheng
Lei Liu
Heng Liu
Longjin Ran
Xiaoxin Chen
Wenyu Liu
Xinggang Wang
ObjD
159
0
0
13 Mar 2025
REF-VLM: Triplet-Based Referring Paradigm for Unified Visual Decoding
Yan Tai
Luhao Zhu
Zhiqiang Chen
Ynan Ding
Yiying Dong
Xiaohong Liu
Guodong Guo
MLLM
ObjD
97
0
0
10 Mar 2025
Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding
Seil Kang
Jinyeong Kim
Junhyeok Kim
Seong Jae Hwang
VLM
127
5
0
08 Mar 2025
UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface
Hao Tang
Chenwei Xie
Haiyang Wang
Xiaoyi Bao
Tingyu Weng
Pandeng Li
Yun Zheng
Liwei Wang
ObjD
VLM
134
1
0
03 Mar 2025
AeroReformer: Aerial Referring Transformer for UAV-based Referring Image Segmentation
Rui Li
Xiaowei Zhao
138
0
0
23 Feb 2025
Knowing Your Target: Target-Aware Transformer Makes Better Spatio-Temporal Video Grounding
Xin Gu
Yaojie Shen
Chenxi Luo
Tiejian Luo
Yan Huang
Yuewei Lin
Heng Fan
L. Zhang
108
2
0
16 Feb 2025
The Devil is in Temporal Token: High Quality Video Reasoning Segmentation
Sitong Gong
Yunzhi Zhuge
Lu Zhang
Zhiyong Yang
Pingping Zhang
Huchuan Lu
95
3
0
15 Jan 2025
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Haobo Yuan
Xianrui Li
Tao Zhang
Zilong Huang
Shilin Xu
S. Ji
Yunhai Tong
Lu Qi
Jiashi Feng
Ming-Hsuan Yang
VLM
195
25
0
07 Jan 2025
Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
Hao Fei
Shengqiong Wu
Hao Zhang
Tat-Seng Chua
Shuicheng Yan
190
42
0
31 Dec 2024
Towards Visual Grounding: A Survey
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
284
5
0
31 Dec 2024
InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models
Cong Wei
Yujie Zhong
Haoxian Tan
Yingsen Zeng
Yong Liu
Zheng Zhao
Yujiu Yang
MLLM
VLM
VOS
152
3
0
18 Dec 2024
Grasp What You Want: Embodied Dexterous Grasping System Driven by Your Voice
Junliang Li
Kai Ye
Haolan Kang
Mingxuan Liang
Yuhang Wu
Zhenhua Liu
Huiping Zhuang
Rui Huang
Yongquan Chen
138
0
0
14 Dec 2024
SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model
Chunlin Yu
Hanqing Wang
Ye Shi
Haoyang Luo
Sibei Yang
Jingyi Yu
Jingya Wang
LRM
LM&Ro
213
3
0
02 Dec 2024
HyperSeg: Towards Universal Visual Segmentation with Large Language Model
Cong Wei
Yujie Zhong
Haoxian Tan
Yong Liu
Zheng Zhao
Jie Hu
Yujiu Yang
VOS
MLLM
VLM
LRM
136
6
0
26 Nov 2024
Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation
Seongsu Ha
Chaeyun Kim
Donghwa Kim
Junho Lee
Sangho Lee
Joonseok Lee
117
4
0
03 Nov 2024
Referring Human Pose and Mask Estimation in the Wild
Bo Miao
Mingtao Feng
Zijie Wu
Mohammed Bennamoun
Yongsheng Gao
Ajmal Mian
86
0
0
27 Oct 2024
1
2
3
Next