Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2409.03525
Cited By
FrozenSeg: Harmonizing Frozen Foundation Models for Open-Vocabulary Segmentation
5 September 2024
Xi Chen
Haosen Yang
Sheng Jin
Xiatian Zhu
Huanjin Yao
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
Github (24★)
Papers citing
"FrozenSeg: Harmonizing Frozen Foundation Models for Open-Vocabulary Segmentation"
30 / 30 papers shown
Title
Federated EndoViT: Pretraining Vision Transformers via Federated Learning on Endoscopic Image Collections
Max Kirchner
Alexander C. Jenke
S. Bodenstedt
Fiona Kolbinger
Oliver Saldanha
Jakob N. Kather
M. Wagner
Stefanie Speidel
FedML
MedIm
134
1
0
23 Apr 2025
Relax Image-Specific Prompt Requirement in SAM: A Single Generic Prompt for Segmenting Camouflaged Objects
Jian Hu
Jiayi Lin
Weitong Cai
Shaogang Gong
VLM
54
28
0
12 Dec 2023
OpenSD: Unified Open-Vocabulary Segmentation and Detection
Shuai Li
Ming-hui Li
Pengfei Wang
Lei Zhang
ObjD
VLM
68
6
0
10 Dec 2023
Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP
Qihang Yu
Ju He
XueQing Deng
Xiaohui Shen
Liang-Chieh Chen
VLM
CLIP
77
150
0
04 Aug 2023
Segment Anything in High Quality
Lei Ke
Mingqiao Ye
Martin Danelljan
Yifan Liu
Yu-Wing Tai
Chi-Keung Tang
Feng Yu
VLM
107
337
0
02 Jun 2023
FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation
Jie Qin
Jie Wu
Pengxiang Yan
Ming Li
Ren Yuxi
...
Yitong Wang
Rui Wang
Shilei Wen
X. Pan
Xingang Wang
SSeg
VLM
80
94
0
30 Mar 2023
Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP
Feng Liang
Bichen Wu
Xiaoliang Dai
Kunpeng Li
Yinan Zhao
Hang Zhang
Peizhao Zhang
Peter Vajda
Diana Marculescu
CLIP
VLM
102
457
0
09 Oct 2022
F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models
Weicheng Kuo
Huayu Chen
Xiuye Gu
A. Piergiovanni
A. Angelova
MLLM
VLM
ObjD
136
137
0
30 Sep 2022
Open-Vocabulary Universal Image Segmentation with MaskCLIP
Zheng Ding
Jieke Wang
Zhuowen Tu
CLIP
ISeg
VLM
91
90
0
18 Aug 2022
Vision Transformer Adapter for Dense Predictions
Zhe Chen
Yuchen Duan
Wenhai Wang
Junjun He
Tong Lu
Jifeng Dai
Yu Qiao
136
567
0
17 May 2022
DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR
Shilong Liu
Feng Li
Hao Zhang
Xiaohu Yang
Xianbiao Qi
Hang Su
Jun Zhu
Lei Zhang
ViT
294
761
0
28 Jan 2022
A ConvNet for the 2020s
Zhuang Liu
Hanzi Mao
Chaozheng Wu
Christoph Feichtenhofer
Trevor Darrell
Saining Xie
ViT
176
5,213
0
10 Jan 2022
Detecting Twenty-thousand Classes using Image-level Supervision
Xingyi Zhou
Rohit Girdhar
Armand Joulin
Phillip Krahenbuhl
Ishan Misra
CLIP
VLM
106
617
0
07 Jan 2022
Grounded Language-Image Pre-training
Liunian Harold Li
Pengchuan Zhang
Haotian Zhang
Jianwei Yang
Chunyuan Li
...
Lu Yuan
Lei Zhang
Lei Li
Kai-Wei Chang
Jianfeng Gao
ObjD
VLM
129
1,066
0
07 Dec 2021
Masked-attention Mask Transformer for Universal Image Segmentation
Bowen Cheng
Ishan Misra
Alex Schwing
Alexander Kirillov
Rohit Girdhar
ISeg
253
2,379
0
02 Dec 2021
Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling
Renrui Zhang
Rongyao Fang
Wei Zhang
Peng Gao
Kunchang Li
Jifeng Dai
Yu Qiao
Hongsheng Li
VLM
265
401
0
06 Nov 2021
Learning to Prompt for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
VPVLM
CLIP
VLM
502
2,409
0
02 Sep 2021
Conditional DETR for Fast Training Convergence
Depu Meng
Xiaokang Chen
Zejia Fan
Gang Zeng
Houqiang Li
Yuhui Yuan
Lei-huan Sun
Jingdong Wang
ViT
88
619
0
13 Aug 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
967
29,810
0
26 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
453
3,887
0
11 Feb 2021
SwiftNet: Real-time Video Object Segmentation
Haochen Wang
Xiaolong Jiang
Haibing Ren
Yao Hu
S. Bai
VOS
72
150
0
09 Feb 2021
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
670
41,369
0
22 Oct 2020
Deformable DETR: Deformable Transformers for End-to-End Object Detection
Xizhou Zhu
Weijie Su
Lewei Lu
Bin Li
Xiaogang Wang
Jifeng Dai
ViT
234
5,091
0
08 Oct 2020
End-to-End Object Detection with Transformers
Nicolas Carion
Francisco Massa
Gabriel Synnaeve
Nicolas Usunier
Alexander Kirillov
Sergey Zagoruyko
ViT
3DV
PINN
432
13,094
0
26 May 2020
Context Prior for Scene Segmentation
Changqian Yu
Jingbo Wang
Changxin Gao
Gang Yu
Chunhua Shen
Nong Sang
120
269
0
03 Apr 2020
LVIS: A Dataset for Large Vocabulary Instance Segmentation
Agrim Gupta
Piotr Dollár
Ross B. Girshick
ISeg
VLM
105
1,376
0
08 Aug 2019
Semantic Understanding of Scenes through the ADE20K Dataset
Bolei Zhou
Hang Zhao
Xavier Puig
Tete Xiao
Sanja Fidler
Adela Barriuso
Antonio Torralba
SSeg
404
1,889
0
18 Aug 2016
The Cityscapes Dataset for Semantic Urban Scene Understanding
Marius Cordts
Mohamed Omran
Sebastian Ramos
Timo Rehfeld
Markus Enzweiler
Rodrigo Benenson
Uwe Franke
Stefan Roth
Bernt Schiele
1.1K
11,641
0
06 Apr 2016
Microsoft COCO Captions: Data Collection and Evaluation Server
Xinlei Chen
Hao Fang
Nayeon Lee
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollar
C. L. Zitnick
218
2,493
0
01 Apr 2015
Microsoft COCO: Common Objects in Context
Nayeon Lee
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
424
43,814
0
01 May 2014
1