Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2312.01597
Cited By
SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference
4 December 2023
Feng Wang
Jieru Mei
Alan L. Yuille
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference"
50 / 51 papers shown
Title
Register and CLS tokens yield a decoupling of local and global features in large ViTs
Alexander Lappe
M. Giese
24
0
0
09 May 2025
DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception
Junjie Wang
Bin Chen
Yulin Li
Bin Kang
Yulin Chen
Zhuotao Tian
VLM
38
0
0
07 May 2025
Show or Tell? A Benchmark To Evaluate Visual and Textual Prompts in Semantic Segmentation
Gabriele Rosi
Fabio Cermelli
VLM
42
0
0
06 May 2025
FLOSS: Free Lunch in Open-vocabulary Semantic Segmentation
Yasser Benigmim
Mohammad Fahes
Tuan-Hung Vu
Andrei Bursuc
Raoul de Charette
VLM
40
0
0
14 Apr 2025
ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models
Guoyizhe Wei
Rama Chellappa
40
0
0
30 Mar 2025
LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation
Vladan Stojnić
Yannis Kalantidis
Jirí Matas
Giorgos Tolias
VLM
48
0
0
25 Mar 2025
The Power of One: A Single Example is All it Takes for Segmentation in VLMs
Mir Rayat Imtiaz Hossain
Mennatullah Siam
Leonid Sigal
James J. Little
MLLM
VLM
79
0
0
13 Mar 2025
Is CLIP ideal? No. Can we fix it? Yes!
Raphi Kang
Yue Song
Georgia Gkioxari
Pietro Perona
VLM
61
0
0
10 Mar 2025
Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding
Seil Kang
Jinyeong Kim
Junhyeok Kim
Seong Jae Hwang
VLM
93
2
0
08 Mar 2025
PixFoundation: Are We Heading in the Right Direction with Pixel-level Vision Foundation Models?
Mennatullah Siam
VLM
84
1
0
06 Feb 2025
DynamicEarth: How Far are We from Open-Vocabulary Change Detection?
Kaiyu Li
Xiangyong Cao
Yupeng Deng
Chao Pang
Zepeng Xin
Deyu Meng
Zhi Wang
ObjD
69
1
0
22 Jan 2025
TeD-Loc: Text Distillation for Weakly Supervised Object Localization
Shakeeb Murtaza
Soufiane Belharbi
M. Pedersoli
Eric Granger
WSOL
VLM
99
1
0
22 Jan 2025
Few-Shot Adaptation of Training-Free Foundation Model for 3D Medical Image Segmentation
Xingxin He
Yifan Hu
Zhaoye Zhou
Mohamed Jarraya
Fang Liu
VLM
MedIm
45
2
0
17 Jan 2025
DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment
Cijo Jose
Théo Moutakanni
Dahyun Kang
Federico Baldassarre
Timothée Darcet
...
Maxime Oquab
Oriane Siméoni
Huy V. Vo
Patrick Labatut
Piotr Bojanowski
CLIP
VLM
100
6
0
20 Dec 2024
FLAIR: VLM with Fine-grained Language-informed Image Representations
Rui Xiao
Sanghwan Kim
Mariana-Iuliana Georgescu
Zeynep Akata
Stephan Alaniz
VLM
CLIP
79
2
0
04 Dec 2024
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
Sanghwan Kim
Rui Xiao
Mariana-Iuliana Georgescu
Stephan Alaniz
Zeynep Akata
VLM
85
2
0
02 Dec 2024
Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation
Luca Barsellotti
Lorenzo Bianchi
Nicola Messina
F. Carrara
Marcella Cornia
Lorenzo Baraldi
Fabrizio Falchi
Rita Cucchiara
VLM
72
2
0
28 Nov 2024
Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation
Chanyoung Kim
Dayun Ju
Woojung Han
Ming-Hsuan Yang
Seong Jae Hwang
VLM
VOS
79
0
0
26 Nov 2024
ResCLIP: Residual Attention for Training-free Dense Vision-language Inference
Yuhang Yang
Jinhong Deng
Wen Li
Lixin Duan
VLM
81
0
0
24 Nov 2024
Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation
Sule Bai
Yong-Jin Liu
Yifei Han
Haoji Zhang
Yansong Tang
VLM
79
3
0
24 Nov 2024
freePruner: A Training-free Approach for Large Multimodal Model Acceleration
Bingxin Xu
Yuzhang Shang
Yunhao Ge
Qian Lou
Yan Yan
97
3
0
23 Nov 2024
ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements
M. Arda Aydın
Efe Mert Çırpar
Elvin Abdinli
Gözde B. Ünal
Y. Sahin
VLM
71
0
0
18 Nov 2024
CorrCLIP: Reconstructing Correlations in CLIP with Off-the-Shelf Foundation Models for Open-Vocabulary Semantic Segmentation
Dengke Zhang
Fagui Liu
Quan Tang
VLM
49
1
0
15 Nov 2024
Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation
Yuheng Shi
Minjing Dong
Chang Xu
VLM
43
1
0
14 Nov 2024
Multilingual Vision-Language Pre-training for the Remote Sensing Domain
João Daniel Silva
João Magalhães
D. Tuia
Bruno Martins
CLIP
VLM
42
1
0
30 Oct 2024
Brain Mapping with Dense Features: Grounding Cortical Semantic Selectivity in Natural Images With Vision Transformers
Andrew F. Luo
Jacob Yeung
Rushikesh Zawar
Shaurya Dewan
Margaret M. Henderson
Leila Wehbe
Michael J. Tarr
34
3
0
07 Oct 2024
SegEarth-OV: Towards Training-Free Open-Vocabulary Segmentation for Remote Sensing Images
Kaiyu Li
Ruixun Liu
Xiangyong Cao
Deyu Meng
Zhi Wang
Deyu Meng
Zhi Wang
36
3
0
02 Oct 2024
EAGLE: Towards Efficient Arbitrary Referring Visual Prompts Comprehension for Multimodal Large Language Models
Jiacheng Zhang
Yang Jiao
Shaoxiang Chen
Jingjing Chen
Yu-Gang Jiang
28
1
0
25 Sep 2024
Image Segmentation in Foundation Model Era: A Survey
Tianfei Zhou
Fei Zhang
Boyu Chang
Wenguan Wang
Ye Yuan
E. Konukoglu
Daniel Cremers
VLM
42
4
0
23 Aug 2024
ReCLIP++: Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation
Jingyun Wang
Guoliang Kang
VLM
SSL
47
7
0
13 Aug 2024
In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation
Dahyun Kang
Minsu Cho
ObjD
VLM
40
9
0
09 Aug 2024
ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference
Mengcheng Lan
Chaofeng Chen
Yiping Ke
Xinjiang Wang
Xue Jiang
Wayne Zhang
VLM
42
24
0
17 Jul 2024
Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation
Tong Shao
Zhuotao Tian
Hang Zhao
Jingyong Su
VLM
39
15
0
11 Jul 2024
StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images
Rushikesh Zawar
Shaurya Dewan
Andrew F. Luo
Margaret M. Henderson
Michael J. Tarr
Leila Wehbe
VGen
CoGe
44
1
0
19 Jun 2024
Tuning-free Universally-Supervised Semantic Segmentation
Xiaobo Yang
Xiaojin Gong
VLM
50
1
0
23 May 2024
Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation
Sina Hajimiri
Ismail Ben Ayed
Jose Dolz
VLM
41
22
0
12 Apr 2024
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Brandon McKinzie
Zhe Gan
J. Fauconnier
Sam Dodge
Bowen Zhang
...
Zirui Wang
Ruoming Pang
Peter Grasch
Alexander Toshev
Yinfei Yang
MLLM
43
187
0
14 Mar 2024
Annotation Free Semantic Segmentation with Vision Foundation Models
Soroush Seifi
Daniel Olmeda Reino
Fabien Despinoy
Rahaf Aljundi
VLM
34
1
0
14 Mar 2024
Gramformer: Learning Crowd Counting via Graph-Modulated Transformer
Hui Lin
Zhiheng Ma
Xiaopeng Hong
Qinnan Shangguan
Deyu Meng
ViT
47
17
0
08 Jan 2024
Grounding Everything: Emerging Localization Properties in Vision-Language Transformers
Walid Bousselham
Felix Petersen
Vittorio Ferrari
Hilde Kuehne
ObjD
VLM
42
39
0
01 Dec 2023
IFSeg: Image-free Semantic Segmentation via Vision-Language Model
Sukmin Yun
S. Park
Paul Hongsuck Seo
Jinwoo Shin
VLM
MLLM
57
14
0
25 Mar 2023
BiFormer: Vision Transformer with Bi-Level Routing Attention
Lei Zhu
Xinjiang Wang
Zhanghan Ke
Wayne Zhang
Rynson W. H. Lau
131
480
0
15 Mar 2023
Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
Jiarui Xu
Sifei Liu
Arash Vahdat
Wonmin Byeon
Xiaolong Wang
Shalini De Mello
VLM
223
320
0
08 Mar 2023
GroupViT: Semantic Segmentation Emerges from Text Supervision
Jiarui Xu
Shalini De Mello
Sifei Liu
Wonmin Byeon
Thomas Breuel
Jan Kautz
Xinyu Wang
ViT
VLM
189
499
0
22 Feb 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
305
7,443
0
11 Nov 2021
Learning to Prompt for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
VPVLM
CLIP
VLM
345
2,271
0
02 Sep 2021
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Joey Tianyi Zhou
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP
VLM
MLLM
199
405
0
13 Jul 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
317
5,785
0
29 Apr 2021
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
Xiuye Gu
Nayeon Lee
Weicheng Kuo
Huayu Chen
VLM
ObjD
225
899
0
28 Apr 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
304
3,708
0
11 Feb 2021
1
2
Next