What do Vision Transformers Learn? A Visual Exploration
arXiv:2212.06727 [ViT]. 13 December 2022.
Amin Ghiasi, Hamid Kazemi, Eitan Borgnia, Steven Reich, Manli Shu, Micah Goldblum, Andrew Gordon Wilson, Tom Goldstein
Papers citing "What do Vision Transformers Learn? A Visual Exploration" (46 papers shown):
- Towards Scalable Foundation Model for Multi-modal and Hyperspectral Geospatial Data. Haozhe Si, Yuxuan Wan, Minh Do, Deepak Vasisht, Han Zhao, Hendrik Hamann. 17 Mar 2025.
- Discovering Influential Neuron Path in Vision Transformers [ViT]. Yifan Wang, Yifei Liu, Yingdong Shi, Chong Li, Anqi Pang, Sibei Yang, Jingyi Yu, Kan Ren. 12 Mar 2025.
- Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models [VLM]. Samuel Stevens, Wei-Lun Chao, T. Berger-Wolf, Yu-Chuan Su. 10 Feb 2025.
- Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment. Harrish Thasarathan, Julian Forsyth, Thomas Fel, M. Kowal, Konstantinos G. Derpanis. 06 Feb 2025.
- Incorporating Feature Pyramid Tokenization and Open Vocabulary Semantic Segmentation [VLM]. J. Zhang, Li Zhang, Shijian Li. 18 Dec 2024.
- Memory Efficient Matting with Adaptive Token Routing. Yiheng Lin, Yihan Hu, Chenyi Zhang, Ting Liu, Xiaochao Qu, Luoqi Liu, Yao Zhao, Y. X. Wei. 14 Dec 2024.
- Asynchronous Feedback Network for Perceptual Point Cloud Quality Assessment [3DPC]. Yujie Zhang, Qi Yang, Ziyu Shan, Yiling Xu. 13 Jul 2024.
- IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection [VLM]. Mingjin Zhang, Yuchun Wang, Jie-Ru Guo, Yunsong Li, Xinbo Gao, Jing Zhang. 10 Jul 2024.
- Visualize and Paint GAN Activations [GAN, FAtt]. Rudolf Herdt, Peter Maass. 24 May 2024.
- BoQ: A Place is Worth a Bag of Learnable Queries. A. Ali-bey, B. Chaib-draa, Philippe Giguère. 12 May 2024.
- Examining Changes in Internal Representations of Continual Learning Models Through Tensor Decomposition [CLL]. Nishant Suresh Aswani, Amira Guesmi, Muhammad Abdullah Hanif, Muhammad Shafique. 06 May 2024.
- Saliency Suppressed, Semantics Surfaced: Visual Transformations in Neural Networks and the Brain. Gustaw Opielka, Jessica Loke, Steven Scholte. 29 Apr 2024.
- EIVEN: Efficient Implicit Attribute Value Extraction using Multimodal LLM. Henry Peng Zou, Gavin Heqing Yu, Ziwei Fan, Dan Bu, Han Liu, Peng Dai, Dongmei Jia, Cornelia Caragea. 13 Apr 2024.
- Dissecting Query-Key Interaction in Vision Transformers. Xu Pan, Aaron Philip, Ziqian Xie, Odelia Schwartz. 04 Apr 2024.
- If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions [CoGe, VLM]. Reza Esfandiarpoor, Cristina Menghini, Stephen H. Bach. 25 Mar 2024.
- LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors. Saksham Suri, Matthew Walmer, Kamal Gupta, Abhinav Shrivastava. 21 Mar 2024.
- What do we learn from inverting CLIP models? Hamid Kazemi, Atoosa Malemir Chegini, Jonas Geiping, S. Feizi, Tom Goldstein. 05 Mar 2024.
- Feature Accentuation: Revealing 'What' Features Respond to in Natural Images [FAtt]. Christopher Hamblin, Thomas Fel, Srijani Saha, Talia Konkle, George A. Alvarez. 15 Feb 2024.
- LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model. Dilxat Muhtar, Zhenshi Li, Feng-Xue Gu, Xue-liang Zhang, P. Xiao. 04 Feb 2024.
- Understanding Video Transformers via Universal Concept Discovery [ViT]. M. Kowal, Achal Dave, Rares Ambrus, Adrien Gaidon, Konstantinos G. Derpanis, P. Tokmakov. 19 Jan 2024.
- Explainable Multi-Camera 3D Object Detection with Transformer-Based Saliency Maps. Till Beemelmanns, Wassim Zahr, Lutz Eckstein. 22 Dec 2023.
- TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP Without Training [CLIP, VLM]. Yuqi Lin, Minghao Chen, Kaipeng Zhang, Hengjia Li, Mingming Li, Zheng Yang, Dongqin Lv, Binbin Lin, Haifeng Liu, Deng Cai. 20 Dec 2023.
- Multimodal Pretraining of Medical Time Series and Notes. Ryan N. King, Tianbao Yang, Bobak J. Mortazavi. 11 Dec 2023.
- ViT-Lens: Towards Omni-modal Representations. Weixian Lei, Yixiao Ge, Kun Yi, Jianfeng Zhang, Difei Gao, Dylan Sun, Yuying Ge, Ying Shan, Mike Zheng Shou. 27 Nov 2023.
- Explainability of Vision Transformers: A Comprehensive Review and New Perspectives [ViT]. Rojina Kashefi, Leili Barekatain, Mohammad Sabokrou, Fatemeh Aghaeipoor. 12 Nov 2023.
- Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks [VLM]. Micah Goldblum, Hossein Souri, Renkun Ni, Manli Shu, Viraj Prabhu, ..., Adrien Bardes, Judy Hoffman, Ramalingam Chellappa, Andrew Gordon Wilson, Tom Goldstein. 30 Oct 2023.
- Analyzing Vision Transformers for Image Classification in Class Embedding Space [ViT]. Martina G. Vilas, Timothy Schaumlöffel, Gemma Roig. 29 Oct 2023.
- NeuroInspect: Interpretable Neuron-based Debugging Framework through Class-conditional Visualizations [AAML]. Yeong-Joon Ju, Ji-Hoon Park, Seong-Whan Lee. 11 Oct 2023.
- SPADE: Sparsity-Guided Debugging for Deep Neural Networks. Arshia Soltani Moakhar, Eugenia Iofinova, Elias Frantar, Dan Alistarh. 06 Oct 2023.
- Out-of-Distribution Detection by Leveraging Between-Layer Transformation Smoothness [OODD]. Fran Jelenić, Josip Jukić, Martin Tutek, Mate Puljiz, Jan Šnajder. 04 Oct 2023.
- Channel Vision Transformers: An Image Is Worth 1 x 16 x 16 Words [ViT]. Yu Bao, Srinivasan Sivanandan, Theofanis Karaletsos. 28 Sep 2023.
- Learning to Generate Training Datasets for Robust Semantic Segmentation [UQCV]. Marwane Hariat, Olivier Laurent, Rémi Kazmierczak, Shihao Zhang, Andrei Bursuc, Angela Yao, Gianni Franchi. 01 Aug 2023.
- Scale Alone Does not Improve Mechanistic Interpretability in Vision Models. Roland S. Zimmermann, Thomas Klein, Wieland Brendel. 11 Jul 2023.
- Sketch-A-Shape: Zero-Shot Sketch-to-3D Shape Generation [3DV]. Aditya Sanghi, P. Jayaraman, Arianna Rampini, Joseph Lambourne, Hooman Shayani, Evan Atherton, Saeid Asgari Taghanaki. 08 Jul 2023.
- Unlocking Feature Visualization for Deeper Networks with MAgnitude Constrained Optimization [FAtt]. Thomas Fel, Thibaut Boissin, Victor Boutin, Agustin Picard, Paul Novello, ..., Drew Linsley, Tom Rousseau, Rémi Cadène, Laurent Gardes, Thomas Serre. 11 Jun 2023.
- Segment Anything in High Quality [VLM]. Lei Ke, Mingqiao Ye, Martin Danelljan, Yifan Liu, Yu-Wing Tai, Chi-Keung Tang, F. I. F. Richard Yu. 02 Jun 2023.
- Contextual Vision Transformers for Robust Representation Learning [ViT]. Yu Bao, Theofanis Karaletsos. 30 May 2023.
- Cinematic Mindscapes: High-quality Video Reconstruction from Brain Activity [DiffM, VGen]. Zijiao Chen, Jiaxin Qing, J. Zhou. 19 May 2023.
- AttentionViz: A Global View of Transformer Attention [ViT]. Catherine Yeh, Yida Chen, Aoyu Wu, Cynthia Chen, Fernanda Viégas, Martin Wattenberg. 04 May 2023.
- A Cookbook of Self-Supervised Learning [SyDa, FedML, SSL]. Randall Balestriero, Mark Ibrahim, Vlad Sobal, Ari S. Morcos, Shashank Shekhar, ..., Pierre Fernandez, Amir Bar, Hamed Pirsiavash, Yann LeCun, Micah Goldblum. 24 Apr 2023.
- Self-attention in Vision Transformers Performs Perceptual Grouping, Not Attention. Paria Mehrani, John K. Tsotsos. 02 Mar 2023.
- Single Cells Are Spatial Tokens: Transformers for Spatial Transcriptomic Data Imputation. Haifang Wen, Wenzhuo Tang, Wei Jin, Jiayuan Ding, Renming Liu, Xinnan Dai, Feng Shi, Lulu Shang, Jiliang Tang, Yuying Xie. 06 Feb 2023.
- Teaching Matters: Investigating the Role of Supervision in Vision Transformers. Matthew Walmer, Saksham Suri, Kamal Gupta, Abhinav Shrivastava. 07 Dec 2022.
- Plug-In Inversion: Model-Agnostic Inversion for Vision with Data Augmentations. Amin Ghiasi, Hamid Kazemi, Steven Reich, Chen Zhu, Micah Goldblum, Tom Goldstein. 31 Jan 2022.
- Masked Autoencoders Are Scalable Vision Learners [ViT, TPM]. Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross B. Girshick. 11 Nov 2021.
- Intriguing Properties of Vision Transformers [ViT]. Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Munawar Hayat, F. Khan, Ming-Hsuan Yang. 21 May 2021.