ResearchTrend.AI


Dissecting Query-Key Interaction in Vision Transformers

4 April 2024
Xu Pan, Aaron Philip, Ziqian Xie, Odelia Schwartz
ArXiv (abs) · PDF · HTML

Papers citing "Dissecting Query-Key Interaction in Vision Transformers"

24 / 24 papers shown

  1. Towards Interpreting Visual Information Processing in Vision-Language Models. Clement Neo, Luke Ong, Philip Torr, Mor Geva, David M. Krueger, Fazl Barez. 09 Oct 2024.
  2. Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla. Tom Lieberum, Matthew Rahtz, János Kramár, Neel Nanda, G. Irving, Rohin Shah, Vladimir Mikulik. 18 Jul 2023.
  3. Is Anisotropy Inherent to Transformers? Nathan Godey, Eric Villemonte de la Clergerie, Benoît Sagot. 13 Jun 2023.
  4. Affinity-based Attention in Self-supervised Transformers Predicts Dynamics of Object Grouping in Humans [ViT]. Hossein Adeli, Seoyoung Ahn, N. Kriegeskorte, G. Zelinsky. 01 Jun 2023.
  5. AttentionViz: A Global View of Transformer Attention [ViT]. Catherine Yeh, Yida Chen, Aoyu Wu, Cynthia Chen, Fernanda Viégas, Martin Wattenberg. 04 May 2023.
  6. DINOv2: Learning Robust Visual Features without Supervision [VLM, CLIP, SSL]. Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Q. Vo, Marc Szafraniec, ..., Hervé Jégou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr Bojanowski. 14 Apr 2023.
  7. Self-attention in Vision Transformers Performs Perceptual Grouping, Not Attention. Paria Mehrani, John K. Tsotsos. 02 Mar 2023.
  8. Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture [SSL, AI4TS, MDE]. Mahmoud Assran, Quentin Duval, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael G. Rabbat, Yann LeCun, Nicolas Ballas. 19 Jan 2023.
  9. What do Vision Transformers Learn? A Visual Exploration [ViT]. Amin Ghiasi, Hamid Kazemi, Eitan Borgnia, Steven Reich, Manli Shu, Micah Goldblum, A. Wilson, Tom Goldstein. 13 Dec 2022.
  10. Analyzing Transformers in Embedding Space. Guy Dar, Mor Geva, Ankit Gupta, Jonathan Berant. 06 Sep 2022.
  11. Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning [VLM]. Weixin Liang, Yuhui Zhang, Yongchan Kwon, Serena Yeung, James Zou. 03 Mar 2022.
  12. Deep ViT Features as Dense Visual Descriptors [MDE, ViT]. Shirzad Amir, Yossi Gandelsman, Shai Bagon, Tali Dekel. 10 Dec 2021.
  13. SimMIM: A Simple Framework for Masked Image Modeling. Zhenda Xie, Zheng Zhang, Yue Cao, Yutong Lin, Jianmin Bao, Zhuliang Yao, Qi Dai, Han Hu. 18 Nov 2021.
  14. Localizing Objects with Self-Supervised Transformers and no Labels [ViT]. Oriane Siméoni, Gilles Puy, Huy V. Vo, Simon Roburin, Spyros Gidaris, Andrei Bursuc, P. Pérez, Renaud Marlet, Jean Ponce. 29 Sep 2021.
  15. IsoScore: Measuring the Uniformity of Embedding Space Utilization. William Rudman, Nate Gillman, T. Rayne, Carsten Eickhoff. 16 Aug 2021.
  16. When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations [ViT]. Xiangning Chen, Cho-Jui Hsieh, Boqing Gong. 03 Jun 2021.
  17. Emerging Properties in Self-Supervised Vision Transformers. Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, Armand Joulin. 29 Apr 2021.
  18. Learning Transferable Visual Models From Natural Language Supervision [CLIP, VLM]. Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya A. Ramesh, Gabriel Goh, ..., Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever. 26 Feb 2021.
  19. Transformer Feed-Forward Layers Are Key-Value Memories [KELM]. Mor Geva, R. Schuster, Jonathan Berant, Omer Levy. 29 Dec 2020.
  20. Training data-efficient image transformers & distillation through attention [ViT]. Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou. 23 Dec 2020.
  21. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [ViT]. Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, ..., Matthias Minderer, G. Heigold, Sylvain Gelly, Jakob Uszkoreit, N. Houlsby. 22 Oct 2020.
  22. Do Saliency Models Detect Odd-One-Out Targets? New Datasets and Evaluations. Iuliia Kotseruba, C. Wloka, Amir Rasouli, John K. Tsotsos. 13 May 2020.
  23. How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings. Kawin Ethayarajh. 02 Sep 2019.
  24. ImageNet Large Scale Visual Recognition Challenge [VLM, ObjD]. Olga Russakovsky, Jia Deng, Hao Su, J. Krause, S. Satheesh, ..., A. Karpathy, A. Khosla, Michael S. Bernstein, Alexander C. Berg, Li Fei-Fei. 01 Sep 2014.