2411.17182
Cited By
An In-depth Investigation of Sparse Rate Reduction in Transformer-like Models
26 November 2024
Yunzhe Hu
Difan Zou
Dong Xu
Papers citing
"An In-depth Investigation of Sparse Rate Reduction in Transformer-like Models"
26 / 26 papers shown
Title
Hyper-SET: Designing Transformers via Hyperspherical Energy Minimization
Yunzhe Hu
Difan Zou
Dong Xu
141
1
0
17 Feb 2025
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
Asma Ghandeharioun
Avi Caciularu
Adam Pearce
Lucas Dixon
Mor Geva
124
114
0
11 Jan 2024
White-Box Transformers via Sparse Rate Reduction
Yaodong Yu
Sam Buchanan
Druv Pai
Tianzhe Chu
Ziyang Wu
Shengbang Tong
B. Haeffele
Yi Ma
ViT
94
87
0
01 Jun 2023
Birth of a Transformer: A Memory Viewpoint
A. Bietti
Vivien A. Cabannes
Diane Bouchacourt
Hervé Jégou
Léon Bottou
104
96
0
01 Jun 2023
The emergence of clusters in self-attention dynamics
Borjan Geshkovski
Cyril Letrouit
Yury Polyanskiy
Philippe Rigollet
93
56
0
09 May 2023
Energy Transformer
Benjamin Hoover
Yuchen Liang
Bao Pham
Yikang Shen
Hendrik Strobelt
Duen Horng Chau
Mohammed J Zaki
Dmitry Krotov
ViT
81
49
0
14 Feb 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
316
563
0
01 Nov 2022
Omnigrok: Grokking Beyond Algorithmic Data
Ziming Liu
Eric J. Michaud
Max Tegmark
93
84
0
03 Oct 2022
Analyzing Transformers in Embedding Space
Guy Dar
Mor Geva
Ankit Gupta
Jonathan Berant
83
93
0
06 Sep 2022
Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers
Arda Sahiner
Tolga Ergen
Batu Mehmet Ozturkler
John M. Pauly
Morteza Mardani
Mert Pilanci
90
33
0
17 May 2022
Universal Hopfield Networks: A General Framework for Single-Shot Associative Memory Models
Beren Millidge
Tommaso Salvatori
Yuhang Song
Thomas Lukasiewicz
Rafal Bogacz
VLM
53
54
0
09 Feb 2022
Attention Approximates Sparse Distributed Memory
Trenton Bricken
Cengiz Pehlevan
82
34
0
10 Nov 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
735
6,139
0
29 Apr 2021
Graph Neural Networks Inspired by Classical Iterative Algorithms
Yongyi Yang
T. Liu
Yangkun Wang
Jinjing Zhou
Quan Gan
Zhewei Wei
Zheng Zhang
Zengfeng Huang
David Wipf
99
83
0
10 Mar 2021
Transformer Interpretability Beyond Attention Visualization
Hila Chefer
Shir Gur
Lior Wolf
139
673
0
17 Dec 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
684
41,563
0
22 Oct 2020
Hopfield Networks is All You Need
Hubert Ramsauer
Bernhard Schafl
Johannes Lehner
Philipp Seidl
Michael Widrich
...
David P. Kreil
Michael K Kopp
Günter Klambauer
Johannes Brandstetter
Sepp Hochreiter
128
437
0
16 Jul 2020
Algorithm Unrolling: Interpretable, Efficient Deep Learning for Signal and Image Processing
V. Monga
Yuelong Li
Yonina C. Eldar
105
1,022
0
22 Dec 2019
Fantastic Generalization Measures and Where to Find Them
Yiding Jiang
Behnam Neyshabur
H. Mobahi
Dilip Krishnan
Samy Bengio
AI4CE
145
611
0
04 Dec 2019
RandAugment: Practical automated data augmentation with a reduced search space
E. D. Cubuk
Barret Zoph
Jonathon Shlens
Quoc V. Le
MQ
270
3,508
0
30 Sep 2019
A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks
Behnam Neyshabur
Srinadh Bhojanapalli
Nathan Srebro
92
610
0
29 Jul 2017
Spectrally-normalized margin bounds for neural networks
Peter L. Bartlett
Dylan J. Foster
Matus Telgarsky
ODL
218
1,225
0
26 Jun 2017
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
440
2,946
0
15 Sep 2016
Layer Normalization
Jimmy Lei Ba
J. Kiros
Geoffrey E. Hinton
435
10,541
0
21 Jul 2016
Norm-Based Capacity Control in Neural Networks
Behnam Neyshabur
Ryota Tomioka
Nathan Srebro
292
591
0
27 Feb 2015
Invariant Scattering Convolution Networks
Joan Bruna
S. Mallat
136
1,279
0
05 Mar 2012