An In-depth Investigation of Sparse Rate Reduction in Transformer-like Models

26 November 2024

Papers citing "An In-depth Investigation of Sparse Rate Reduction in Transformer-like Models"

Hyper-SET: Designing Transformers via Hyperspherical Energy Minimization Yunzhe Hu Difan Zou Dong Xu 141 1 0 17 Feb 2025
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models Asma Ghandeharioun Avi Caciularu Adam Pearce Lucas Dixon Mor Geva 124 114 0 11 Jan 2024
White-Box Transformers via Sparse Rate Reduction Yaodong Yu Sam Buchanan Druv Pai Tianzhe Chu Ziyang Wu Shengbang Tong B. Haeffele Yi Ma ViT 94 87 0 01 Jun 2023
Birth of a Transformer: A Memory Viewpoint A. Bietti Vivien A. Cabannes Diane Bouchacourt Hervé Jégou Léon Bottou 104 96 0 01 Jun 2023
The emergence of clusters in self-attention dynamics Borjan Geshkovski Cyril Letrouit Yury Polyanskiy Philippe Rigollet 93 56 0 09 May 2023
Energy Transformer Benjamin Hoover Yuchen Liang Bao Pham Yikang Shen Hendrik Strobelt Duen Horng Chau Mohammed J Zaki Dmitry Krotov ViT 81 49 0 14 Feb 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small Kevin Wang Alexandre Variengien Arthur Conmy Buck Shlegeris Jacob Steinhardt 316 563 0 01 Nov 2022
Omnigrok: Grokking Beyond Algorithmic Data Ziming Liu Eric J. Michaud Max Tegmark 93 84 0 03 Oct 2022
Analyzing Transformers in Embedding Space Guy Dar Mor Geva Ankit Gupta Jonathan Berant 83 93 0 06 Sep 2022
Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers Arda Sahiner Tolga Ergen Batu Mehmet Ozturkler John M. Pauly Morteza Mardani Mert Pilanci 90 33 0 17 May 2022
Universal Hopfield Networks: A General Framework for Single-Shot Associative Memory Models Beren Millidge Tommaso Salvatori Yuhang Song Thomas Lukasiewicz Rafal Bogacz VLM 53 54 0 09 Feb 2022
Attention Approximates Sparse Distributed Memory Trenton Bricken Cengiz Pehlevan 82 34 0 10 Nov 2021
Emerging Properties in Self-Supervised Vision Transformers Mathilde Caron Hugo Touvron Ishan Misra Hervé Jégou Julien Mairal Piotr Bojanowski Armand Joulin 735 6,139 0 29 Apr 2021
Graph Neural Networks Inspired by Classical Iterative Algorithms Yongyi Yang T. Liu Yangkun Wang Jinjing Zhou Quan Gan Zhewei Wei Zheng Zhang Zengfeng Huang David Wipf 99 83 0 10 Mar 2021
Transformer Interpretability Beyond Attention Visualization Hila Chefer Shir Gur Lior Wolf 139 673 0 17 Dec 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai ... Matthias Minderer G. Heigold Sylvain Gelly Jakob Uszkoreit N. Houlsby ViT 684 41,563 0 22 Oct 2020
Hopfield Networks is All You Need Hubert Ramsauer Bernhard Schafl Johannes Lehner Philipp Seidl Michael Widrich ... David P. Kreil Michael K Kopp Günter Klambauer Johannes Brandstetter Sepp Hochreiter 128 437 0 16 Jul 2020
Algorithm Unrolling: Interpretable, Efficient Deep Learning for Signal and Image Processing V. Monga Yuelong Li Yonina C. Eldar 105 1,022 0 22 Dec 2019
Fantastic Generalization Measures and Where to Find Them Yiding Jiang Behnam Neyshabur H. Mobahi Dilip Krishnan Samy Bengio AI4CE 145 611 0 04 Dec 2019
RandAugment: Practical automated data augmentation with a reduced search space E. D. Cubuk Barret Zoph Jonathon Shlens Quoc V. Le MQ 270 3,508 0 30 Sep 2019
A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks Behnam Neyshabur Srinadh Bhojanapalli Nathan Srebro 92 610 0 29 Jul 2017
Spectrally-normalized margin bounds for neural networks Peter L. Bartlett Dylan J. Foster Matus Telgarsky ODL 218 1,225 0 26 Jun 2017
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima N. Keskar Dheevatsa Mudigere J. Nocedal M. Smelyanskiy P. T. P. Tang ODL 440 2,946 0 15 Sep 2016
Layer Normalization Jimmy Lei Ba J. Kiros Geoffrey E. Hinton 435 10,541 0 21 Jul 2016
Norm-Based Capacity Control in Neural Networks Behnam Neyshabur Ryota Tomioka Nathan Srebro 292 591 0 27 Feb 2015
Invariant Scattering Convolution Networks Joan Bruna S. Mallat 136 1,279 0 05 Mar 2012