An In-depth Investigation of Sparse Rate Reduction in Transformer-like Models

26 November 2024
Yunzhe Hu
Difan Zou
Dong Xu

Papers citing "An In-depth Investigation of Sparse Rate Reduction in Transformer-like Models"

26 / 26 papers shown
Hyper-SET: Designing Transformers via Hyperspherical Energy Minimization
Yunzhe Hu
Difan Zou
Dong Xu
141
1
0
17 Feb 2025
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
Asma Ghandeharioun
Avi Caciularu
Adam Pearce
Lucas Dixon
Mor Geva
124
114
0
11 Jan 2024
White-Box Transformers via Sparse Rate Reduction
Yaodong Yu
Sam Buchanan
Druv Pai
Tianzhe Chu
Ziyang Wu
Shengbang Tong
B. Haeffele
Yi Ma
ViT
94
87
0
01 Jun 2023
Birth of a Transformer: A Memory Viewpoint
A. Bietti
Vivien A. Cabannes
Diane Bouchacourt
Hervé Jégou
Léon Bottou
104
96
0
01 Jun 2023
The emergence of clusters in self-attention dynamics
Borjan Geshkovski
Cyril Letrouit
Yury Polyanskiy
Philippe Rigollet
93
56
0
09 May 2023
Energy Transformer
Benjamin Hoover
Yuchen Liang
Bao Pham
Yikang Shen
Hendrik Strobelt
Duen Horng Chau
Mohammed J Zaki
Dmitry Krotov
ViT
81
49
0
14 Feb 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
316
563
0
01 Nov 2022
Omnigrok: Grokking Beyond Algorithmic Data
Ziming Liu
Eric J. Michaud
Max Tegmark
93
84
0
03 Oct 2022
Analyzing Transformers in Embedding Space
Guy Dar
Mor Geva
Ankit Gupta
Jonathan Berant
83
93
0
06 Sep 2022
Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers
Arda Sahiner
Tolga Ergen
Batu Mehmet Ozturkler
John M. Pauly
Morteza Mardani
Mert Pilanci
90
33
0
17 May 2022
Universal Hopfield Networks: A General Framework for Single-Shot Associative Memory Models
Beren Millidge
Tommaso Salvatori
Yuhang Song
Thomas Lukasiewicz
Rafal Bogacz
VLM
53
54
0
09 Feb 2022
Attention Approximates Sparse Distributed Memory
Trenton Bricken
Cengiz Pehlevan
82
34
0
10 Nov 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
735
6,139
0
29 Apr 2021
Graph Neural Networks Inspired by Classical Iterative Algorithms
Yongyi Yang
T. Liu
Yangkun Wang
Jinjing Zhou
Quan Gan
Zhewei Wei
Zheng Zhang
Zengfeng Huang
David Wipf
99
83
0
10 Mar 2021
Transformer Interpretability Beyond Attention Visualization
Hila Chefer
Shir Gur
Lior Wolf
139
673
0
17 Dec 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
684
41,563
0
22 Oct 2020
Hopfield Networks is All You Need
Hubert Ramsauer
Bernhard Schafl
Johannes Lehner
Philipp Seidl
Michael Widrich
...
David P. Kreil
Michael K Kopp
Günter Klambauer
Johannes Brandstetter
Sepp Hochreiter
128
437
0
16 Jul 2020
Algorithm Unrolling: Interpretable, Efficient Deep Learning for Signal and Image Processing
V. Monga
Yuelong Li
Yonina C. Eldar
105
1,022
0
22 Dec 2019
Fantastic Generalization Measures and Where to Find Them
Yiding Jiang
Behnam Neyshabur
H. Mobahi
Dilip Krishnan
Samy Bengio
AI4CE
145
611
0
04 Dec 2019
RandAugment: Practical automated data augmentation with a reduced search space
E. D. Cubuk
Barret Zoph
Jonathon Shlens
Quoc V. Le
MQ
270
3,508
0
30 Sep 2019
A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks
Behnam Neyshabur
Srinadh Bhojanapalli
Nathan Srebro
92
610
0
29 Jul 2017
Spectrally-normalized margin bounds for neural networks
Peter L. Bartlett
Dylan J. Foster
Matus Telgarsky
ODL
218
1,225
0
26 Jun 2017
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
440
2,946
0
15 Sep 2016
Layer Normalization
Jimmy Lei Ba
J. Kiros
Geoffrey E. Hinton
435
10,541
0
21 Jul 2016
Norm-Based Capacity Control in Neural Networks
Behnam Neyshabur
Ryota Tomioka
Nathan Srebro
292
591
0
27 Feb 2015
Invariant Scattering Convolution Networks
Joan Bruna
S. Mallat
136
1,279
0
05 Mar 2012