ReLU's Revival: On the Entropic Overload in Normalization-Free Large Language Models
12 October 2024 · arXiv:2410.09637 (v3)
N. Jha, Brandon Reagen
Communities: OffRL, AI4CE
Links: ArXiv (abs) · PDF · HTML · GitHub
Papers citing "ReLU's Revival: On the Entropic Overload in Normalization-Free Large Language Models" (14 of 14 papers shown):
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
Zhijian Zhuo, Yutao Zeng, Ya Wang, Sijun Zhang, Jian Yang, Xiaoqing Li, Xun Zhou, Jinwen Ma
06 Mar 2025 · 0 citations

The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry
Michael Zhang, Kush S. Bhatia, Hermann Kumbong, Christopher Ré
06 Feb 2024 · 54 citations

Tuning LayerNorm in Attention: Towards Efficient Multi-Modal LLM Finetuning
Bingchen Zhao, Haoqin Tu, Chen Wei, Jieru Mei, Cihang Xie
18 Dec 2023 · 36 citations

Linear Log-Normal Attention with Unbiased Concentration
Yury Nahshan, Dor-Joseph Kampeas, E. Haleva
22 Nov 2023 · 8 citations

Converting Transformers to Polynomial Form for Secure Inference Over Homomorphic Encryption
Itamar Zimerman, Moran Baruch, Nir Drucker, Gilad Ezov, Omri Soceanu, Lior Wolf
15 Nov 2023 · 17 citations

The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit
Lorenzo Noci, Chuning Li, Mufan Li, Bobby He, Thomas Hofmann, Chris J. Maddison, Daniel M. Roy
30 Jun 2023 · 36 citations

DeepReShape: Redesigning Neural Networks for Efficient Private Inference
N. Jha, Brandon Reagen
20 Apr 2023 · 10 citations

Deep Transformers without Shortcuts: Modifying Self-attention for Faithful Signal Propagation
Bobby He, James Martens, Guodong Zhang, Aleksandar Botev, Andy Brock, Samuel L. Smith, Yee Whye Teh
20 Feb 2023 · 30 citations

Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models
Xiuying Wei, Yunchen Zhang, Xiangguo Zhang, Ruihao Gong, Shanghang Zhang, Qi Zhang, F. Yu, Xianglong Liu
27 Sep 2022 · 152 citations · MQ

Block-Recurrent Transformers
DeLesley S. Hutchins, Imanol Schlag, Yuhuai Wu, Ethan Dyer, Behnam Neyshabur
11 Mar 2022 · 100 citations

BERT Busters: Outlier Dimensions that Disrupt Transformers
Olga Kovaleva, Saurabh Kulshreshtha, Anna Rogers, Anna Rumshisky
14 May 2021 · 92 citations

DeepReDuce: ReLU Reduction for Fast Private Inference
N. Jha, Zahra Ghodsi, S. Garg, Brandon Reagen
02 Mar 2021 · 91 citations

On Layer Normalization in the Transformer Architecture
Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, Tie-Yan Liu
12 Feb 2020 · 996 citations · AI4CE

Layer Normalization
Jimmy Lei Ba, J. Kiros, Geoffrey E. Hinton
21 Jul 2016 · 10,531 citations