ReLU's Revival: On the Entropic Overload in Normalization-Free Large Language Models

12 October 2024
N. Jha, Brandon Reagen
Communities: OffRL, AI4CE
arXiv (abs) · PDF · HTML · GitHub

Papers citing "ReLU's Revival: On the Entropic Overload in Normalization-Free Large Language Models"

14 papers shown

HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
Zhijian Zhuo, Yutao Zeng, Ya Wang, Sijun Zhang, Jian Yang, Xiaoqing Li, Xun Zhou, Jinwen Ma
06 Mar 2025

The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry
Michael Zhang, Kush S. Bhatia, Hermann Kumbong, Christopher Ré
06 Feb 2024

Tuning LayerNorm in Attention: Towards Efficient Multi-Modal LLM Finetuning
Bingchen Zhao, Haoqin Tu, Chen Wei, Jieru Mei, Cihang Xie
18 Dec 2023

Linear Log-Normal Attention with Unbiased Concentration
Yury Nahshan, Dor-Joseph Kampeas, E. Haleva
22 Nov 2023

Converting Transformers to Polynomial Form for Secure Inference Over Homomorphic Encryption
Itamar Zimerman, Moran Baruch, Nir Drucker, Gilad Ezov, Omri Soceanu, Lior Wolf
15 Nov 2023

The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit
Lorenzo Noci, Chuning Li, Mufan Li, Bobby He, Thomas Hofmann, Chris J. Maddison, Daniel M. Roy
30 Jun 2023

DeepReShape: Redesigning Neural Networks for Efficient Private Inference
N. Jha, Brandon Reagen
20 Apr 2023

Deep Transformers without Shortcuts: Modifying Self-attention for Faithful Signal Propagation
Bobby He, James Martens, Guodong Zhang, Aleksandar Botev, Andy Brock, Samuel L. Smith, Yee Whye Teh
20 Feb 2023

Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models
Xiuying Wei, Yunchen Zhang, Xiangguo Zhang, Ruihao Gong, Shanghang Zhang, Qi Zhang, F. Yu, Xianglong Liu
Communities: MQ
27 Sep 2022

Block-Recurrent Transformers
DeLesley S. Hutchins, Imanol Schlag, Yuhuai Wu, Ethan Dyer, Behnam Neyshabur
11 Mar 2022

BERT Busters: Outlier Dimensions that Disrupt Transformers
Olga Kovaleva, Saurabh Kulshreshtha, Anna Rogers, Anna Rumshisky
14 May 2021

DeepReDuce: ReLU Reduction for Fast Private Inference
N. Jha, Zahra Ghodsi, S. Garg, Brandon Reagen
02 Mar 2021

On Layer Normalization in the Transformer Architecture
Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, Tie-Yan Liu
Communities: AI4CE
12 Feb 2020

Layer Normalization
Jimmy Lei Ba, J. Kiros, Geoffrey E. Hinton
21 Jul 2016