On Layer Normalization in the Transformer Architecture


12 February 2020
Ruibin Xiong
Yunchang Yang
Di He
Kai Zheng
Shuxin Zheng
Chen Xing
Huishuai Zhang
Yanyan Lan
Liwei Wang
Tie-Yan Liu
    AI4CE

Papers citing "On Layer Normalization in the Transformer Architecture"

50 / 566 papers shown
Attending to Topological Spaces: The Cellular Transformer
Rubén Ballester
Pablo Hernández-García
Mathilde Papillon
Claudio Battiloro
Nina Miolane
Tolga Birdal
Carles Casacuberta
Sergio Escalera
Mustafa Hajij
43
4
0
23 May 2024
Challenging Gradient Boosted Decision Trees with Tabular Transformers for Fraud Detection at Booking.com
Sergei Krutikov
Bulat Khaertdinov
Rodion Kiriukhin
Shubham Agrawal
Kees Jan de Vries
LMTD
48
0
0
22 May 2024
A Dual Power Grid Cascading Failure Model for the Vulnerability Analysis
Tianxin Zhou
Xiang Li
Haibing Lu
28
0
0
18 May 2024
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
Xueyan Niu
Bo Bai
Lei Deng
Wei Han
44
6
0
14 May 2024
Geometry and Dynamics of LayerNorm
P. Riechers
19
1
0
07 May 2024
Learning Linear Block Error Correction Codes
Yoni Choukroun
Lior Wolf
31
6
0
07 May 2024
Position: Understanding LLMs Requires More Than Statistical Generalization
Patrik Reizinger
Szilvia Ujváry
Anna Mészáros
A. Kerekes
Wieland Brendel
Ferenc Huszár
36
12
0
03 May 2024
Nyonic Technical Report
Junfeng Tian
Rui-cang Wang
Cong Li
Yudong Zhou
Jun Liu
Jun Wang
41
0
0
24 Apr 2024
TransformerFAM: Feedback attention is working memory
Dongseong Hwang
Weiran Wang
Zhuoyuan Huo
K. Sim
P. M. Mengibar
40
12
0
14 Apr 2024
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Xuezhe Ma
Xiaomeng Yang
Wenhan Xiong
Beidi Chen
Lili Yu
Hao Zhang
Jonathan May
Luke Zettlemoyer
Omer Levy
Chunting Zhou
53
27
0
12 Apr 2024
Generating Synthetic Time Series Data for Cyber-Physical Systems
Alexander Sommers
Somayeh Bakhtiari Ramezani
Logan Cummins
Sudip Mittal
Shahram Rahimi
Maria Seale
Joseph Jaboure
AI4TS
48
0
0
12 Apr 2024
Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts
Weilin Cai
Juyong Jiang
Le Qin
Junwei Cui
Sunghun Kim
Jiayi Huang
62
7
0
07 Apr 2024
Exploring the Efficacy of Group-Normalization in Deep Learning Models for Alzheimer's Disease Classification
Gousia Habib
Ishfaq Ahmed Malik
Jameel Ahmad
Imtiaz Ahmed
Shaima Qureshi
36
0
0
01 Apr 2024
LayerNorm: A key component in parameter-efficient fine-tuning
Taha ValizadehAslani
Hualou Liang
51
1
0
29 Mar 2024
Word Order's Impacts: Insights from Reordering and Generation Analysis
Qinghua Zhao
Jiaang Li
Lei Li
Zenghui Zhou
Junfeng Liu
38
0
0
18 Mar 2024
Simple and Scalable Strategies to Continually Pre-train Large Language Models
Adam Ibrahim
Benjamin Thérien
Kshitij Gupta
Mats L. Richter
Quentin Anthony
Timothée Lesort
Eugene Belilovsky
Irina Rish
KELM
CLL
44
54
0
13 Mar 2024
Structural Positional Encoding for knowledge integration in transformer-based medical process monitoring
Christopher Irwin
Marco Dossena
G. Leonardi
Stefania Montani
MedIm
38
0
0
13 Mar 2024
A Survey of Vision Transformers in Autonomous Driving: Current Trends and Future Directions
Quoc-Vinh Lai-Dang
ViT
36
2
0
12 Mar 2024
Tractable Joint Prediction and Planning over Discrete Behavior Modes for Urban Driving
Adam R. Villaflor
Brian Yang
Huangyuan Su
Katerina Fragkiadaki
John M. Dolan
Jeff Schneider
59
0
0
12 Mar 2024
Transformer for Times Series: an Application to the S&P500
Pierre Brugiere
G. Turinici
AI4TS
AIFin
18
4
0
04 Mar 2024
ConSep: a Noise- and Reverberation-Robust Speech Separation Framework by Magnitude Conditioning
Kuan-Hsun Ho
J. Hung
Berlin Chen
42
0
0
04 Mar 2024
Revisiting Dynamic Evaluation: Online Adaptation for Large Language Models
Amal Rannen-Triki
J. Bornschein
Razvan Pascanu
Marcus Hutter
Andras Gyorgy
Alexandre Galashov
Yee Whye Teh
Michalis K. Titsias
KELM
28
1
0
03 Mar 2024
EfficientZero V2: Mastering Discrete and Continuous Control with Limited Data
Shengjie Wang
Shaohuai Liu
Weirui Ye
Jiacheng You
Yang Gao
OffRL
29
13
0
01 Mar 2024
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Soham De
Samuel L. Smith
Anushan Fernando
Aleksandar Botev
George-Christian Muraru
...
David Budden
Yee Whye Teh
Razvan Pascanu
Nando de Freitas
Çağlar Gülçehre
Mamba
61
117
0
29 Feb 2024
RiNALMo: General-Purpose RNA Language Models Can Generalize Well on Structure Prediction Tasks
Rafael Josip Penić
Tin Vlasic
Roland G. Huber
Yue Wan
M. Šikić
AI4CE
24
27
0
29 Feb 2024
Towards Optimal Learning of Language Models
Yuxian Gu
Li Dong
Y. Hao
Qingxiu Dong
Minlie Huang
Furu Wei
39
7
0
27 Feb 2024
Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations
Jiaqi Zhai
Lucy Liao
Xing Liu
Yueming Wang
Rui Li
...
Zhaojie Gong
Fangda Gu
Michael He
Yin-Hua Lu
Yu Shi
OffRL
32
48
0
27 Feb 2024
Why Transformers Need Adam: A Hessian Perspective
Yushun Zhang
Congliang Chen
Tian Ding
Ziniu Li
Ruoyu Sun
Zhimin Luo
40
43
0
26 Feb 2024
Detecting Machine-Generated Texts by Multi-Population Aware Optimization for Maximum Mean Discrepancy
Shuhai Zhang
Yiliao Song
Jiahao Yang
Yuanqing Li
Bo Han
Mingkui Tan
DeLMO
39
5
0
25 Feb 2024
Transformers are Expressive, But Are They Expressive Enough for Regression?
Swaroop Nath
H. Khadilkar
Pushpak Bhattacharyya
34
3
0
23 Feb 2024
Transformer tricks: Precomputing the first layer
Nils Graef
MoE
32
4
0
20 Feb 2024
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
Zhiyuan Li
Hong Liu
Denny Zhou
Tengyu Ma
LRM
AI4CE
30
101
0
20 Feb 2024
Any2Graph: Deep End-To-End Supervised Graph Prediction With An Optimal Transport Loss
Paul Krzakala
Junjie Yang
Rémi Flamary
Florence d'Alché-Buc
Charlotte Laclau
Matthieu Labeau
OT
34
1
0
19 Feb 2024
Synthetic location trajectory generation using categorical diffusion models
Simon Dirmeier
Ye Hong
Fernando Pérez-Cruz
29
0
0
19 Feb 2024
A novel molecule generative model of VAE combined with Transformer for unseen structure generation
Yasuhiro Yoshikai
T. Mizuno
Shumpei Nemoto
Hiroyuki Kusuhara
33
3
0
19 Feb 2024
Pushing the Limits of Zero-shot End-to-End Speech Translation
Ioannis Tsiamas
Gerard I. Gállego
José A. R. Fonollosa
Marta R. Costa-jussá
43
7
0
16 Feb 2024
Bridging Associative Memory and Probabilistic Modeling
Rylan Schaeffer
Nika Zahedi
Mikail Khona
Dhruv Pai
Sang T. Truong
...
Sarthak Chandra
Andres Carranza
Ila Rani Fiete
Andrey Gromov
Oluwasanmi Koyejo
DiffM
48
4
0
15 Feb 2024
Graph Structure Inference with BAM: Introducing the Bilinear Attention Mechanism
Philipp Froehlich
Heinz Koeppl
GNN
29
1
0
12 Feb 2024
Unified Training of Universal Time Series Forecasting Transformers
Gerald Woo
Chenghao Liu
Akshat Kumar
Caiming Xiong
Silvio Savarese
Doyen Sahoo
AI4TS
120
170
0
04 Feb 2024
DeepLag: Discovering Deep Lagrangian Dynamics for Intuitive Fluid Prediction
Qilong Ma
Haixu Wu
Lanxiang Xing
Jianmin Wang
Mingsheng Long
AI4CE
34
0
0
04 Feb 2024
Self-attention Networks Localize When QK-eigenspectrum Concentrates
Han Bao
Ryuichiro Hataya
Ryo Karakida
18
5
0
03 Feb 2024
BPDec: Unveiling the Potential of Masked Language Modeling Decoder in BERT pretraining
Wen-Chieh Liang
Youzhi Liang
OffRL
30
2
0
29 Jan 2024
FedGT: Federated Node Classification with Scalable Graph Transformer
Zaixin Zhang
Qingyong Hu
Yang Yu
Weibo Gao
Qi Liu
FedML
46
2
0
26 Jan 2024
Accelerating Material Property Prediction using Generically Complete Isometry Invariants
Jonathan Balasingham
Viktor Zamaraev
V. Kurlin
16
5
0
22 Jan 2024
FourCastNeXt: Optimizing FourCastNet Training for Limited Compute
Edison Guo
Maruf Ahmed
Yue Sun
Rui Yang
Harrison Cook
Tennessee Leeuwenburg
Ben Evans
26
1
0
10 Jan 2024
Unsupervised Salient Patch Selection for Data-Efficient Reinforcement Learning
Zhaohui Jiang
Paul Weng
OffRL
27
0
0
10 Jan 2024
Setting the Record Straight on Transformer Oversmoothing
G. Dovonon
M. Bronstein
Matt J. Kusner
35
5
0
09 Jan 2024
Spike No More: Stabilizing the Pre-training of Large Language Models
Sho Takase
Shun Kiyono
Sosuke Kobayashi
Jun Suzuki
20
14
0
28 Dec 2023
Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference
Hongzheng Chen
Jiahao Zhang
Yixiao Du
Shaojie Xiang
Zichao Yue
Niansong Zhang
Yaohui Cai
Zhiru Zhang
65
35
0
23 Dec 2023
Lift-Attend-Splat: Bird's-eye-view camera-lidar fusion using transformers
James Gunn
Zygmunt Lenyk
Anuj Sharma
Andrea Donati
Alexandru Buburuzan
John Redford
Romain Mueller
MDE
38
8
0
22 Dec 2023