2002.04745
On Layer Normalization in the Transformer Architecture
12 February 2020
Ruibin Xiong
Yunchang Yang
Di He
Kai Zheng
Shuxin Zheng
Chen Xing
Huishuai Zhang
Yanyan Lan
Liwei Wang
Tie-Yan Liu
AI4CE
Papers citing
"On Layer Normalization in the Transformer Architecture"
50 / 566 papers shown
Title
Attending to Topological Spaces: The Cellular Transformer
Rubén Ballester
Pablo Hernández-García
Mathilde Papillon
Claudio Battiloro
Nina Miolane
Tolga Birdal
Carles Casacuberta
Sergio Escalera
Mustafa Hajij
43
4
0
23 May 2024
Challenging Gradient Boosted Decision Trees with Tabular Transformers for Fraud Detection at Booking.com
Sergei Krutikov
Bulat Khaertdinov
Rodion Kiriukhin
Shubham Agrawal
Kees Jan de Vries
LMTD
48
0
0
22 May 2024
A Dual Power Grid Cascading Failure Model for the Vulnerability Analysis
Tianxin Zhou
Xiang Li
Haibing Lu
28
0
0
18 May 2024
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
Xueyan Niu
Bo Bai
Lei Deng
Wei Han
44
6
0
14 May 2024
Geometry and Dynamics of LayerNorm
P. Riechers
19
1
0
07 May 2024
Learning Linear Block Error Correction Codes
Yoni Choukroun
Lior Wolf
31
6
0
07 May 2024
Position: Understanding LLMs Requires More Than Statistical Generalization
Patrik Reizinger
Szilvia Ujváry
Anna Mészáros
A. Kerekes
Wieland Brendel
Ferenc Huszár
36
12
0
03 May 2024
Nyonic Technical Report
Junfeng Tian
Rui-cang Wang
Cong Li
Yudong Zhou
Jun Liu
Jun Wang
41
0
0
24 Apr 2024
TransformerFAM: Feedback attention is working memory
Dongseong Hwang
Weiran Wang
Zhuoyuan Huo
K. Sim
P. M. Mengibar
40
12
0
14 Apr 2024
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Xuezhe Ma
Xiaomeng Yang
Wenhan Xiong
Beidi Chen
Lili Yu
Hao Zhang
Jonathan May
Luke Zettlemoyer
Omer Levy
Chunting Zhou
53
27
0
12 Apr 2024
Generating Synthetic Time Series Data for Cyber-Physical Systems
Alexander Sommers
Somayeh Bakhtiari Ramezani
Logan Cummins
Sudip Mittal
Shahram Rahimi
Maria Seale
Joseph Jaboure
AI4TS
48
0
0
12 Apr 2024
Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts
Weilin Cai
Juyong Jiang
Le Qin
Junwei Cui
Sunghun Kim
Jiayi Huang
62
7
0
07 Apr 2024
Exploring the Efficacy of Group-Normalization in Deep Learning Models for Alzheimer's Disease Classification
Gousia Habib
Ishfaq Ahmed Malik
Jameel Ahmad
Imtiaz Ahmed
Shaima Qureshi
36
0
0
01 Apr 2024
LayerNorm: A key component in parameter-efficient fine-tuning
Taha ValizadehAslani
Hualou Liang
51
1
0
29 Mar 2024
Word Order's Impacts: Insights from Reordering and Generation Analysis
Qinghua Zhao
Jiaang Li
Lei Li
Zenghui Zhou
Junfeng Liu
38
0
0
18 Mar 2024
Simple and Scalable Strategies to Continually Pre-train Large Language Models
Adam Ibrahim
Benjamin Thérien
Kshitij Gupta
Mats L. Richter
Quentin Anthony
Timothée Lesort
Eugene Belilovsky
Irina Rish
KELM
CLL
44
54
0
13 Mar 2024
Structural Positional Encoding for knowledge integration in transformer-based medical process monitoring
Christopher Irwin
Marco Dossena
G. Leonardi
Stefania Montani
MedIm
38
0
0
13 Mar 2024
A Survey of Vision Transformers in Autonomous Driving: Current Trends and Future Directions
Quoc-Vinh Lai-Dang
ViT
36
2
0
12 Mar 2024
Tractable Joint Prediction and Planning over Discrete Behavior Modes for Urban Driving
Adam R. Villaflor
Brian Yang
Huangyuan Su
Katerina Fragkiadaki
John M. Dolan
Jeff Schneider
59
0
0
12 Mar 2024
Transformer for Times Series: an Application to the S&P500
Pierre Brugiere
G. Turinici
AI4TS
AIFin
18
4
0
04 Mar 2024
ConSep: a Noise- and Reverberation-Robust Speech Separation Framework by Magnitude Conditioning
Kuan-Hsun Ho
J. Hung
Berlin Chen
42
0
0
04 Mar 2024
Revisiting Dynamic Evaluation: Online Adaptation for Large Language Models
Amal Rannen-Triki
J. Bornschein
Razvan Pascanu
Marcus Hutter
Andras Gyorgy
Alexandre Galashov
Yee Whye Teh
Michalis K. Titsias
KELM
28
1
0
03 Mar 2024
EfficientZero V2: Mastering Discrete and Continuous Control with Limited Data
Shengjie Wang
Shaohuai Liu
Weirui Ye
Jiacheng You
Yang Gao
OffRL
29
13
0
01 Mar 2024
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Soham De
Samuel L. Smith
Anushan Fernando
Aleksandar Botev
George-Christian Muraru
...
David Budden
Yee Whye Teh
Razvan Pascanu
Nando de Freitas
Çağlar Gülçehre
Mamba
61
117
0
29 Feb 2024
RiNALMo: General-Purpose RNA Language Models Can Generalize Well on Structure Prediction Tasks
Rafael Josip Penić
Tin Vlasic
Roland G. Huber
Yue Wan
M. Šikić
AI4CE
24
27
0
29 Feb 2024
Towards Optimal Learning of Language Models
Yuxian Gu
Li Dong
Y. Hao
Qingxiu Dong
Minlie Huang
Furu Wei
39
7
0
27 Feb 2024
Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations
Jiaqi Zhai
Lucy Liao
Xing Liu
Yueming Wang
Rui Li
...
Zhaojie Gong
Fangda Gu
Michael He
Yin-Hua Lu
Yu Shi
OffRL
32
48
0
27 Feb 2024
Why Transformers Need Adam: A Hessian Perspective
Yushun Zhang
Congliang Chen
Tian Ding
Ziniu Li
Ruoyu Sun
Zhi-Quan Luo
40
43
0
26 Feb 2024
Detecting Machine-Generated Texts by Multi-Population Aware Optimization for Maximum Mean Discrepancy
Shuhai Zhang
Yiliao Song
Jiahao Yang
Yuanqing Li
Bo Han
Mingkui Tan
DeLMO
39
5
0
25 Feb 2024
Transformers are Expressive, But Are They Expressive Enough for Regression?
Swaroop Nath
H. Khadilkar
Pushpak Bhattacharyya
34
3
0
23 Feb 2024
Transformer tricks: Precomputing the first layer
Nils Graef
MoE
32
4
0
20 Feb 2024
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
Zhiyuan Li
Hong Liu
Denny Zhou
Tengyu Ma
LRM
AI4CE
30
101
0
20 Feb 2024
Any2Graph: Deep End-To-End Supervised Graph Prediction With An Optimal Transport Loss
Paul Krzakala
Junjie Yang
Rémi Flamary
Florence d'Alché-Buc
Charlotte Laclau
Matthieu Labeau
OT
34
1
0
19 Feb 2024
Synthetic location trajectory generation using categorical diffusion models
Simon Dirmeier
Ye Hong
Fernando Pérez-Cruz
29
0
0
19 Feb 2024
A novel molecule generative model of VAE combined with Transformer for unseen structure generation
Yasuhiro Yoshikai
T. Mizuno
Shumpei Nemoto
Hiroyuki Kusuhara
33
3
0
19 Feb 2024
Pushing the Limits of Zero-shot End-to-End Speech Translation
Ioannis Tsiamas
Gerard I. Gállego
José A. R. Fonollosa
Marta R. Costa-jussá
43
7
0
16 Feb 2024
Bridging Associative Memory and Probabilistic Modeling
Rylan Schaeffer
Nika Zahedi
Mikail Khona
Dhruv Pai
Sang T. Truong
...
Sarthak Chandra
Andres Carranza
Ila Rani Fiete
Andrey Gromov
Oluwasanmi Koyejo
DiffM
48
4
0
15 Feb 2024
Graph Structure Inference with BAM: Introducing the Bilinear Attention Mechanism
Philipp Froehlich
Heinz Koeppl
GNN
29
1
0
12 Feb 2024
Unified Training of Universal Time Series Forecasting Transformers
Gerald Woo
Chenghao Liu
Akshat Kumar
Caiming Xiong
Silvio Savarese
Doyen Sahoo
AI4TS
120
170
0
04 Feb 2024
DeepLag: Discovering Deep Lagrangian Dynamics for Intuitive Fluid Prediction
Qilong Ma
Haixu Wu
Lanxiang Xing
Jianmin Wang
Mingsheng Long
AI4CE
34
0
0
04 Feb 2024
Self-attention Networks Localize When QK-eigenspectrum Concentrates
Han Bao
Ryuichiro Hataya
Ryo Karakida
18
5
0
03 Feb 2024
BPDec: Unveiling the Potential of Masked Language Modeling Decoder in BERT pretraining
Wen-Chieh Liang
Youzhi Liang
OffRL
30
2
0
29 Jan 2024
FedGT: Federated Node Classification with Scalable Graph Transformer
Zaixin Zhang
Qingyong Hu
Yang Yu
Weibo Gao
Qi Liu
FedML
46
2
0
26 Jan 2024
Accelerating Material Property Prediction using Generically Complete Isometry Invariants
Jonathan Balasingham
Viktor Zamaraev
V. Kurlin
16
5
0
22 Jan 2024
FourCastNeXt: Optimizing FourCastNet Training for Limited Compute
Edison Guo
Maruf Ahmed
Yue Sun
Rui Yang
Harrison Cook
Tennessee Leeuwenburg
Ben Evans
26
1
0
10 Jan 2024
Unsupervised Salient Patch Selection for Data-Efficient Reinforcement Learning
Zhaohui Jiang
Paul Weng
OffRL
27
0
0
10 Jan 2024
Setting the Record Straight on Transformer Oversmoothing
G. Dovonon
M. Bronstein
Matt J. Kusner
35
5
0
09 Jan 2024
Spike No More: Stabilizing the Pre-training of Large Language Models
Sho Takase
Shun Kiyono
Sosuke Kobayashi
Jun Suzuki
20
14
0
28 Dec 2023
Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference
Hongzheng Chen
Jiahao Zhang
Yixiao Du
Shaojie Xiang
Zichao Yue
Niansong Zhang
Yaohui Cai
Zhiru Zhang
65
35
0
23 Dec 2023
Lift-Attend-Splat: Bird's-eye-view camera-lidar fusion using transformers
James Gunn
Zygmunt Lenyk
Anuj Sharma
Andrea Donati
Alexandru Buburuzan
John Redford
Romain Mueller
MDE
38
8
0
22 Dec 2023