On Layer Normalization in the Transformer Architecture


12 February 2020
Ruibin Xiong
Yunchang Yang
Di He
Kai Zheng
Shuxin Zheng
Chen Xing
Huishuai Zhang
Yanyan Lan
Liwei Wang
Tie-Yan Liu
    AI4CE

Papers citing "On Layer Normalization in the Transformer Architecture"

50 / 566 papers shown
Do Language Models Use Their Depth Efficiently?
Róbert Csordás
Christopher D. Manning
Christopher Potts
14
0
0
20 May 2025
This Time is Different: An Observability Perspective on Time Series Foundation Models
Ben Cohen
E. Khwaja
Youssef Doubli
Salahidine Lemaachi
Chris Lettieri
...
Kan Wang
Stephan Xie
David Asker
Ameet Talwalkar
Othmane Abou-Amal
AI4TS
AI4CE
12
0
0
20 May 2025
Revisiting Residual Connections: Orthogonal Updates for Stable and Efficient Deep Networks
Giyeong Oh
Woohyun Cho
Siyeol Kim
Suhwan Choi
Younjae Yu
24
0
0
17 May 2025
Large Language Models for Computer-Aided Design: A Survey
Licheng Zhang
Bach Le
Naveed Akhtar
Siew-Kei Lam
Tuan Ngo
3DV
AI4CE
42
0
0
13 May 2025
Efficient Multivariate Time Series Forecasting via Calibrated Language Models with Privileged Knowledge Distillation
Chenxi Liu
Hao Miao
Qianxiong Xu
Shaowen Zhou
Cheng Long
Yan Zhao
Ziyue Li
Rui Zhao
AI4TS
40
2
0
04 May 2025
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
Piotr Piekos
Róbert Csordás
Jürgen Schmidhuber
MoE
VLM
106
1
0
01 May 2025
GarmentDiffusion: 3D Garment Sewing Pattern Generation with Multimodal Diffusion Transformers
Xinyu Li
Qi Yao
Yanjie Wang
DiffM
48
0
0
30 Apr 2025
Direct Motion Models for Assessing Generated Videos
Kelsey R. Allen
Carl Doersch
Guangyao Zhou
Mohammed Suhail
Danny Driess
...
Thomas Kipf
Mehdi S. M. Sajjadi
Kevin P. Murphy
João Carreira
Sjoerd van Steenkiste
EGVM
DiffM
VGen
80
0
0
30 Apr 2025
PyViT-FUSE: A Foundation Model for Multi-Sensor Earth Observation Data
Manuel Weber
Carly Beneke
ViT
63
0
0
26 Apr 2025
Transformer-Empowered Actor-Critic Reinforcement Learning for Sequence-Aware Service Function Chain Partitioning
Cyril Shih-Huan Hsu
Anestis Dalgkitsis
Chrysa Papagianni
Paola Grosso
24
0
0
26 Apr 2025
Plain Transformers Can be Powerful Graph Learners
Liheng Ma
Soumyasundar Pal
Yingxue Zhang
Philip Torr
Mark J. Coates
30
0
0
17 Apr 2025
NNTile: a machine learning framework capable of training extremely large GPT language models on a single node
A. Mikhalev
Aleksandr Katrutsa
Konstantin Sozykin
Ivan Oseledets
35
0
0
17 Apr 2025
Model Hemorrhage and the Robustness Limits of Large Language Models
Ziyang Ma
Zhiyu Li
Lefei Zhang
Gui-Song Xia
Bo Du
Liangpei Zhang
Dacheng Tao
62
0
0
31 Mar 2025
Exploring the Roles of Large Language Models in Reshaping Transportation Systems: A Survey, Framework, and Roadmap
Tong Nie
Jian Sun
Wei Ma
72
1
0
27 Mar 2025
Cyborg Data: Merging Human with AI Generated Training Data
Kai North
Christopher Ormerod
37
0
0
26 Mar 2025
IgCraft: A versatile sequence generation framework for antibody discovery and engineering
Matthew Greenig
Haowen Zhao
Vladimir Radenkovic
Aubin Ramon
Pietro Sormanni
49
0
0
25 Mar 2025
Stack Transformer Based Spatial-Temporal Attention Model for Dynamic Multi-Culture Sign Language Recognition
Koki Hirooka
Abu Saleh Musa Miah
Tatsuya Murakami
Yuto Akiba
Yong Seok Hwang
Jungpil Shin
SLR
54
0
0
21 Mar 2025
Uni-3DAR: Unified 3D Generation and Understanding via Autoregression on Compressed Spatial Tokens
Shuqi Lu
Haowei Lin
Lin Yao
Zhifeng Gao
Xiaohong Ji
Weinan E
Linfeng Zhang
Guolin Ke
53
0
0
20 Mar 2025
xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference
M. Beck
Korbinian Poppel
Phillip Lippe
Richard Kurle
P. Blies
Günter Klambauer
Sebastian Böck
Sepp Hochreiter
LRM
56
1
0
17 Mar 2025
Siamese Foundation Models for Crystal Structure Prediction
Liming Wu
Wenbing Huang
Rui Jiao
Jianxing Huang
Liwei Liu
...
Hao Sun
Yang Liu
F. Sun
Yuxiang Ren
J. Wen
57
0
0
13 Mar 2025
Transformers without Normalization
Jiachen Zhu
Xinlei Chen
Kaiming He
Yann LeCun
Zhuang Liu
ViT
OffRL
65
8
0
13 Mar 2025
Numerical Error Analysis of Large Language Models
Stanislav Budzinskiy
Wenyi Fang
Longbin Zeng
Philipp Petersen
53
1
0
13 Mar 2025
Object-Centric World Model for Language-Guided Manipulation
Youngjoon Jeong
Junha Chun
S. Cha
Taesup Kim
OCL
VGen
214
1
0
08 Mar 2025
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
Zhijian Zhuo
Yutao Zeng
Ya Wang
Sijun Zhang
Jian Yang
Xiaoqing Li
Xun Zhou
Jinwen Ma
51
0
0
06 Mar 2025
Not-Just-Scaling Laws: Towards a Better Understanding of the Downstream Impact of Language Model Design Decisions
E. Liu
Amanda Bertsch
Lintang Sutawika
Lindia Tjuatja
Patrick Fernandes
...
Shri Kiran Srinivasan
Carolin (Haas) Lawrence
Aditi Raghunathan
Kiril Gashteovski
Graham Neubig
90
0
0
05 Mar 2025
A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers
William Merrill
Ashish Sabharwal
57
5
0
05 Mar 2025
BodyGen: Advancing Towards Efficient Embodiment Co-Design
Haofei Lu
Zhe Wu
Junliang Xing
Jianshu Li
Ruoyu Li
Zhe Li
Yuanchun Shi
44
0
0
01 Mar 2025
XIRVIO: Critic-guided Iterative Refinement for Visual-Inertial Odometry with Explainable Adaptive Weighting
Chit Yuen Lam
Ronald Clark
Basaran Bahadir Kocer
VGen
84
0
0
01 Mar 2025
Efficient Transformer-based Decoder for Varshamov-Tenengolts Codes
Yali Wei
Alan J.X. Guo
Zihui Yan
Yufan Dai
39
0
0
28 Feb 2025
Amortized Conditional Independence Testing
Bao Duong
Nu Hoang
T. Nguyen
CML
50
0
0
28 Feb 2025
NeoBERT: A Next-Generation BERT
Lola Le Breton
Quentin Fournier
Mariam El Mezouar
Sarath Chandar
AI4TS
77
1
0
26 Feb 2025
The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training
Jinbo Wang
Mingze Wang
Zhanpeng Zhou
Junchi Yan
Weinan E
Lei Wu
92
1
0
26 Feb 2025
Introduction to Sequence Modeling with Transformers
Joni-Kristian Kämäräinen
70
1
0
26 Feb 2025
LLM Inference Acceleration via Efficient Operation Fusion
Mahsa Salmani
I. Soloveychik
69
0
0
24 Feb 2025
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam
Tianjin Huang
Haotian Hu
Zhenyu Zhang
Gaojie Jin
Xianrui Li
...
Tianlong Chen
Lu Liu
Qingsong Wen
Zhangyang Wang
Shiwei Liu
MQ
41
0
0
24 Feb 2025
Data Analysis Prediction over Multiple Unseen Datasets: A Vector Embedding Approach
Andreas Loizou
Dimitrios Tsoumakos
45
0
0
24 Feb 2025
Hyperspherical Normalization for Scalable Deep Reinforcement Learning
Hojoon Lee
Youngdo Lee
Takuma Seno
Donghu Kim
Peter Stone
Jaegul Choo
68
1
0
24 Feb 2025
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models
Wenwen Yu
Zhibo Yang
Jianqiang Wan
Sibo Song
J. Tang
Wenqing Cheng
Yunxing Liu
Xiang Bai
58
3
0
22 Feb 2025
Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs
Shane Bergsma
Nolan Dey
Gurpreet Gosal
Gavia Gray
Daria Soboleva
Joel Hestness
58
6
0
21 Feb 2025
A distributional simplicity bias in the learning dynamics of transformers
Riccardo Rende
Federica Gerace
Alessandro Laio
Sebastian Goldt
79
8
0
17 Feb 2025
Reading Your Heart: Learning ECG Words and Sentences via Pre-training ECG Language Model
Jiarui Jin
Haoyu Wang
Hongyan Li
Jun Yu Li
Jiahui Pan
Shenda Hong
41
5
0
15 Feb 2025
The Curse of Depth in Large Language Models
Wenfang Sun
Xinyuan Song
Pengxiang Li
Lu Yin
Yefeng Zheng
Shiwei Liu
75
5
0
09 Feb 2025
Transformers trained on proteins can learn to attend to Euclidean distance
Isaac Ellmen
Constantin Schneider
Matthew I.J. Raybould
Charlotte M. Deane
79
0
0
03 Feb 2025
Understanding Why Adam Outperforms SGD: Gradient Heterogeneity in Transformers
Akiyoshi Tomihari
Issei Sato
ODL
61
1
0
31 Jan 2025
OT-Transformer: A Continuous-time Transformer Architecture with Optimal Transport Regularization
Kelvin Kan
Xingjian Li
Stanley Osher
101
2
0
30 Jan 2025
Improving Out-of-Distribution Generalization of Trajectory Prediction for Autonomous Driving via Polynomial Representations
Yue Yao
Shengchao Yan
Daniel Goehring
Wolfram Burgard
Joerg Reichardt
OODD
59
2
0
28 Jan 2025
Automatic selection of the best neural architecture for time series forecasting via multi-objective optimization and Pareto optimality conditions
Qianying Cao
Shanqing Liu
Alan John Varghese
Jérome Darbon
M. Triantafyllou
George Karniadakis
AI4TS
232
0
0
21 Jan 2025
MVGT: A Multi-view Graph Transformer Based on Spatial Relations for EEG Emotion Recognition
Yanjie Cui
Xiaohong Liu
Jing Liang
Yamin Fu
67
1
0
17 Jan 2025
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
Tianjin Huang
Ziquan Zhu
Gaojie Jin
Lu Liu
Zhangyang Wang
Shiwei Liu
47
1
0
12 Jan 2025
AI-powered virtual tissues from spatial proteomics for clinical diagnostics and biomedical discovery
Johann Wenckstern
Eeshaan Jain
Kiril Vasilev
Matteo Pariset
Andreas Wicki
Gabriele Gut
Charlotte Bunne
38
2
0
10 Jan 2025