On Layer Normalization in the Transformer Architecture

12 February 2020
Ruibin Xiong
Yunchang Yang
Di He
Kai Zheng
Shuxin Zheng
Chen Xing
Huishuai Zhang
Yanyan Lan
Liwei Wang
Tie-Yan Liu
    AI4CE

Papers citing "On Layer Normalization in the Transformer Architecture"

Showing 50 of 566 citing papers
How Smooth Is Attention?
Valérie Castin
Pierre Ablin
Gabriel Peyré
AAML
40
9
0
22 Dec 2023
BrainTalker: Low-Resource Brain-to-Speech Synthesis with Transfer Learning using Wav2Vec 2.0
Miseul Kim
Zhenyu Piao
Jihyun Lee
Hong-Goo Kang
71
3
0
21 Dec 2023
Learning Flexible Body Collision Dynamics with Hierarchical Contact Mesh Transformer
Youn-Yeol Yu
Jeongwhan Choi
Woojin Cho
Kookjin Lee
Nayong Kim
...
Ilho Kim
Seok-Woo Lee
Joon Young Yang
S. Yoon
Noseong Park
AI4CE
23
7
0
19 Dec 2023
One-Step Diffusion Distillation via Deep Equilibrium Models
Zhengyang Geng
Ashwini Pokle
Trevor Killeen
34
30
0
12 Dec 2023
Why "classic" Transformers are shallow and how to make them go deep
Yueyao Yu
Yin Zhang
ViT
16
0
0
11 Dec 2023
Large-scale Training of Foundation Models for Wearable Biosignals
Salar Abbaspourazad
Oussama Elachqar
Andrew C. Miller
S. Emrani
Udhyakumar Nallasamy
Ian Shapiro
38
32
0
08 Dec 2023
Transformers are uninterpretable with myopic methods: a case study with bounded Dyck grammars
Kaiyue Wen
Yuchen Li
Bing Liu
Andrej Risteski
34
22
0
03 Dec 2023
MABViT -- Modified Attention Block Enhances Vision Transformers
Mahesh Ramesh
Aswinkumar Ramkumar
19
3
0
03 Dec 2023
Probabilistic Transformer: A Probabilistic Dependency Model for Contextual Word Representation
Haoyi Wu
Kewei Tu
206
3
0
26 Nov 2023
Who is leading in AI? An analysis of industry AI research
Ben Cottier
T. Besiroglu
David Owen
36
7
0
24 Nov 2023
DACBERT: Leveraging Dependency Agreement for Cost-Efficient Bert Pretraining
Martin Kuo
Jianyi Zhang
Yiran Chen
27
2
0
08 Nov 2023
Euclidean, Projective, Conformal: Choosing a Geometric Algebra for Equivariant Transformers
P. D. Haan
Taco S. Cohen
Johann Brehmer
38
9
0
08 Nov 2023
Signal Processing Meets SGD: From Momentum to Filter
Zhipeng Yao
Guisong Chang
Jiaqi Zhang
Qi Zhang
Dazhou Li
Yu Zhang
ODL
39
0
0
06 Nov 2023
Yet Another Generative Model For Room Impulse Response Estimation
Sungho Lee
Hyeong-Seok Choi
Kyogu Lee
34
10
0
05 Nov 2023
Simplifying Transformer Blocks
Bobby He
Thomas Hofmann
27
31
0
03 Nov 2023
ATHENA: Mathematical Reasoning with Thought Expansion
JB. Kim
Hazel Kim
Joonghyuk Hahn
Yo-Sub Han
ReLM
LRM
AIMat
50
7
0
02 Nov 2023
Global Transformer Architecture for Indoor Room Temperature Forecasting
Alfredo V. Clemente
A. Nocente
Massimiliano Ruocco
AI4CE
18
1
0
31 Oct 2023
TorchDEQ: A Library for Deep Equilibrium Models
Zhengyang Geng
J. Zico Kolter
VLM
62
12
0
28 Oct 2023
ScaleLong: Towards More Stable Training of Diffusion Model via Scaling Network Long Skip Connection
Zhongzhan Huang
Pan Zhou
Shuicheng Yan
Liang Lin
24
26
0
20 Oct 2023
Real-Time Motion Prediction via Heterogeneous Polyline Transformer with Relative Pose Encoding
Zhejun Zhang
Alexander Liniger
Daniel Gehrig
Fisher Yu
Luc Van Gool
66
31
0
19 Oct 2023
Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems
David T. Hoffmann
Simon Schrodi
Jelena Bratulić
Nadine Behrmann
Volker Fischer
Thomas Brox
38
5
0
19 Oct 2023
Cross-attention Spatio-temporal Context Transformer for Semantic Segmentation of Historical Maps
Sidi Wu
Yizi Chen
Konrad Schindler
L. Hurni
31
2
0
19 Oct 2023
Enhanced Transformer Architecture for Natural Language Processing
Woohyeon Moon
Taeyoung Kim
Bumgeun Park
Dongsoo Har
30
0
0
17 Oct 2023
AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents
Jake Grigsby
Linxi Fan
Yuke Zhu
OffRL
LM&Ro
38
10
0
15 Oct 2023
LEMON: Lossless model expansion
Yite Wang
Jiahao Su
Hanlin Lu
Cong Xie
Tianyi Liu
Jianbo Yuan
Yanghua Peng
Ruoyu Sun
Hongxia Yang
17
12
0
12 Oct 2023
The Expressive Power of Transformers with Chain of Thought
William Merrill
Ashish Sabharwal
LRM
AI4CE
ReLM
27
0
0
11 Oct 2023
PHYDI: Initializing Parameterized Hypercomplex Neural Networks as Identity Functions
Matteo Mancanelli
Eleonora Grassucci
A. Uncini
Danilo Comminiello
AI4CE
51
2
0
11 Oct 2023
CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model
Peng Di
Jianguo Li
Hang Yu
Wei Jiang
Wenting Cai
...
Zelin Zhao
Xunjin Zheng
Hailian Zhou
Lifu Zhu
Xianying Zhu
ELM
ALM
AI4CE
35
12
0
10 Oct 2023
Pushing the Limits of Pre-training for Time Series Forecasting in the CloudOps Domain
Gerald Woo
Chenghao Liu
Akshat Kumar
Doyen Sahoo
AI4TS
AI4CE
33
13
0
08 Oct 2023
Multiple Physics Pretraining for Physical Surrogate Models
Michael McCabe
Bruno Régaldo-Saint Blancard
Liam Parker
Ruben Ohana
M. Cranmer
...
Francois Lanusse
Mariel Pettee
Tiberiu Teşileanu
Kyunghyun Cho
Shirley Ho
PINN
AI4CE
40
53
0
04 Oct 2023
Mixture of Quantized Experts (MoQE): Complementary Effect of Low-bit Quantization and Robustness
Young Jin Kim
Raffy Fahim
Hany Awadalla
MQ
MoE
66
19
0
03 Oct 2023
BTR: Binary Token Representations for Efficient Retrieval Augmented Language Models
Qingqing Cao
Sewon Min
Yizhong Wang
Hannaneh Hajishirzi
MQ
RALM
40
4
0
02 Oct 2023
Evolutionary Neural Architecture Search for Transformer in Knowledge Tracing
Shangshang Yang
Xiaoshan Yu
Ye Tian
Xueming Yan
Haiping Ma
Xingyi Zhang
ViT
KELM
AI4Ed
24
2
0
02 Oct 2023
Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit
Blake Bordelon
Lorenzo Noci
Mufan Li
Boris Hanin
Cengiz Pehlevan
35
22
0
28 Sep 2023
Graph-level Representation Learning with Joint-Embedding Predictive Architectures
Geri Skenderi
Hang Li
Jiliang Tang
Marco Cristani
AI4TS
GNN
54
3
0
27 Sep 2023
On Separate Normalization in Self-supervised Transformers
Xiaohui Chen
Yinkai Wang
Yuanqi Du
S. Hassoun
Liping Liu
ViT
27
1
0
22 Sep 2023
A Diffusion-Model of Joint Interactive Navigation
Matthew Niedoba
J. Lavington
Yunpeng Liu
Vasileios Lioutas
Justice Sefas
...
Dylan Green
Setareh Dabiri
Berend Zwartsenberg
Adam Scibior
Frank Wood
DiffM
24
14
0
21 Sep 2023
SkeleTR: Towrads Skeleton-based Action Recognition in the Wild
Haodong Duan
Mingze Xu
Bing Shuai
Davide Modolo
Zhuowen Tu
Joseph Tighe
Alessandro Bergamo
ViT
35
1
0
20 Sep 2023
Baichuan 2: Open Large-scale Language Models
Ai Ming Yang
Bin Xiao
Bingning Wang
Borong Zhang
Ce Bian
...
Youxin Jiang
Yuchen Gao
Yupeng Zhang
Zenan Zhou
Zhiying Wu
ELM
LRM
77
710
0
19 Sep 2023
Traveling Words: A Geometric Interpretation of Transformers
Raul Molina
27
4
0
13 Sep 2023
Revisiting Energy Based Models as Policies: Ranking Noise Contrastive Estimation and Interpolating Energy Models
Sumeet Singh
Stephen Tu
Vikas Sindhwani
DiffM
20
8
0
11 Sep 2023
Enhance Multi-domain Sentiment Analysis of Review Texts through Prompting Strategies
Yajing Wang
Zongwei Luo
LRM
19
5
0
05 Sep 2023
Learning multi-modal generative models with permutation-invariant encoders and tighter variational bounds
Marcel Hirt
Domenico Campolo
Victoria Leong
Juan-Pablo Ortega
DRL
15
0
0
01 Sep 2023
Internal Cross-layer Gradients for Extending Homogeneity to Heterogeneity in Federated Learning
Yun-Hin Chan
Rui Zhou
Running Zhao
Zhihan Jiang
Edith C.H. Ngai
FedML
38
8
0
22 Aug 2023
Video OWL-ViT: Temporally-consistent open-world localization in video
G. Heigold
Matthias Minderer
A. Gritsenko
Alex Bewley
Daniel Keysers
Mario Lučić
Feng Yu
Thomas Kipf
VLM
24
14
0
22 Aug 2023
FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs
Young Jin Kim
Rawn Henry
Raffy Fahim
Hany Awadalla
MQ
42
19
0
16 Aug 2023
Attention Is Not All You Need Anymore
Zhe Chen
32
3
0
15 Aug 2023
3DMOTFormer: Graph Transformer for Online 3D Multi-Object Tracking
Shuxiao Ding
Eike Rehder
Lukas Schneider
Marius Cordts
Juergen Gall
3DPC
33
17
0
12 Aug 2023
MAP: A Model-agnostic Pretraining Framework for Click-through Rate Prediction
Jianghao Lin
Yanru Qu
Wei Guo
Xinyi Dai
Ruiming Tang
Yong Yu
Weinan Zhang
30
21
0
03 Aug 2023
From Sparse to Soft Mixtures of Experts
J. Puigcerver
C. Riquelme
Basil Mustafa
N. Houlsby
MoE
121
114
0
02 Aug 2023