On Layer Normalization in the Transformer Architecture

12 February 2020
Ruibin Xiong
Yunchang Yang
Di He
Kai Zheng
Shuxin Zheng
Chen Xing
Huishuai Zhang
Yanyan Lan
Liwei Wang
Tie-Yan Liu
    AI4CE

Papers citing "On Layer Normalization in the Transformer Architecture"

50 / 566 papers shown
Bidirectional Representations for Low Resource Spoken Language Understanding
Quentin Meeus
Marie-Francine Moens
Hugo Van hamme
21
2
0
24 Nov 2022
Uncertainty-aware Vision-based Metric Cross-view Geolocalization
F. Fervers
Sebastian Bullinger
C. Bodensteiner
Michael Arens
Rainer Stiefelhagen
40
40
0
22 Nov 2022
Impact of visual assistance for automated audio captioning
Wim Boes
Hugo Van hamme
17
1
0
18 Nov 2022
Evade the Trap of Mediocrity: Promoting Diversity and Novelty in Text Generation via Concentrating Attention
Wenhao Li
Xiaoyuan Yi
Jinyi Hu
Maosong Sun
Xing Xie
46
0
0
14 Nov 2022
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
Wenhai Wang
Jifeng Dai
Zhe Chen
Zhenhang Huang
Zhiqi Li
...
Tong Lu
Lewei Lu
Hongsheng Li
Xiaogang Wang
Yu Qiao
VLM
53
661
0
10 Nov 2022
Efficient Speech Translation with Dynamic Latent Perceivers
Ioannis Tsiamas
Gerard I. Gállego
José A. R. Fonollosa
Marta R. Costa-jussá
30
2
0
28 Oct 2022
Target-Speaker Voice Activity Detection via Sequence-to-Sequence Prediction
Ming Cheng
Weiqing Wang
Yucong Zhang
Xiaoyi Qin
Ming Li
VLM
56
33
0
28 Oct 2022
GCT: Gated Contextual Transformer for Sequential Audio Tagging
Yuanbo Hou
Yun Wang
Wenwu Wang
Dick Botteldooren
33
0
0
22 Oct 2022
S2WAT: Image Style Transfer via Hierarchical Vision Transformer using Strips Window Attention
Chi Zhang
Lu Zhou
Lei Wang
Zaiyan Dai
Jun Yang
ViT
39
24
0
22 Oct 2022
Domain Specific Sub-network for Multi-Domain Neural Machine Translation
Amr Hendy
M. Abdelghaffar
Mohamed Afify
Ahmed Tawfik
AI4CE
28
0
0
18 Oct 2022
Decentralized Coverage Path Planning with Reinforcement Learning and Dual Guidance
Yongkai Liu
Jiawei Hu
Wei Dong
16
2
0
14 Oct 2022
Interactive Language: Talking to Robots in Real Time
Corey Lynch
Ayzaan Wahid
Jonathan Tompson
Tianli Ding
James Betker
Robert Baruch
Travis Armstrong
Peter R. Florence
LM&Ro
38
215
0
12 Oct 2022
Towards Theoretically Inspired Neural Initialization Optimization
Yibo Yang
Hong Wang
Haobo Yuan
Zhouchen Lin
32
9
0
12 Oct 2022
ZITS++: Image Inpainting by Improving the Incremental Transformer on Structural Priors
Chenjie Cao
Qiaole Dong
Yanwei Fu
40
30
0
12 Oct 2022
SlotFormer: Unsupervised Visual Dynamics Simulation with Object-Centric Models
Ziyi Wu
Nikita Dvornik
Klaus Greff
Thomas Kipf
Animesh Garg
OCL
BDL
67
91
0
12 Oct 2022
A Logic for Expressing Log-Precision Transformers
William Merrill
Ashish Sabharwal
ReLM
NAI
LRM
56
48
0
06 Oct 2022
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
275
1,077
0
05 Oct 2022
A Comparison of Transformer, Convolutional, and Recurrent Neural Networks on Phoneme Recognition
Kyuhong Shim
Wonyong Sung
27
2
0
01 Oct 2022
Transformer Meets Boundary Value Inverse Problems
Ruchi Guo
Shuhao Cao
Long Chen
MedIm
38
21
0
29 Sep 2022
Bridging the Gap to Real-World Object-Centric Learning
Maximilian Seitzer
Max Horn
Andrii Zadaianchuk
Dominik Zietlow
Tianjun Xiao
...
Tong He
Zheng-Wei Zhang
Bernhard Schölkopf
Thomas Brox
Francesco Locatello
OCL
50
140
0
29 Sep 2022
Multi-encoder attention-based architectures for sound recognition with partial visual assistance
Wim Boes
Hugo Van hamme
16
1
0
26 Sep 2022
Batch Layer Normalization, A new normalization layer for CNNs and RNN
A. Ziaee
Erion Çano
19
13
0
19 Sep 2022
Parameter-Efficient Conformers via Sharing Sparsely-Gated Experts for End-to-End Speech Recognition
Ye Bai
Jie Li
W. Han
Hao Ni
Kaituo Xu
Zhuo Zhang
Cheng Yi
Xiaorui Wang
MoE
31
1
0
17 Sep 2022
Denoising Diffusion Error Correction Codes
Yoni Choukroun
Lior Wolf
DiffM
47
27
0
16 Sep 2022
CRAFT: Camera-Radar 3D Object Detection with Spatio-Contextual Fusion Transformer
Youngseok Kim
Sanmin Kim
Junwon Choi
Dongsuk Kum
37
75
0
14 Sep 2022
CNN-Trans-Enc: A CNN-Enhanced Transformer-Encoder On Top Of Static BERT representations for Document Classification
Charaf Eddine Benarab
Shenglin Gui
27
6
0
13 Sep 2022
DSE-GAN: Dynamic Semantic Evolution Generative Adversarial Network for Text-to-Image Generation
Mengqi Huang
Zhendong Mao
Penghui Wang
Quang Wang
Yongdong Zhang
36
20
0
03 Sep 2022
Deep Sparse Conformer for Speech Recognition
Xianchao Wu
28
2
0
01 Sep 2022
MonaCoBERT: Monotonic attention based ConvBERT for Knowledge Tracing
Unggi Lee
Yonghyun Park
Yujin Kim
S. Choi
Hyeoncheol Kim
27
7
0
19 Aug 2022
Mask and Reason: Pre-Training Knowledge Graph Transformers for Complex Logical Queries
Xiao Liu
Shiyu Zhao
Kai Su
Yukuo Cen
J. Qiu
Mengdi Zhang
Wei Wu
Yuxiao Dong
Jie Tang
35
57
0
16 Aug 2022
Fast Vocabulary Projection Method via Clustering for Multilingual Machine Translation on GPU
Hossam Amer
Young Jin Kim
Mohamed Afify
Hitokazu Matsushita
Hany Awadalla
33
1
0
14 Aug 2022
Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning
J. Hu
Roberto Cavicchioli
Alessandro Capotondi
31
21
0
13 Aug 2022
Language Tokens: A Frustratingly Simple Approach Improves Zero-Shot Performance of Multilingual Translation
Muhammad N. ElNokrashy
Amr Hendy
Mohamed Maher
Mohamed Afify
Hany Awadalla
25
2
0
11 Aug 2022
Towards No.1 in CLUE Semantic Matching Challenge: Pre-trained Language Model Erlangshen with Propensity-Corrected Loss
Junjie Wang
Yuxiang Zhang
Ping Yang
Ruyi Gan
25
2
0
05 Aug 2022
AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model
Saleh Soltan
Shankar Ananthakrishnan
Jack G. M. FitzGerald
Rahul Gupta
Wael Hamza
...
Mukund Sridhar
Fabian Triefenbach
Apurv Verma
Gokhan Tur
Premkumar Natarajan
58
82
0
02 Aug 2022
Unified Normalization for Accelerating and Stabilizing Transformers
Qiming Yang
Kai Zhang
Chaoxiang Lan
Zhi Yang
Zheyang Li
Wenming Tan
Jun Xiao
Shiliang Pu
23
8
0
02 Aug 2022
Adaptive Gradient Methods at the Edge of Stability
Jeremy M. Cohen
Behrooz Ghorbani
Shankar Krishnan
Naman Agarwal
Sourabh Medapati
...
Daniel Suo
David E. Cardoze
Zachary Nado
George E. Dahl
Justin Gilmer
ODL
37
51
0
29 Jul 2022
GTrans: Grouping and Fusing Transformer Layers for Neural Machine Translation
Jian Yang
Yuwei Yin
Liqun Yang
Shuming Ma
Haoyang Huang
Dongdong Zhang
Furu Wei
Zhoujun Li
AI4CE
22
16
0
29 Jul 2022
HelixFold-Single: MSA-free Protein Structure Prediction by Using Protein Language Model as an Alternative
Xiaomin Fang
Fan Wang
Lihang Liu
Jingzhou He
Dayong Lin
Yingfei Xiang
Xiaonan Zhang
Hua Wu
Hui Li
Le Song
30
51
0
28 Jul 2022
On Mitigating Hard Clusters for Face Clustering
Yingjie Chen
Huasong Zhong
Chong Chen
Chen Shen
Jianqiang Huang
Tao Wang
Yun Liang
Qianru Sun
CVBM
33
12
0
25 Jul 2022
Neural Topological Ordering for Computation Graphs
Mukul Gagrani
Corrado Rainone
Yang Yang
Harris Teague
Wonseok Jeon
H. V. Hoof
Weizhen Zeng
P. Zappi
Chris Lott
Roberto Bondesan
40
12
0
13 Jul 2022
Earthformer: Exploring Space-Time Transformers for Earth System Forecasting
Zhihan Gao
Xingjian Shi
Hao Wang
Yi Zhu
Yuyang Wang
Mu Li
Dit-Yan Yeung
AI4TS
45
150
0
12 Jul 2022
Deep Transformer Model with Pre-Layer Normalization for COVID-19 Growth Prediction
Rizki Ramadhan Fitra
N. Yudistira
W. Mahmudy
27
1
0
10 Jul 2022
Vision Transformers: State of the Art and Research Challenges
Bo-Kai Ruan
Hong-Han Shuai
Wen-Huang Cheng
ViT
30
17
0
07 Jul 2022
Pure Transformers are Powerful Graph Learners
Jinwoo Kim
Tien Dat Nguyen
Seonwoo Min
Sungjun Cho
Moontae Lee
Honglak Lee
Seunghoon Hong
43
191
0
06 Jul 2022
TENET: Transformer Encoding Network for Effective Temporal Flow on Motion Prediction
Yuting Wang
Hangning Zhou
Zhigang Zhang
Chen Feng
H. Lin
...
Shiyu Zhang
Jie-Ru Guo
Xuefeng Wang
Ziyao Xu
Chi Zhang
ViT
59
15
0
30 Jun 2022
Set Norm and Equivariant Skip Connections: Putting the Deep in Deep Sets
Lily H. Zhang
Veronica Tozzo
J. Higgins
Rajesh Ranganath
BDL
MoE
19
16
0
23 Jun 2022
Agent-based Graph Neural Networks
Karolis Martinkus
Pál András Papp
Benedikt Schesch
Roger Wattenhofer
LLMAG
GNN
39
17
0
22 Jun 2022
All you need is feedback: Communication with block attention feedback codes
Emre Ozfatura
Yulin Shao
A. Perotti
B. Popović
Deniz Gunduz
22
10
0
19 Jun 2022
Alexa Teacher Model: Pretraining and Distilling Multi-Billion-Parameter Encoders for Natural Language Understanding Systems
Jack G. M. FitzGerald
Shankar Ananthakrishnan
Konstantine Arkoudas
Davide Bernardi
Abhishek Bhagia
...
Pan Wei
Haiyang Yu
Shuai Zheng
Gokhan Tur
Premkumar Natarajan
ELM
14
30
0
15 Jun 2022