On Layer Normalization in the Transformer Architecture

12 February 2020
Ruibin Xiong
Yunchang Yang
Di He
Kai Zheng
Shuxin Zheng
Chen Xing
Huishuai Zhang
Yanyan Lan
Liwei Wang
Tie-Yan Liu
    AI4CE

Papers citing "On Layer Normalization in the Transformer Architecture"

50 / 566 papers shown
Title
Learned Queries for Efficient Local Attention
Moab Arar
Ariel Shamir
Amit H. Bermano
ViT
44
29
0
21 Dec 2021
Feature Erasing and Diffusion Network for Occluded Person Re-Identification
Zhikang Wang
Feng Zhu
Shixiang Tang
Rui Zhao
Lihuo He
Jiangning Song
DiffM
35
104
0
16 Dec 2021
Faster Nearest Neighbor Machine Translation
Shuhe Wang
Jiwei Li
Yuxian Meng
Rongbin Ouyang
Guoyin Wang
Xiaoya Li
Tianwei Zhang
Shi Zong
22
12
0
15 Dec 2021
SPTS: Single-Point Text Spotting
Dezhi Peng
Xinyu Wang
Yuliang Liu
Jiaxin Zhang
Mingxin Huang
...
Jing Li
Dahua Lin
Chunhua Shen
Xiang Bai
Lianwen Jin
ViT
32
63
0
15 Dec 2021
Simple Local Attentions Remain Competitive for Long-Context Tasks
Wenhan Xiong
Barlas Oğuz
Anchit Gupta
Xilun Chen
Diana Liskovich
Omer Levy
Wen-tau Yih
Yashar Mehdad
49
29
0
14 Dec 2021
FLAVA: A Foundational Language And Vision Alignment Model
Amanpreet Singh
Ronghang Hu
Vedanuj Goswami
Guillaume Couairon
Wojciech Galuba
Marcus Rohrbach
Douwe Kiela
CLIP
VLM
40
690
0
08 Dec 2021
UniLog: Deploy One Model and Specialize it for All Log Analysis Tasks
Yichen Zhu
Weibin Meng
Ying Liu
Shenglin Zhang
Tao Han
Shimin Tao
Dan Pei
MoE
46
14
0
06 Dec 2021
Dynamic Graph Learning-Neural Network for Multivariate Time Series Modeling
Zhuoling Li
Gaowei Zhang
Lingyu Xu
Jie Yu
AI4TS
21
2
0
06 Dec 2021
CAVER: Cross-Modal View-Mixed Transformer for Bi-Modal Salient Object Detection
Youwei Pang
Xiaoqi Zhao
Lihe Zhang
Huchuan Lu
50
93
0
04 Dec 2021
PTCT: Patches with 3D-Temporal Convolutional Transformer Network for Precipitation Nowcasting
Ziao Yang
Xiangru Yang
Qifeng Lin
ViT
AI4TS
24
4
0
02 Dec 2021
Critical Initialization of Wide and Deep Neural Networks through Partial Jacobians: General Theory and Applications
Darshil Doshi
Tianyu He
Andrey Gromov
30
8
0
23 Nov 2021
RedCaps: web-curated image-text data created by the people, for the people
Karan Desai
Gaurav Kaul
Zubin Aysola
Justin Johnson
31
162
0
22 Nov 2021
Grounded Situation Recognition with Transformers
Junhyeong Cho
Youngseok Yoon
Hyeonjun Lee
Suha Kwak
ViT
31
17
0
19 Nov 2021
Swin Transformer V2: Scaling Up Capacity and Resolution
Ze Liu
Han Hu
Yutong Lin
Zhuliang Yao
Zhenda Xie
...
Yue Cao
Zheng-Wei Zhang
Li Dong
Furu Wei
B. Guo
ViT
91
1,758
0
18 Nov 2021
NVIDIA NeMo Neural Machine Translation Systems for English-German and English-Russian News and Biomedical Tasks at WMT21
Sandeep Subramanian
Oleksii Hrinchuk
Virginia Adams
Oleksii Kuchaiev
VLM
27
16
0
16 Nov 2021
Scaling ASR Improves Zero and Few Shot Learning
Alex Xiao
Weiyi Zheng
Gil Keren
Duc Le
Frank Zhang
Christian Fuegen
Ozlem Kalinli
Yatharth Saraf
Abdel-rahman Mohamed
19
21
0
10 Nov 2021
Attention Approximates Sparse Distributed Memory
Trenton Bricken
Cengiz Pehlevan
35
34
0
10 Nov 2021
Can Vision Transformers Perform Convolution?
Shanda Li
Xiangning Chen
Di He
Cho-Jui Hsieh
ViT
49
19
0
02 Nov 2021
Egocentric Human Trajectory Forecasting with a Wearable Camera and Multi-Modal Fusion
Jianing Qiu
Lipeng Chen
Xiao Gu
Frank P.-W. Lo
Ya-Yen Tsai
Jiankai Sun
Jiaqi Liu
Benny Lo
30
15
0
01 Nov 2021
Large-Scale Deep Learning Optimizations: A Comprehensive Survey
Xiaoxin He
Fuzhao Xue
Xiaozhe Ren
Yang You
32
14
0
01 Nov 2021
Transformers Generalize DeepSets and Can be Extended to Graphs and Hypergraphs
Jinwoo Kim
Saeyoon Oh
Seunghoon Hong
AI4CE
22
41
0
27 Oct 2021
Geometric Transformer for End-to-End Molecule Properties Prediction
Yoni Choukroun
Lior Wolf
AI4CE
ViT
30
16
0
26 Oct 2021
NormFormer: Improved Transformer Pretraining with Extra Normalization
Sam Shleifer
Jason Weston
Myle Ott
AI4CE
33
74
0
18 Oct 2021
SpecTNT: a Time-Frequency Transformer for Music Audio
Weiyi Lu
Ju-Chiang Wang
Minz Won
Keunwoo Choi
Xuchen Song
ViT
25
45
0
18 Oct 2021
bert2BERT: Towards Reusable Pretrained Language Models
Cheng Chen
Yichun Yin
Lifeng Shang
Xin Jiang
Yujia Qin
Fengyu Wang
Zhi Wang
Xiao Chen
Zhiyuan Liu
Qun Liu
VLM
24
59
0
14 Oct 2021
Poformer: A simple pooling transformer for speaker verification
Yufeng Ma
Yiwei Ding
Miao Zhao
Yu Zheng
Min Liu
Minqiang Xu
ViT
21
2
0
10 Oct 2021
A Loss Curvature Perspective on Training Instability in Deep Learning
Justin Gilmer
Behrooz Ghorbani
Ankush Garg
Sneha Kudugunta
Behnam Neyshabur
David E. Cardoze
George E. Dahl
Zachary Nado
Orhan Firat
ODL
36
35
0
08 Oct 2021
M6-10T: A Sharing-Delinking Paradigm for Efficient Multi-Trillion Parameter Pretraining
Junyang Lin
An Yang
Jinze Bai
Chang Zhou
Le Jiang
...
Jie Zhang
Yong Li
Wei Lin
Jingren Zhou
Hongxia Yang
MoE
92
43
0
08 Oct 2021
Speeding up Deep Model Training by Sharing Weights and Then Unsharing
Shuo Yang
Le Hou
Xiaodan Song
Qiang Liu
Denny Zhou
110
9
0
08 Oct 2021
Learning Pessimism for Robust and Efficient Off-Policy Reinforcement Learning
Edoardo Cetin
Oya Celiktutan
OffRL
44
17
0
07 Oct 2021
Long-Range Transformers for Dynamic Spatiotemporal Forecasting
J. E. Grigsby
Zhe Wang
Nam Nguyen
Yanjun Qi
AI4TS
69
88
0
24 Sep 2021
Scalable and Efficient MoE Training for Multitask Multilingual Models
Young Jin Kim
A. A. Awan
Alexandre Muzio
Andres Felipe Cruz Salinas
Liyang Lu
Amr Hendy
Samyam Rajbhandari
Yuxiong He
Hany Awadalla
MoE
104
84
0
22 Sep 2021
Primer: Searching for Efficient Transformers for Language Modeling
David R. So
Wojciech Mańke
Hanxiao Liu
Zihang Dai
Noam M. Shazeer
Quoc V. Le
VLM
91
153
0
17 Sep 2021
An Interpretable Framework for Drug-Target Interaction with Gated Cross Attention
Yeachan Kim
Bonggun Shin
24
8
0
17 Sep 2021
Scaling Laws for Neural Machine Translation
Behrooz Ghorbani
Orhan Firat
Markus Freitag
Ankur Bapna
M. Krikun
Xavier Garcia
Ciprian Chelba
Colin Cherry
40
99
0
16 Sep 2021
RankNAS: Efficient Neural Architecture Search by Pairwise Ranking
Chi Hu
Chenglong Wang
Xiangnan Ma
Xia Meng
Yinqiao Li
Tong Xiao
Jingbo Zhu
Changliang Li
28
11
0
15 Sep 2021
Incorporating Residual and Normalization Layers into Analysis of Masked Language Models
Goro Kobayashi
Tatsuki Kuribayashi
Sho Yokoi
Kentaro Inui
160
46
0
15 Sep 2021
Multilingual Translation via Grafting Pre-trained Language Models
Zewei Sun
Mingxuan Wang
Lei Li
AI4CE
191
22
0
11 Sep 2021
Temporal Pyramid Transformer with Multimodal Interaction for Video Question Answering
Min Peng
Chongyang Wang
Yuan Gao
Yu Shi
Xiangdong Zhou
50
3
0
10 Sep 2021
Competence-based Curriculum Learning for Multilingual Machine Translation
Mingliang Zhang
Fandong Meng
Y. Tong
Jie Zhou
39
16
0
09 Sep 2021
Searching for Efficient Multi-Stage Vision Transformers
Yi-Lun Liao
S. Karaman
Vivienne Sze
ViT
24
19
0
01 Sep 2021
Type Anywhere You Want: An Introduction to Invisible Mobile Keyboard
Sahng-Min Yoo
Ue-Hwan Kim
Yewon Hwang
Jong-Hwan Kim
OffRL
22
1
0
20 Aug 2021
Global Self-Attention as a Replacement for Graph Convolution
Md Shamim Hussain
Mohammed J Zaki
D. Subramanian
ViT
40
123
0
07 Aug 2021
Learning to Elect
Cem Anil
Xuchan Bao
6
7
0
05 Aug 2021
WeChat Neural Machine Translation Systems for WMT21
Xianfeng Zeng
Yanjun Liu
Ernan Li
Qiu Ran
Fandong Meng
Peng Li
Jinan Xu
Jie Zhou
25
20
0
05 Aug 2021
How much pre-training is enough to discover a good subnetwork?
Cameron R. Wolfe
Fangshuo Liao
Qihan Wang
J. Kim
Anastasios Kyrillidis
30
3
0
31 Jul 2021
Learning Attributed Graph Representations with Communicative Message Passing Transformer
Jianwen Chen
Shuangjia Zheng
Ying Song
Jiahua Rao
Yuedong Yang
30
47
0
19 Jul 2021
Serialized Multi-Layer Multi-Head Attention for Neural Speaker Embedding
Hongning Zhu
Kong Aik Lee
Haizhou Li
33
15
0
14 Jul 2021
The Piano Inpainting Application
Gaëtan Hadjeres
Léopold Crestel
42
36
0
13 Jul 2021
The Brownian motion in the transformer model
Yingshi Chen
21
1
0
12 Jul 2021