On Layer Normalization in the Transformer Architecture

12 February 2020
Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, Tie-Yan Liu
Topics: AI4CE

Papers citing "On Layer Normalization in the Transformer Architecture"

Showing 16 of 566 citing papers.

Neural Temporal Point Processes For Modelling Electronic Health Records
Joseph Enguehard, Dan Busbridge, Adam James Bozson, Claire Woodcock, Nils Y. Hammerla
27 Jul 2020 · 43 citations

Rewiring the Transformer with Depth-Wise LSTMs
Hongfei Xu, Yang Song, Qiuhui Liu, Josef van Genabith, Deyi Xiong
13 Jul 2020 · 6 citations

Rethinking Positional Encoding in Language Pre-training
Guolin Ke, Di He, Tie-Yan Liu
28 Jun 2020 · 292 citations

Conditional Set Generation with Transformers
Adam R. Kosiorek, Hyunjik Kim, Danilo Jimenez Rezende
26 Jun 2020 · 40 citations

On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines
Marius Mosbach, Maksym Andriushchenko, Dietrich Klakow
08 Jun 2020 · 354 citations

The Lipschitz Constant of Self-Attention
Hyunjik Kim, George Papamakarios, A. Mnih
08 Jun 2020 · 136 citations

Many-to-Many Voice Transformer Network
Hirokazu Kameoka, Wen-Chin Huang, Kou Tanaka, Takuhiro Kaneko, Nobukatsu Hojo, T. Toda
Topics: ViT
18 May 2020 · 30 citations

Language Model Prior for Low-Resource Neural Machine Translation
Christos Baziotis, Barry Haddow, Alexandra Birch
30 Apr 2020 · 53 citations

Understanding the Difficulty of Training Transformers
Liyuan Liu, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, Jiawei Han
Topics: AI4CE
17 Apr 2020 · 247 citations

Longformer: The Long-Document Transformer
Iz Beltagy, Matthew E. Peters, Arman Cohan
Topics: RALM, VLM
10 Apr 2020 · 3,944 citations

ReZero is All You Need: Fast Convergence at Large Depth
Thomas C. Bachlechner, Bodhisattwa Prasad Majumder, H. H. Mao, G. Cottrell, Julian McAuley
Topics: AI4CE
10 Mar 2020 · 276 citations

Transformers without Tears: Improving the Normalization of Self-Attention
Toan Q. Nguyen, Julian Salazar
14 Oct 2019 · 225 citations

Stabilizing Transformers for Reinforcement Learning
Emilio Parisotto, H. F. Song, Jack W. Rae, Razvan Pascanu, Çağlar Gülçehre, ..., Aidan Clark, Seb Noury, M. Botvinick, N. Heess, R. Hadsell
Topics: OffRL
13 Oct 2019 · 360 citations

Bag of Tricks for Image Classification with Convolutional Neural Networks
Tong He, Zhi-Li Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li
04 Dec 2018 · 1,400 citations

Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
Lechao Xiao, Yasaman Bahri, Jascha Narain Sohl-Dickstein, S. Schoenholz, Jeffrey Pennington
14 Jun 2018 · 350 citations

OpenNMT: Neural Machine Translation Toolkit
Guillaume Klein, Yoon Kim, Yuntian Deng, Vincent Nguyen, Jean Senellart, Alexander M. Rush
28 May 2018 · 119 citations