2002.04745
On Layer Normalization in the Transformer Architecture
12 February 2020
Ruibin Xiong
Yunchang Yang
Di He
Kai Zheng
Shuxin Zheng
Chen Xing
Huishuai Zhang
Yanyan Lan
Liwei Wang
Tie-Yan Liu
AI4CE
Papers citing
"On Layer Normalization in the Transformer Architecture"
50 / 566 papers shown
Title
MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments
Spyros Gidaris
Andrei Bursuc
Oriane Siméoni
Antonín Vobecký
N. Komodakis
Matthieu Cord
Patrick Pérez
SSL
ViT
24
3
0
18 Jul 2023
PolyLM: An Open Source Polyglot Large Language Model
Xiangpeng Wei
Hao-Ran Wei
Huan Lin
Tianhao Li
Pei Zhang
...
Yu Bowen
Dayiheng Liu
Baosong Yang
Fei Huang
Jun Xie
LRM
48
55
0
12 Jul 2023
MeT: A Graph Transformer for Semantic Segmentation of 3D Meshes
Giuseppe Vecchio
Luca Prezzavento
C. Pino
Francesco Rundo
S. Palazzo
C. Spampinato
35
5
0
03 Jul 2023
The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit
Lorenzo Noci
Chuning Li
Mufan Li
Bobby He
Thomas Hofmann
Chris J. Maddison
Daniel M. Roy
40
31
0
30 Jun 2023
BinaryViT: Pushing Binary Vision Transformers Towards Convolutional Models
Phuoc-Hoan Charles Le
Xinlin Li
ViT
MQ
33
21
0
29 Jun 2023
Reconstructing the Hemodynamic Response Function via a Bimodal Transformer
Yoni Choukroun
Lior Golgher
P. Blinder
L. Wolf
MedIm
24
0
0
28 Jun 2023
Understanding Optimization of Deep Learning via Jacobian Matrix and Lipschitz Constant
Xianbiao Qi
Jianan Wang
Lei Zhang
21
0
0
15 Jun 2023
Recipes for Sequential Pre-training of Multilingual Encoder and Seq2Seq Models
Saleh Soltan
Andrew Rosenbaum
Tobias Falke
Qin Lu
Anna Rumshisky
Wael Hamza
30
0
0
14 Jun 2023
AutoML in the Age of Large Language Models: Current Challenges, Future Opportunities and Risks
Alexander Tornede
Difan Deng
Theresa Eimer
Joseph Giovanelli
Aditya Mohan
...
Sarah Segel
Daphne Theodorakopoulos
Tanja Tornede
Henning Wachsmuth
Marius Lindauer
41
23
0
13 Jun 2023
EventCLIP: Adapting CLIP for Event-based Object Recognition
Ziyi Wu
Xudong Liu
Igor Gilitschenski
VLM
37
15
0
10 Jun 2023
MobileNMT: Enabling Translation in 15MB and 30ms
Ye Lin
Xiaohui Wang
Zhexi Zhang
Mingxuan Wang
Tong Xiao
Jingbo Zhu
MQ
38
1
0
07 Jun 2023
Deep Learning for Day Forecasts from Sparse Observations
Marcin Andrychowicz
L. Espeholt
Di Li
Samier Merchant
Alexander Merose
Fred Zyda
Shreya Agrawal
Nal Kalchbrenner
AI4Cl
36
63
0
06 Jun 2023
Learning Probabilistic Symmetrization for Architecture Agnostic Equivariance
Jinwoo Kim
Tien Dat Nguyen
Ayhan Suleymanzade
Hyeokjun An
Seunghoon Hong
55
23
0
05 Jun 2023
Centered Self-Attention Layers
Ameen Ali
Tomer Galanti
Lior Wolf
51
6
0
02 Jun 2023
Speech Translation with Foundation Models and Optimal Transport: UPC at IWSLT23
Ioannis Tsiamas
Gerard I. Gállego
José A. R. Fonollosa
Marta R. Costa-jussá
OT
16
3
0
02 Jun 2023
Learning Sampling Dictionaries for Efficient and Generalizable Robot Motion Planning with Transformers
Jacob J. Johnson
A. H. Qureshi
Michael C. Yip
44
13
0
01 Jun 2023
Understanding Augmentation-based Self-Supervised Representation Learning via RKHS Approximation and Regression
Runtian Zhai
Bing Liu
Andrej Risteski
Zico Kolter
Pradeep Ravikumar
SSL
40
9
0
01 Jun 2023
Normalization Enhances Generalization in Visual Reinforcement Learning
Lu Li
Jiafei Lyu
Guozheng Ma
Zilin Wang
Zhen Yang
Xiu Li
Zhiheng Li
OOD
30
8
0
01 Jun 2023
From Zero to Turbulence: Generative Modeling for 3D Flow Simulation
Marten Lienen
David Lüdke
Jan Hansen-Palmus
Stephan Günnemann
DiffM
AI4CE
34
25
0
29 May 2023
Geometric Algebra Transformer
Johann Brehmer
P. D. Haan
S. Behrends
Taco S. Cohen
46
27
0
28 May 2023
On the impact of activation and normalization in obtaining isometric embeddings at initialization
Amir Joudaki
Hadi Daneshmand
Francis R. Bach
21
9
0
28 May 2023
SING: A Plug-and-Play DNN Learning Technique
Adrien Courtois
Damien Scieur
Jean-Michel Morel
Pablo Arias
Thomas Eboli
41
0
0
25 May 2023
Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
Sotiris Anagnostidis
Dario Pavllo
Luca Biggio
Lorenzo Noci
Aurelien Lucchi
Thomas Hofmann
42
53
0
25 May 2023
Let There Be Order: Rethinking Ordering in Autoregressive Graph Generation
Jie Bu
Kazi Sajeed Mehrab
Anuj Karpatne
29
3
0
24 May 2023
Pre-RMSNorm and Pre-CRMSNorm Transformers: Equivalent and Efficient Pre-LN Transformers
Zixuan Jiang
Jiaqi Gu
Hanqing Zhu
David Z. Pan
AI4CE
33
16
0
24 May 2023
On Structural Expressive Power of Graph Transformers
Wenhao Zhu
Tianyu Wen
Guojie Song
Liangji Wang
Bo Zheng
27
15
0
23 May 2023
Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings
Ta-Chung Chi
Ting-Han Fan
Li-Wei Chen
Alexander I. Rudnicky
Peter J. Ramadge
VLM
MILM
60
12
0
23 May 2023
U-TILISE: A Sequence-to-sequence Model for Cloud Removal in Optical Satellite Time Series
Corinne Stucker
Vivien Sainte Fare Garnot
Konrad Schindler
AI4TS
24
13
0
22 May 2023
Learning Subpocket Prototypes for Generalizable Structure-based Drug Design
Zaixin Zhang
Qi Liu
40
34
0
22 May 2023
Tokenized Graph Transformer with Neighborhood Augmentation for Node Classification in Large Graphs
Jinsong Chen
Chang-Shu Liu
Kai-Xin Gao
Gaichao Li
Kun He
31
4
0
22 May 2023
Duplex Diffusion Models Improve Speech-to-Speech Translation
Xianchao Wu
DiffM
25
4
0
22 May 2023
Exploring the Impact of Layer Normalization for Zero-shot Neural Machine Translation
Zhuoyuan Mao
Raj Dabre
Qianying Liu
Haiyue Song
Chenhui Chu
Sadao Kurohashi
19
7
0
16 May 2023
Exploiting Fine-Grained DCT Representations for Hiding Image-Level Messages within JPEG Images
Junxue Yang
Xin Liao
28
5
0
11 May 2023
Evaluating Embedding APIs for Information Retrieval
Ehsan Kamalloo
Xinyu Crystina Zhang
Odunayo Ogundepo
Nandan Thakur
David Alfonso-Hermelo
Mehdi Rezagholizadeh
Jimmy J. Lin
RALM
31
19
0
10 May 2023
Multi-Path Transformer is Better: A Case Study on Neural Machine Translation
Ye Lin
Shuhan Zhou
Yanyang Li
Anxiang Ma
Tong Xiao
Jingbo Zhu
38
0
0
10 May 2023
What is the best recipe for character-level encoder-only modelling?
Kris Cao
42
2
0
09 May 2023
Coherent Wave Dynamics and Language Generation of a Generative Pre-trained Transformer
Tao Hong
19
0
0
08 May 2023
Leveraging Synthetic Targets for Machine Translation
Sarthak Mittal
Oleksii Hrinchuk
Oleksii Kuchaiev
35
2
0
07 May 2023
Spatiotemporal Transformer for Stock Movement Prediction
Daniel Boyle
Jugal Kalita
AI4TS
23
2
0
05 May 2023
On the Expressivity Role of LayerNorm in Transformers' Attention
Shaked Brody
Shiyu Jin
Xinghao Zhu
MoE
72
31
0
04 May 2023
Towards Being Parameter-Efficient: A Stratified Sparsely Activated Transformer with Dynamic Capacity
Da Xu
Maha Elbayad
Kenton W. Murray
Jean Maillard
Vedanuj Goswami
MoE
47
3
0
03 May 2023
HappyQuokka System for ICASSP 2023 Auditory EEG Challenge
Zhenyu Piao
Miseul Kim
Hyungchan Yoon
Hong-Goo Kang
17
6
0
03 May 2023
ResiDual: Transformer with Dual Residual Connections
Shufang Xie
Huishuai Zhang
Junliang Guo
Xu Tan
Jiang Bian
Hany Awadalla
Arul Menezes
Tao Qin
Rui Yan
51
18
0
28 Apr 2023
Customized Segment Anything Model for Medical Image Segmentation
Kaiwen Zhang
Dong Liu
MedIm
VLM
106
289
0
26 Apr 2023
StepFormer: Self-supervised Step Discovery and Localization in Instructional Videos
Nikita Dvornik
Isma Hadji
Ran Zhang
Konstantinos G. Derpanis
Animesh Garg
Richard P. Wildes
Allan D. Jepson
34
27
0
26 Apr 2023
Application of Transformers for Nonlinear Channel Compensation in Optical Systems
Behnam Behinaein Hamgini
H. Najafi
Ali Bakhshali
Zhuhong Zhang
31
1
0
25 Apr 2023
DuETT: Dual Event Time Transformer for Electronic Health Records
Alex Labach
Aslesha Pokhrel
Xiao Shi Huang
S. Zuberi
S. Yi
M. Volkovs
T. Poutanen
Rahul G. Krishnan
AI4TS
MedIm
28
3
0
25 Apr 2023
NoiseTrans: Point Cloud Denoising with Transformers
Guangzhe Hou
G. Qin
Minghui Sun
Yanhua Liang
Jie Yan
Zhonghan Zhang
3DPC
ViT
23
2
0
24 Apr 2023
An Introduction to Transformers
Richard Turner
ViT
28
0
0
20 Apr 2023
LipsFormer: Introducing Lipschitz Continuity to Vision Transformers
Xianbiao Qi
Jianan Wang
Yihao Chen
Yukai Shi
Lei Zhang
46
17
0
19 Apr 2023