2002.04745
On Layer Normalization in the Transformer Architecture
12 February 2020
Ruibin Xiong
Yunchang Yang
Di He
Kai Zheng
Shuxin Zheng
Chen Xing
Huishuai Zhang
Yanyan Lan
Liwei Wang
Tie-Yan Liu
AI4CE
Papers citing
"On Layer Normalization in the Transformer Architecture"
50 / 566 papers shown
Improving Autoregressive NLP Tasks via Modular Linearized Attention
Victor Agostinelli
Lizhong Chen
27
1
0
17 Apr 2023
M2T: Masking Transformers Twice for Faster Decoding
Fabian Mentzer
E. Agustsson
Michael Tschannen
23
17
0
14 Apr 2023
Convex Dual Theory Analysis of Two-Layer Convolutional Neural Networks with Soft-Thresholding
Chunyan Xiong
Meng Lu
Xiaotong Yu
JIAN-PENG Cao
Zhong Chen
D. Guo
X. Qu
MLT
43
0
0
14 Apr 2023
Abstractors and relational cross-attention: An inductive bias for explicit relational reasoning in Transformers
Awni Altabaa
Taylor Webb
Jonathan D. Cohen
John Lafferty
30
8
0
01 Apr 2023
Scalable, Detailed and Mask-Free Universal Photometric Stereo
Satoshi Ikehata
33
31
0
28 Mar 2023
Continuous Intermediate Token Learning with Implicit Motion Manifold for Keyframe Based Motion Interpolation
Clinton Mo
Kun Hu
Chengjiang Long
Zhiyong Wang
35
12
0
27 Mar 2023
Robotic Packaging Optimization with Reinforcement Learning
E. Drijver
Rodrigo Pérez-Dattari
Jens Kober
Cosimo Della Santina
Zlatan Ajanović
OffRL
23
1
0
26 Mar 2023
It is all Connected: A New Graph Formulation for Spatio-Temporal Forecasting
Lars Odegaard Bentsen
N. Warakagoda
R. Stenbro
P. Engelstad
AI4TS
15
1
0
23 Mar 2023
Contrastive Alignment of Vision to Language Through Parameter-Efficient Transfer Learning
Zaid Khan
Yun Fu
VLM
41
12
0
21 Mar 2023
Difficulty in chirality recognition for Transformer architectures learning chemical structures from string
Yasuhiro Yoshikai
T. Mizuno
Shumpei Nemoto
Hiroyuki Kusuhara
22
16
0
21 Mar 2023
Blind Estimation of Audio Processing Graph
Sungho Lee
Jaehyung Park
Seungryeol Paik
Kyogu Lee
25
9
0
15 Mar 2023
One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale
Fan Bao
Shen Nie
Kaiwen Xue
Chongxuan Li
Shiliang Pu
Yaole Wang
Gang Yue
Yue Cao
Hang Su
Jun Zhu
DiffM
207
151
0
12 Mar 2023
Transcription free filler word detection with Neural semi-CRFs
Ge Zhu
Yujia Yan
Juan-Pablo Caceres
Z. Duan
32
3
0
11 Mar 2023
TSMixer: An All-MLP Architecture for Time Series Forecasting
Si-An Chen
Chun-Liang Li
Nate Yoder
Sercan Ö. Arik
Tomas Pfister
AI4TS
36
157
0
10 Mar 2023
How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding
Yuchen Li
Yuan-Fang Li
Andrej Risteski
120
61
0
07 Mar 2023
TrafficBots: Towards World Models for Autonomous Driving Simulation and Motion Prediction
Zhejun Zhang
Alexander Liniger
Dengxin Dai
Feng Yu
Luc Van Gool
82
42
0
07 Mar 2023
Deep Transformers without Shortcuts: Modifying Self-attention for Faithful Signal Propagation
Bobby He
James Martens
Guodong Zhang
Aleksandar Botev
Andy Brock
Samuel L. Smith
Yee Whye Teh
27
30
0
20 Feb 2023
Scaling Laws for Multilingual Neural Machine Translation
Patrick Fernandes
Behrooz Ghorbani
Xavier Garcia
Markus Freitag
Orhan Firat
49
29
0
19 Feb 2023
Eagle: Large-Scale Learning of Turbulent Fluid Dynamics with Mesh Transformers
Steeven Janny
Aurélien Béneteau
Madiha Nadri Wolf
Julie Digne
Nicolas Thome
Christian Wolf
AI4CE
84
32
0
16 Feb 2023
Learning Non-Local Spatial-Angular Correlation for Light Field Image Super-Resolution
Zhengyu Liang
Yingqian Wang
Longguang Wang
Jungang Yang
Shilin Zhou
Y. Guo
42
38
0
16 Feb 2023
Spatial Functa: Scaling Functa to ImageNet Classification and Generation
Matthias Bauer
Emilien Dupont
Andy Brock
Dan Rosenbaum
Jonathan Richard Schwarz
Hyunjik Kim
DiffM
36
35
0
06 Feb 2023
V1T: large-scale mouse V1 response prediction using a Vision Transformer
Bryan M. Li
I. M. Cornacchia
Nathalie L Rochefort
A. Onken
26
8
0
06 Feb 2023
Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction
Christopher Fifty
Joseph M. Paggi
Ehsan Amid
J. Leskovec
R. Dror
AI4CE
25
0
0
04 Feb 2023
Dual PatchNorm
Manoj Kumar
Mostafa Dehghani
N. Houlsby
UQCV
ViT
29
11
0
02 Feb 2023
STEP: Learning N:M Structured Sparsity Masks from Scratch with Precondition
Yucheng Lu
Shivani Agrawal
Suvinay Subramanian
Oleg Rybakov
Chris De Sa
Amir Yazdanbakhsh
21
16
0
02 Feb 2023
Analyzing Feed-Forward Blocks in Transformers through the Lens of Attention Maps
Goro Kobayashi
Tatsuki Kuribayashi
Sho Yokoi
Kentaro Inui
36
14
0
01 Feb 2023
Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases
Xiaoxia Wu
Cheng-rong Li
Reza Yazdani Aminabadi
Z. Yao
Yuxiong He
MQ
19
19
0
27 Jan 2023
Deep Quantum Error Correction
Yoni Choukroun
Lior Wolf
27
8
0
27 Jan 2023
Modelling Long Range Dependencies in ND: From Task-Specific to a General Purpose CNN
David M. Knigge
David W. Romero
Albert Gu
E. Gavves
Erik J. Bekkers
Jakub M. Tomczak
Mark Hoogendoorn
J. Sonke
3DV
35
21
0
25 Jan 2023
Image Super-Resolution using Efficient Striped Window Transformer
Jinpeng Shi
Hui Li
Tian Yu Liu
Yulong Liu
Hao Fei
Jinchen Zhu
Ling Zheng
Shizhuang Weng
42
10
0
24 Jan 2023
Masked Autoencoding Does Not Help Natural Language Supervision at Scale
Floris Weers
Vaishaal Shankar
Angelos Katharopoulos
Yinfei Yang
Tom Gunter
CLIP
23
4
0
19 Jan 2023
SPTS v2: Single-Point Scene Text Spotting
Yuliang Liu
Jiaxin Zhang
Dezhi Peng
Mingxin Huang
Xinyu Wang
...
Can Huang
Dahua Lin
Chunhua Shen
Xiang Bai
Lianwen Jin
VLM
34
50
0
04 Jan 2023
Edge Enhanced Image Style Transfer via Transformers
Chi Zhang
Jun Yang
Zaiyan Dai
Peng-Xia Cao
16
10
0
02 Jan 2023
Multi-Stage Spatio-Temporal Aggregation Transformer for Video Person Re-identification
Ziyi Tang
Ruimao Zhang
Zhanglin Peng
Jinrui Chen
Liang Lin
33
18
0
02 Jan 2023
On Transforming Reinforcement Learning by Transformer: The Development Trajectory
Shengchao Hu
Li Shen
Ya Zhang
Yixin Chen
Dacheng Tao
OffRL
30
25
0
29 Dec 2022
Cramming: Training a Language Model on a Single GPU in One Day
Jonas Geiping
Tom Goldstein
MoE
32
86
0
28 Dec 2022
On Realization of Intelligent Decision-Making in the Real World: A Foundation Decision Model Perspective
Ying Wen
Bo Liu
M. Zhou
Shufang Hou
Zhe Cao
Chenyang Le
Jingxiao Chen
Zheng Tian
Weinan Zhang
Jun Wang
AI4CE
26
10
0
24 Dec 2022
Optimizing Deep Transformers for Chinese-Thai Low-Resource Translation
Wenjie Hao
Hongfei Xu
Lingling Mu
Hongying Zan
MoE
38
4
0
24 Dec 2022
Generative Colorization of Structured Mobile Web Pages
Kotaro Kikuchi
Naoto Inoue
Mayu Otani
E. Simo-Serra
Kota Yamaguchi
10
9
0
22 Dec 2022
What Makes for Good Tokenizers in Vision Transformer?
Shengju Qian
Yi Zhu
Wenbo Li
Mu Li
Jiaya Jia
ViT
37
14
0
21 Dec 2022
SegAugment: Maximizing the Utility of Speech Translation Data with Segmentation-based Augmentations
Ioannis Tsiamas
José A. R. Fonollosa
Marta R. Costa-jussà
46
6
0
19 Dec 2022
Latent Diffusion for Language Generation
Justin Lovelace
Varsha Kishore
Chao-gang Wan
Eliot Shekhtman
Kilian Q. Weinberger
DiffM
29
71
0
19 Dec 2022
Inductive Attention for Video Action Anticipation
Tsung-Ming Tai
G. Fiameni
Cheng-Kuang Lee
Simon See
Oswald Lanz
39
1
0
17 Dec 2022
Efficient Long Sequence Modeling via State Space Augmented Transformer
Simiao Zuo
Xiaodong Liu
Jian Jiao
Denis Xavier Charles
Eren Manavoglu
Tuo Zhao
Jianfeng Gao
130
36
0
15 Dec 2022
Fixing MoE Over-Fitting on Low-Resource Languages in Multilingual Machine Translation
Maha Elbayad
Anna Y. Sun
Shruti Bhosale
MoE
59
9
0
15 Dec 2022
Gaussian Radar Transformer for Semantic Segmentation in Noisy Radar Data
Matthias Zeller
Jens Behley
Michael Heidingsfeld
C. Stachniss
37
24
0
07 Dec 2022
Improve Bilingual TTS Using Dynamic Language and Phonology Embedding
Fengyu Yang
Jian Luan
Yujun Wang
21
1
0
07 Dec 2022
Cross-lingual Similarity of Multilingual Representations Revisited
Maksym Del
Mark Fishel
31
3
0
04 Dec 2022
Simplifying and Understanding State Space Models with Diagonal Linear RNNs
Ankit Gupta
Harsh Mehta
Jonathan Berant
29
21
0
01 Dec 2022
Continuous diffusion for categorical data
Sander Dieleman
Laurent Sartran
Arman Roshannai
Nikolay Savinov
Yaroslav Ganin
...
Conor Durkan
Curtis Hawthorne
Rémi Leblond
Will Grathwohl
J. Adler
DiffM
32
100
0
28 Nov 2022