arXiv:2003.04887
ReZero is All You Need: Fast Convergence at Large Depth
10 March 2020
Thomas C. Bachlechner, Bodhisattwa Prasad Majumder, H. H. Mao, G. Cottrell, Julian McAuley
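The method named in the title gates each residual branch with a single learnable scalar initialized to zero, so every block starts out as the identity map and very deep networks remain trainable without normalization layers. Below is a minimal PyTorch-style sketch of that idea; the module name, MLP sublayer, and dimensions are illustrative assumptions, not the authors' reference implementation.

import torch
import torch.nn as nn

class ReZeroBlock(nn.Module):
    """Residual block with a zero-initialized learnable gate:
    x_{i+1} = x_i + alpha_i * F(x_i), with alpha_i = 0 at initialization."""
    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        # F(x): any sublayer would do; a small MLP is used here for illustration.
        self.f = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, dim),
        )
        # The ReZero gate: one scalar per block, starting at zero,
        # so the block contributes nothing at the start of training.
        self.alpha = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.alpha * self.f(x)

# Usage sketch: stacking many such blocks keeps the initial network equal to
# the identity, which is what enables stable training at large depth.
model = nn.Sequential(*[ReZeroBlock(64) for _ in range(128)])
x = torch.randn(8, 64)
print(model(x).shape)  # torch.Size([8, 64])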
Papers citing "ReZero is All You Need: Fast Convergence at Large Depth" (22 of 72 papers shown)

ResIST: Layer-Wise Decomposition of ResNets for Distributed Training
Chen Dun, Cameron R. Wolfe, C. Jermaine, Anastasios Kyrillidis (02 Jul 2021)

A Survey of Transformers [ViT]
Tianyang Lin, Yuxin Wang, Xiangyang Liu, Xipeng Qiu (08 Jun 2021)

Self-Supervised Bug Detection and Repair
Miltiadis Allamanis, Henry Jackson-Flux, Marc Brockschmidt (26 May 2021)

Scaling Properties of Deep Residual Networks
A. Cohen, R. Cont, Alain Rossier, Renyuan Xu (25 May 2021)

Structure-Aware Abstractive Conversation Summarization via Discourse and Action Graphs
Jiaao Chen, Diyi Yang (16 Apr 2021)

"BNN - BN = ?": Training Binary Neural Networks without Batch Normalization [MQ]
Tianlong Chen, Zhenyu Zhang, Xu Ouyang, Zechun Liu, Zhiqiang Shen, Zhangyang Wang (16 Apr 2021)

Going deeper with Image Transformers [ViT]
Hugo Touvron, Matthieu Cord, Alexandre Sablayrolles, Gabriel Synnaeve, Hervé Jégou (31 Mar 2021)

Predicting the Behavior of Dealers in Over-The-Counter Corporate Bond Markets
Yusen Lin, Jinming Xue, L. Raschid (12 Mar 2021)

3D Human Pose, Shape and Texture from Low-Resolution Images and Videos [3DH]
Xiangyu Xu, Hao Chen, Francesc Moreno-Noguer, László A. Jeni, Fernando de la Torre (11 Mar 2021)

Generating Images with Sparse Representations
C. Nash, Jacob Menick, Sander Dieleman, Peter W. Battaglia (05 Mar 2021)

GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training [ODL]
Chen Zhu, Renkun Ni, Zheng Xu, Kezhi Kong, Yifan Jiang, Tom Goldstein (16 Feb 2021)

High-Performance Large-Scale Image Recognition Without Normalization [VLM]
Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan (11 Feb 2021)

A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks
Asaf Noy, Yi Tian Xu, Y. Aflalo, Lihi Zelnik-Manor, Rong Jin (12 Jan 2021)

Reservoir Transformers
Sheng Shen, Alexei Baevski, Ari S. Morcos, Kurt Keutzer, Michael Auli, Douwe Kiela (30 Dec 2020)

On the Transfer of Disentangled Representations in Realistic Settings [OOD]
Andrea Dittadi, Frederik Trauble, Francesco Locatello, M. Wuthrich, Vaibhav Agrawal, Ole Winther, Stefan Bauer, Bernhard Schölkopf (27 Oct 2020)

IDF++: Analyzing and Improving Integer Discrete Flows for Lossless Compression
Rianne van den Berg, A. Gritsenko, Mostafa Dehghani, C. Sønderby, Tim Salimans (22 Jun 2020)

Normalized Attention Without Probability Cage
Oliver Richter, Roger Wattenhofer (19 May 2020)

Speech Recognition and Multi-Speaker Diarization of Long Conversations [VLM]
H. H. Mao, Shuyang Li, Julian McAuley, G. Cottrell (16 May 2020)

Evolving Normalization-Activation Layers
Hanxiao Liu, Andrew Brock, Karen Simonyan, Quoc V. Le (06 Apr 2020)

Set Functions for Time Series [AI4TS]
Max Horn, Michael Moor, Christian Bock, Bastian Alexander Rieck, Karsten M. Borgwardt (26 Sep 2019)

Bag of Tricks for Image Classification with Convolutional Neural Networks
Tong He, Zhi-Li Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li (04 Dec 2018)

Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
Lechao Xiao, Yasaman Bahri, Jascha Narain Sohl-Dickstein, S. Schoenholz, Jeffrey Pennington (14 Jun 2018)