ReZero is All You Need: Fast Convergence at Large Depth


10 March 2020
Thomas C. Bachlechner
Bodhisattwa Prasad Majumder
H. H. Mao
G. Cottrell
Julian McAuley
    AI4CE

Papers citing "ReZero is All You Need: Fast Convergence at Large Depth"

22 / 72 papers shown
Title
ResIST: Layer-Wise Decomposition of ResNets for Distributed Training
Chen Dun
Cameron R. Wolfe
C. Jermaine
Anastasios Kyrillidis
21
21
0
02 Jul 2021
A Survey of Transformers
Tianyang Lin
Yuxin Wang
Xiangyang Liu
Xipeng Qiu
ViT
53
1,088
0
08 Jun 2021
Self-Supervised Bug Detection and Repair
Miltiadis Allamanis
Henry Jackson-Flux
Marc Brockschmidt
23
103
0
26 May 2021
Scaling Properties of Deep Residual Networks
A. Cohen
R. Cont
Alain Rossier
Renyuan Xu
25
18
0
25 May 2021
Structure-Aware Abstractive Conversation Summarization via Discourse and Action Graphs
Jiaao Chen
Diyi Yang
25
97
0
16 Apr 2021
"BNN - BN = ?": Training Binary Neural Networks without Batch
  Normalization
Tianlong Chen
Zhenyu Zhang
Xu Ouyang
Zechun Liu
Zhiqiang Shen
Zhangyang Wang
MQ
43
36
0
16 Apr 2021
Going deeper with Image Transformers
Hugo Touvron
Matthieu Cord
Alexandre Sablayrolles
Gabriel Synnaeve
Hervé Jégou
ViT
27
988
0
31 Mar 2021
Predicting the Behavior of Dealers in Over-The-Counter Corporate Bond Markets
Yusen Lin
Jinming Xue
L. Raschid
13
3
0
12 Mar 2021
3D Human Pose, Shape and Texture from Low-Resolution Images and Videos
Xiangyu Xu
Hao Chen
Francesc Moreno-Noguer
László A. Jeni
Fernando de la Torre
3DH
22
35
0
11 Mar 2021
Generating Images with Sparse Representations
C. Nash
Jacob Menick
Sander Dieleman
Peter W. Battaglia
33
200
0
05 Mar 2021
GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training
Chen Zhu
Renkun Ni
Zheng Xu
Kezhi Kong
Yifan Jiang
Tom Goldstein
ODL
41
53
0
16 Feb 2021
High-Performance Large-Scale Image Recognition Without Normalization
Andrew Brock
Soham De
Samuel L. Smith
Karen Simonyan
VLM
223
512
0
11 Feb 2021
A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks
Asaf Noy
Yi Tian Xu
Y. Aflalo
Lihi Zelnik-Manor
Rong Jin
41
3
0
12 Jan 2021
Reservoir Transformers
Sheng Shen
Alexei Baevski
Ari S. Morcos
Kurt Keutzer
Michael Auli
Douwe Kiela
35
17
0
30 Dec 2020
On the Transfer of Disentangled Representations in Realistic Settings
Andrea Dittadi
Frederik Träuble
Francesco Locatello
M. Wuthrich
Vaibhav Agrawal
Ole Winther
Stefan Bauer
Bernhard Schölkopf
OOD
35
80
0
27 Oct 2020
IDF++: Analyzing and Improving Integer Discrete Flows for Lossless Compression
Rianne van den Berg
A. Gritsenko
Mostafa Dehghani
C. Sønderby
Tim Salimans
27
60
0
22 Jun 2020
Normalized Attention Without Probability Cage
Oliver Richter
Roger Wattenhofer
14
21
0
19 May 2020
Speech Recognition and Multi-Speaker Diarization of Long Conversations
H. H. Mao
Shuyang Li
Julian McAuley
G. Cottrell
VLM
22
40
0
16 May 2020
Evolving Normalization-Activation Layers
Hanxiao Liu
Andrew Brock
Karen Simonyan
Quoc V. Le
19
79
0
06 Apr 2020
Set Functions for Time Series
Max Horn
Michael Moor
Christian Bock
Bastian Alexander Rieck
Karsten M. Borgwardt
AI4TS
38
145
0
26 Sep 2019
Bag of Tricks for Image Classification with Convolutional Neural Networks
Tong He
Zhi-Li Zhang
Hang Zhang
Zhongyue Zhang
Junyuan Xie
Mu Li
221
1,400
0
04 Dec 2018
Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
Lechao Xiao
Yasaman Bahri
Jascha Narain Sohl-Dickstein
S. Schoenholz
Jeffrey Pennington
244
349
0
14 Jun 2018