GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training

arXiv 2102.08098 · 16 February 2021
Chen Zhu, Renkun Ni, Zheng Xu, Kezhi Kong, W. R. Huang, Tom Goldstein
Tags: ODL
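
For readers skimming this citation list, it may help to recall what GradInit does: it learns one positive scale factor per parameter block so that the loss after a single optimizer step is as small as possible, subject to a bound on the gradient norm. Below is a minimal PyTorch sketch of that idea for plain SGD. It is an illustration, not the authors' implementation: the function name gradinit_sgd and the hyperparameters init_iters, eta, gamma, and alpha_lr are assumptions, the norm constraint is handled by simply minimizing the gradient norm whenever it is violated, and one batch is reused for the pre- and post-step losses to keep the sketch short.

```python
# Minimal GradInit-style sketch (illustrative; not the authors' code).
# Idea: learn one positive scale per parameter tensor so the loss after
# a single virtual SGD step is minimized, under a gradient-norm bound.
import torch
from torch.func import functional_call

def gradinit_sgd(model, loss_fn, data_iter, init_iters=100,
                 eta=0.1, gamma=1.0, alpha_lr=1e-3):
    names = [n for n, p in model.named_parameters() if p.requires_grad]
    base = dict(model.named_parameters())
    # Learn log-scales so the effective scale exp(log_alpha) stays
    # positive; log_alpha = 0 means "keep the original initialization".
    log_alpha = {n: torch.zeros((), requires_grad=True) for n in names}
    opt = torch.optim.Adam(log_alpha.values(), lr=alpha_lr)

    for _ in range(init_iters):
        x, y = next(data_iter)
        # Rescaled parameters, differentiable w.r.t. the log-scales.
        scaled = {n: log_alpha[n].exp() * base[n] for n in names}
        loss = loss_fn(functional_call(model, scaled, (x,)), y)
        grads = torch.autograd.grad(loss, list(scaled.values()),
                                    create_graph=True)
        gnorm = torch.sqrt(sum((g ** 2).sum() for g in grads))
        if gnorm > gamma:
            # Constraint violated: shrink the gradient norm instead.
            objective = gnorm
        else:
            # Virtual SGD step, then evaluate the post-step loss
            # (the paper draws fresh batches; we reuse x, y for brevity).
            stepped = {n: scaled[n] - eta * g
                       for n, g in zip(names, grads)}
            objective = loss_fn(functional_call(model, stepped, (x,)), y)
        opt.zero_grad()
        objective.backward()
        opt.step()

    # Bake the learned scales back into the model's parameters.
    with torch.no_grad():
        for n in names:
            base[n].mul_(log_alpha[n].exp())
```

After the loop, the learned scales are folded into the parameters, and standard training proceeds from the rescaled initialization.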

Papers citing "GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training"

38 / 38 papers shown

Initialization of Large Language Models via Reparameterization to Mitigate Loss Spikes
Kosuke Nishida, Kyosuke Nishida, Kuniko Saito
07 Oct 2024

Advancing Neural Network Performance through Emergence-Promoting Initialization Scheme
Johnny Jingze Li, V. George, Gabriel A. Silva
Tags: ODL
26 Jul 2024

Initialization is Critical to Whether Transformers Fit Composite Functions by Reasoning or Memorizing
Zhongwang Zhang, Pengxiao Lin, Zhiwei Wang, Yaoyu Zhang, Z. Xu
08 May 2024

Principled Architecture-aware Scaling of Hyperparameters
Wuyang Chen, Junru Wu, Zhangyang Wang, Boris Hanin
Tags: AI4CE
27 Feb 2024

Transferring Core Knowledge via Learngenes
Fu Feng, Jing Wang, Xin Geng
16 Jan 2024

Each Test Image Deserves A Specific Prompt: Continual Test-Time Adaptation for 2D Medical Image Segmentation
Ziyang Chen, Yongsheng Pan, Yiwen Ye, Mengkang Lu, Yong-quan Xia
Tags: OOD, VLM, MedIm
30 Nov 2023

PVG: Progressive Vision Graph for Vision Recognition
Jiafu Wu, Jian Li, Jiangning Zhang, Boshen Zhang, M. Chi, Yabiao Wang, Chengjie Wang
Tags: ViT
01 Aug 2023

No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner
12 Jul 2023

H₂O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
Zhenyu (Allen) Zhang, Ying Sheng, Tianyi Zhou, Tianlong Chen, Lianmin Zheng, ..., Yuandong Tian, Christopher Ré, Clark W. Barrett, Zhangyang Wang, Beidi Chen
Tags: VLM
24 Jun 2023

BranchNorm: Robustly Scaling Extremely Deep Transformers
Yanjun Liu, Xianfeng Zeng, Fandong Meng, Jie Zhou
04 May 2023

Learngene: Inheriting Condensed Knowledge from the Ancestry Model to Descendant Models
Qiufeng Wang, Xu Yang, Shuxia Lin, Jing Wang, Xin Geng
03 May 2023

The Disharmony between BN and ReLU Causes Gradient Explosion, but is Offset by the Correlation between Activations
Inyoung Paik, Jaesik Choi
23 Apr 2023

Convex Dual Theory Analysis of Two-Layer Convolutional Neural Networks with Soft-Thresholding
Chunyan Xiong, Meng Lu, Xiaotong Yu, Jian-Peng Cao, Zhong Chen, D. Guo, X. Qu
Tags: MLT
14 Apr 2023

On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen, Yan Sun, Zhiyuan Yu, Liang Ding, Xinmei Tian, Dacheng Tao
Tags: VLM
07 Apr 2023

Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?
Boris Knyazev, Doha Hwang, Simon Lacoste-Julien
Tags: AI4CE
07 Mar 2023

Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers
Tianlong Chen, Zhenyu (Allen) Zhang, Ajay Jaiswal, Shiwei Liu, Zhangyang Wang
Tags: MoE
02 Mar 2023

Learning to Grow Pretrained Models for Efficient Transformer Training
Peihao Wang, Rameswar Panda, Lucas Torroba Hennigen, P. Greengard, Leonid Karlinsky, Rogerio Feris, David D. Cox, Zhangyang Wang, Yoon Kim
02 Mar 2023

Singular value decomposition based matrix surgery
Jehan Ghafuri, S. Jassim
22 Feb 2023

CyclicFL: A Cyclic Model Pre-Training Approach to Efficient Federated Learning
Peng Zhang, Yingbo Zhou, Ming Hu, Xin Fu, Xian Wei, Mingsong Chen
Tags: FedML
28 Jan 2023

Cramming: Training a Language Model on a Single GPU in One Day
Jonas Geiping, Tom Goldstein
Tags: MoE
28 Dec 2022

NAR-Former: Neural Architecture Representation Learning towards Holistic Attributes Prediction
Yun Yi, Haokui Zhang, Wenze Hu, Nannan Wang, Xiaoyu Wang
Tags: AI4TS, AI4CE
15 Nov 2022

MetaFormer Baselines for Vision
Weihao Yu, Chenyang Si, Pan Zhou, Mi Luo, Yichen Zhou, Jiashi Feng, Shuicheng Yan, Xinchao Wang
Tags: MoE
24 Oct 2022

How to Train Vision Transformer on Small-scale Datasets?
Hanan Gani, Muzammal Naseer, Mohammad Yaqub
Tags: ViT
13 Oct 2022

Towards Theoretically Inspired Neural Initialization Optimization
Yibo Yang, Hong Wang, Haobo Yuan, Zhouchen Lin
12 Oct 2022

Dynamical Isometry for Residual Networks
Advait Gadhikar, R. Burkholz
Tags: ODL, AI4CE
05 Oct 2022

Hyper-Representations as Generative Models: Sampling Unseen Neural Network Weights
Konstantin Schürholt, Boris Knyazev, Xavier Giró-i-Nieto, Damian Borth
29 Sep 2022

Pretraining a Neural Network before Knowing Its Architecture
Boris Knyazev
Tags: AI4CE
20 Jul 2022

Training Your Sparse Neural Network Better with Any Mask
Ajay Jaiswal, Haoyu Ma, Tianlong Chen, Ying Ding, Zhangyang Wang
Tags: CVBM
26 Jun 2022

From Deterioration to Acceleration: A Calibration Approach to Rehabilitating Step Asynchronism in Federated Optimization
Feijie Wu, Song Guo, Haozhao Wang, Zhihao Qu, Haobo Zhang, Jiewei Zhang, Ziming Liu
17 Dec 2021

Parameter Prediction for Unseen Deep Architectures
Boris Knyazev, M. Drozdzal, Graham W. Taylor, Adriana Romero Soriano
Tags: OOD
25 Oct 2021

NormFormer: Improved Transformer Pretraining with Extra Normalization
Sam Shleifer, Jason Weston, Myle Ott
Tags: AI4CE
18 Oct 2021

A Loss Curvature Perspective on Training Instability in Deep Learning
Justin Gilmer, Behrooz Ghorbani, Ankush Garg, Sneha Kudugunta, Behnam Neyshabur, David E. Cardoze, George E. Dahl, Zachary Nado, Orhan Firat
Tags: ODL
08 Oct 2021

AutoInit: Analytic Signal-Preserving Weight Initialization for Neural Networks
G. Bingham, Risto Miikkulainen
Tags: ODL
18 Sep 2021

The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers
Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber
Tags: ViT
26 Aug 2021

Data-driven Weight Initialization with Sylvester Solvers
Debasmit Das, Yash Bhalgat, Fatih Porikli
Tags: ODL
02 May 2021

Fast Certified Robust Training with Short Warmup
Zhouxing Shi, Yihan Wang, Huan Zhang, Jinfeng Yi, Cho-Jui Hsieh
Tags: AAML
31 Mar 2021

High-Performance Large-Scale Image Recognition Without Normalization
Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan
Tags: VLM
11 Feb 2021

Spending Your Winning Lottery Better After Drawing It
Ajay Jaiswal, Haoyu Ma, Tianlong Chen, Ying Ding, Zhangyang Wang
08 Jan 2021