ResearchTrend.AI
© 2025 ResearchTrend.AI. All rights reserved.

Generalization in Deep Networks: The Role of Distance from Initialization
arXiv:1901.01672, 7 January 2019
Vaishnavh Nagarajan, J. Zico Kolter
[ODL]

Papers citing "Generalization in Deep Networks: The Role of Distance from Initialization"

50 / 75 papers shown
Title / Authors (Date)

1. Just One Layer Norm Guarantees Stable Extrapolation
   Juliusz Ziomek, George Whittle, Michael A. Osborne (20 May 2025)
2. A Near Complete Nonasymptotic Generalization Theory For Multilayer Neural Networks: Beyond the Bias-Variance Tradeoff
   Hao Yu, Xiangyang Ji [AI4CE] (03 Mar 2025)
3. Is network fragmentation a useful complexity measure?
   Coenraad Mouton, Randle Rabe, Daniël G. Haasbroek, Marthinus W. Theunissen, Hermanus L. Potgieter, Marelie Hattingh Davel (07 Nov 2024)
4. What Does Softmax Probability Tell Us about Classifiers Ranking Across Diverse Test Conditions?
   Weijie Tu, Weijian Deng, Liang Zheng, Tom Gedeon (14 Jun 2024)
5. How many samples are needed to train a deep neural network?
   Pegah Golestaneh, Mahsa Taheri, Johannes Lederer (26 May 2024)
6. Generalization Measures for Zero-Shot Cross-Lingual Transfer
   Saksham Bassi, Duygu Ataman, Kyunghyun Cho (24 Apr 2024)
7. LNPT: Label-free Network Pruning and Training
   Jinying Xiao, Ping Li, Zhe Tang, Jie Nie (19 Mar 2024)
8. On the Diminishing Returns of Width for Continual Learning
   E. Guha, V. Lakshman [CLL] (11 Mar 2024)
9. Stable Distillation: Regularizing Continued Pre-training for Low-Resource Automatic Speech Recognition
   Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha [CLL] (20 Dec 2023)
10. The Pursuit of Human Labeling: A New Perspective on Unsupervised Learning
    Artyom Gadetsky, Maria Brbić (06 Nov 2023)
11. Understanding prompt engineering may not require rethinking generalization
    Victor Akinwande, Yiding Jiang, Dylan Sam, J. Zico Kolter [VLM, VPVLM] (06 Oct 2023)
12. Fantastic Generalization Measures are Nowhere to be Found
    Michael C. Gastpar, Ido Nachum, Jonathan Shafer, T. Weinberger (24 Sep 2023)
13. Zero-Shot Neural Architecture Search: Challenges, Solutions, and Opportunities
    Guihong Li, Duc-Tuong Hoang, Kartikeya Bhardwaj, Ming Lin, Zhangyang Wang, R. Marculescu (05 Jul 2023)
14. Cramming: Training a Language Model on a Single GPU in One Day
    Jonas Geiping, Tom Goldstein [MoE] (28 Dec 2022)
15. Task Discovery: Finding the Tasks that Neural Networks Generalize on
    Andrei Atanov, Andrei Filatov, Teresa Yeo, Ajay Sohmshetty, Amir Zamir [OOD] (01 Dec 2022)
16. Accelerated Riemannian Optimization: Handling Constraints with a Prox to Bound Geometric Penalties
    David Martínez-Rubio, Sebastian Pokutta (26 Nov 2022)
17. Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States
    Ziqiao Wang, Yongyi Mao (19 Nov 2022)
18. On the Sample Complexity of Two-Layer Networks: Lipschitz vs. Element-Wise Lipschitz Activation
    Amit Daniely, Elad Granot [MLT] (17 Nov 2022)
19. Self-Distillation for Further Pre-training of Transformers
    Seanie Lee, Minki Kang, Juho Lee, Sung Ju Hwang, Kenji Kawaguchi (30 Sep 2022)
20. Why neural networks find simple solutions: the many regularizers of geometric complexity
    Benoit Dherin, Michael Munn, M. Rosca, David Barrett (27 Sep 2022)
21. Intersection of Parallels as an Early Stopping Criterion
    Ali Vardasbi, Maarten de Rijke, Mostafa Dehghani [MoMe] (19 Aug 2022)
22. Predicting Out-of-Domain Generalization with Neighborhood Invariance
    Nathan Ng, Neha Hulkund, Kyunghyun Cho, Marzyeh Ghassemi [OOD] (05 Jul 2022)
23. Sparse Double Descent: Where Network Pruning Aggravates Overfitting
    Zhengqi He, Zeke Xie, Quanzhi Zhu, Zengchang Qin (17 Jun 2022)
24. Robust Fine-Tuning of Deep Neural Networks with Hessian-based Generalization Guarantees
    Haotian Ju, Dongyue Li, Hongyang R. Zhang (06 Jun 2022)
25. Generalization Bounds for Gradient Methods via Discrete and Continuous Prior
    Jun Yu Li, Xu Luo, Jian Li (27 May 2022)
26. Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective
    Keitaro Sakamoto, Issei Sato (15 May 2022)
27. Investigating Generalization by Controlling Normalized Margin
    Alexander R. Farhang, Jeremy Bernstein, Kushal Tirumala, Yang Liu, Yisong Yue (08 May 2022)
28. Predicting the generalization gap in neural networks using topological data analysis
    Rubén Ballester, Xavier Arnal Clemente, Carles Casacuberta, Meysam Madadi, C. Corneanu, Sergio Escalera (23 Mar 2022)
29. The activity-weight duality in feed forward neural networks: The geometric determinants of generalization
    Yu Feng, Yuhai Tu [MLT] (21 Mar 2022)
30. On Measuring Excess Capacity in Neural Networks
    Florian Graf, Sebastian Zeng, Bastian Alexander Rieck, Marc Niethammer, Roland Kwitt (16 Feb 2022)
31. Weight Expansion: A New Perspective on Dropout and Generalization
    Gao Jin, Xinping Yi, Pengfei Yang, Lijun Zhang, S. Schewe, Xiaowei Huang (23 Jan 2022)
32. In Search of Probeable Generalization Measures
    Jonathan Jaegerman, Khalil Damouni, M. M. Ankaralı, Konstantinos N. Plataniotis (23 Oct 2021)
33. Cascaded Compressed Sensing Networks: A Reversible Architecture for Layerwise Learning
    Weizhi Lu, Mingrui Chen, Kai Guo, Weiyu Li (20 Oct 2021)
34. Exploring the Common Principal Subspace of Deep Features in Neural Networks
    Haoran Liu, Haoyi Xiong, Yaqing Wang, Haozhe An, Dongrui Wu, Dejing Dou (06 Oct 2021)
35. AdjointNet: Constraining machine learning models with physics-based codes
    S. Karra, B. Ahmmed, M. Mudunuru [AI4CE, PINN, OOD] (08 Sep 2021)
36. Training Algorithm Matters for the Performance of Neural Network Potential: A Case Study of Adam and the Kalman Filter Optimizers
    Yunqi Shao, Florian M. Dietrich, Carl Nettelblad, Chao Zhang (08 Sep 2021)
37. Learning an Explicit Hyperparameter Prediction Function Conditioned on Tasks
    Jun Shu, Deyu Meng, Zongben Xu (06 Jul 2021)
38. A Theoretical Analysis of Fine-tuning with Linear Teachers
    Gal Shachaf, Alon Brutzkus, Amir Globerson (04 Jul 2021)
39. Assessing Generalization of SGD via Disagreement
    Yiding Jiang, Vaishnavh Nagarajan, Christina Baek, J. Zico Kolter (25 Jun 2021)
40. Practical Assessment of Generalization Performance Robustness for Deep Networks via Contrastive Examples
    Xuanyu Wu, Xuhong Li, Haoyi Xiong, Xiao Zhang, Siyu Huang, Dejing Dou (20 Jun 2021)
41. Measuring Generalization with Optimal Transport
    Ching-Yao Chuang, Youssef Mroueh, Kristjan Greenewald, Antonio Torralba, Stefanie Jegelka [OT] (07 Jun 2021)
42. Robustness to Pruning Predicts Generalization in Deep Neural Networks
    Lorenz Kuhn, Clare Lyle, Aidan Gomez, Jonas Rothfuss, Y. Gal (10 Mar 2021)
43. Cockpit: A Practical Debugging Tool for the Training of Deep Neural Networks
    Frank Schneider, Felix Dangel, Philipp Hennig (12 Feb 2021)
44. Physics-informed neural networks with hard constraints for inverse design
    Lu Lu, R. Pestourie, Wenjie Yao, Zhicheng Wang, F. Verdugo, Steven G. Johnson [PINN] (09 Feb 2021)
45. Noise and Fluctuation of Finite Learning Rate Stochastic Gradient Descent
    Kangqiao Liu, Liu Ziyin, Masakuni Ueda [MLT] (07 Dec 2020)
46. Global Riemannian Acceleration in Hyperbolic and Spherical Spaces
    David Martínez-Rubio (07 Dec 2020)
47. Understanding the Failure Modes of Out-of-Distribution Generalization
    Vaishnavh Nagarajan, Anders Andreassen, Behnam Neyshabur [OOD, OODD] (29 Oct 2020)
48. How does Weight Correlation Affect the Generalisation Ability of Deep Neural Networks
    Gao Jin, Xinping Yi, Liang Zhang, Lijun Zhang, S. Schewe, Xiaowei Huang (12 Oct 2020)
49. Understanding the Role of Adversarial Regularization in Supervised Learning
    Litu Rout (01 Oct 2020)
50. Why Adversarial Interaction Creates Non-Homogeneous Patterns: A Pseudo-Reaction-Diffusion Model for Turing Instability
    Litu Rout [AAML] (01 Oct 2020)