Loss landscapes and optimization in over-parameterized non-linear systems and neural networks
29 February 2020
Chaoyue Liu
Libin Zhu
M. Belkin
ODL
arXiv: 2003.00307
Papers citing "Loss landscapes and optimization in over-parameterized non-linear systems and neural networks"
50 of 64 citing papers shown
Don't be lazy: CompleteP enables compute-efficient deep transformers
Nolan Dey
Bin Claire Zhang
Lorenzo Noci
Mufan Li
Blake Bordelon
Shane Bergsma
Cengiz Pehlevan
Boris Hanin
Joel Hestness
44
0
0
02 May 2025
LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection
Xinyue Zeng
Haohui Wang
Junhong Lin
Jun Wu
Tyler Cody
Dawei Zhou
121
0
0
01 May 2025
Analyzing the Role of Permutation Invariance in Linear Mode Connectivity
Keyao Zhan
Puheng Li
Lei Wu
MoMe
82
0
0
13 Mar 2025
Faster WIND: Accelerating Iterative Best-of-N Distillation for LLM Alignment
Tong Yang
Jincheng Mei
H. Dai
Zixin Wen
Shicong Cen
Dale Schuurmans
Yuejie Chi
Bo Dai
45
4
0
20 Feb 2025
Feature Learning Beyond the Edge of Stability
Dávid Terjék
MLT
46
0
0
18 Feb 2025
Coreset-Based Task Selection for Sample-Efficient Meta-Reinforcement Learning
Donglin Zhan
Leonardo F. Toso
James Anderson
101
1
0
04 Feb 2025
How to explain grokking
S. V. Kozyrev
AI4CE
36
0
0
03 Jan 2025
Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods
Hossein Taheri
Christos Thrampoulidis
Arya Mazumdar
MLT
36
0
0
13 Oct 2024
Deep Transfer Learning: Model Framework and Error Analysis
Yuling Jiao
Huazhen Lin
Yuchen Luo
Jerry Zhijian Yang
44
1
0
12 Oct 2024
Rewind-to-Delete: Certified Machine Unlearning for Nonconvex Functions
Siqiao Mu
Diego Klabjan
MU
50
3
0
15 Sep 2024
Convergence Conditions for Stochastic Line Search Based Optimization of Over-parametrized Models
Matteo Lapucci
Davide Pucci
37
1
0
06 Aug 2024
MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning
Yupeng Chen
Senmiao Wang
Zhihang Lin
Yushun Zhang
Tian Ding
Ruoyu Sun
CLL
80
1
0
30 Jul 2024
Accelerated Stochastic Min-Max Optimization Based on Bias-corrected Momentum
H. Cai
Sulaiman A. Alghunaim
Ali H. Sayed
48
1
0
18 Jun 2024
Loss Gradient Gaussian Width based Generalization and Optimization Guarantees
A. Banerjee
Qiaobo Li
Yingxue Zhou
52
0
0
11 Jun 2024
Polynomial-Augmented Neural Networks (PANNs) with Weak Orthogonality Constraints for Enhanced Function and PDE Approximation
Madison Cooley
Shandian Zhe
Robert M. Kirby
Varun Shankar
62
1
0
04 Jun 2024
Almost sure convergence rates of stochastic gradient methods under gradient domination
Simon Weissmann
Sara Klein
Waïss Azizian
Leif Döring
39
3
0
22 May 2024
Regularized Gradient Clipping Provably Trains Wide and Deep Neural Networks
Matteo Tucat
Anirbit Mukherjee
Procheta Sen
Mingfei Sun
Omar Rivasplata
MLT
39
1
0
12 Apr 2024
Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation
Aaron Mishkin
Mert Pilanci
Mark Schmidt
64
1
0
03 Apr 2024
Merging Text Transformer Models from Different Initializations
Neha Verma
Maha Elbayad
MoMe
59
7
0
01 Mar 2024
Implicit Bias and Fast Convergence Rates for Self-attention
Bhavya Vasudeva
Puneesh Deora
Christos Thrampoulidis
37
13
0
08 Feb 2024
Careful with that Scalpel: Improving Gradient Surgery with an EMA
Yu-Guan Hsieh
James Thornton
Eugène Ndiaye
Michal Klein
Marco Cuturi
Pierre Ablin
MedIm
39
0
0
05 Feb 2024
Critical Influence of Overparameterization on Sharpness-aware Minimization
Sungbin Shin
Dongyeop Lee
Maksym Andriushchenko
Namhoon Lee
AAML
44
1
0
29 Nov 2023
Soft Random Sampling: A Theoretical and Empirical Analysis
Xiaodong Cui
Ashish R. Mittal
Songtao Lu
Wei Zhang
G. Saon
Brian Kingsbury
48
1
0
21 Nov 2023
Modify Training Directions in Function Space to Reduce Generalization Error
Yi Yu
Wenlian Lu
Boyu Chen
27
0
0
25 Jul 2023
ADLER -- An efficient Hessian-based strategy for adaptive learning rate
Dario Balboni
D. Bacciu
ODL
24
1
0
25 May 2023
Fast Convergence in Learning Two-Layer Neural Networks with Separable Data
Hossein Taheri
Christos Thrampoulidis
MLT
16
3
0
22 May 2023
Improving Convergence and Generalization Using Parameter Symmetries
Bo Zhao
Robert Mansel Gower
Robin Walters
Rose Yu
MoMe
33
13
0
22 May 2023
Depth Dependence of μP Learning Rates in ReLU MLPs
Samy Jelassi
Boris Hanin
Ziwei Ji
Sashank J. Reddi
Srinadh Bhojanapalli
Sanjiv Kumar
27
6
0
13 May 2023
Automatic Gradient Descent: Deep Learning without Hyperparameters
Jeremy Bernstein
Chris Mingard
Kevin Huang
Navid Azizan
Yisong Yue
ODL
16
17
0
11 Apr 2023
Rethinking Model Ensemble in Transfer-based Adversarial Attacks
Huanran Chen
Yichi Zhang
Yinpeng Dong
Xiao Yang
Hang Su
Junyi Zhu
AAML
28
56
0
16 Mar 2023
Critical Points and Convergence Analysis of Generative Deep Linear Networks Trained with Bures-Wasserstein Loss
Pierre Bréchet
Katerina Papagiannouli
Jing An
Guido Montúfar
33
3
0
06 Mar 2023
Full Stack Optimization of Transformer Inference: a Survey
Sehoon Kim
Coleman Hooper
Thanakul Wattanawong
Minwoo Kang
Ruohan Yan
...
Qijing Huang
Kurt Keutzer
Michael W. Mahoney
Y. Shao
A. Gholami
MQ
36
101
0
27 Feb 2023
On the Convergence of Federated Averaging with Cyclic Client Participation
Yae Jee Cho
Pranay Sharma
Gauri Joshi
Zheng Xu
Satyen Kale
Tong Zhang
FedML
44
27
0
06 Feb 2023
On the Convergence of the Gradient Descent Method with Stochastic Fixed-point Rounding Errors under the Polyak-Lojasiewicz Inequality
Lu Xia
M. Hochstenbach
Stefano Massei
27
2
0
23 Jan 2023
Convergence beyond the over-parameterized regime using Rayleigh quotients
David A. R. Robin
Kevin Scaman
Marc Lelarge
27
3
0
19 Jan 2023
Generalized Gradient Flows with Provable Fixed-Time Convergence and Fast Evasion of Non-Degenerate Saddle Points
Mayank Baranwal
Param Budhraja
V. Raj
A. Hota
33
2
0
07 Dec 2022
Zeroth-Order Alternating Gradient Descent Ascent Algorithms for a Class of Nonconvex-Nonconcave Minimax Problems
Zi Xu
Ziqi Wang
Junlin Wang
Y. Dai
21
11
0
24 Nov 2022
REPAIR: REnormalizing Permuted Activations for Interpolation Repair
Keller Jordan
Hanie Sedghi
O. Saukh
R. Entezari
Behnam Neyshabur
MoMe
46
94
0
15 Nov 2022
Optimization for Amortized Inverse Problems
Tianci Liu
Tong Yang
Quan Zhang
Qi Lei
36
5
0
25 Oct 2022
On skip connections and normalisation layers in deep optimisation
L. MacDonald
Jack Valmadre
Hemanth Saratchandran
Simon Lucey
ODL
19
1
0
10 Oct 2022
Exploring the Algorithm-Dependent Generalization of AUPRC Optimization with List Stability
Peisong Wen
Qianqian Xu
Zhiyong Yang
Yuan He
Qingming Huang
53
10
0
27 Sep 2022
Neural Collapse with Normalized Features: A Geometric Analysis over the Riemannian Manifold
Can Yaras
Peng Wang
Zhihui Zhu
Laura Balzano
Qing Qu
25
42
0
19 Sep 2022
Asymptotic Statistical Analysis of f-divergence GAN
Xinwei Shen
Kani Chen
Tong Zhang
21
2
0
14 Sep 2022
Optimizing the Performative Risk under Weak Convexity Assumptions
Yulai Zhao
30
5
0
02 Sep 2022
On the generalization of learning algorithms that do not converge
N. Chandramoorthy
Andreas Loukas
Khashayar Gatmiry
Stefanie Jegelka
MLT
19
11
0
16 Aug 2022
Momentum Diminishes the Effect of Spectral Bias in Physics-Informed Neural Networks
G. Farhani
Alexander Kazachek
Boyu Wang
24
6
0
29 Jun 2022
Provable Acceleration of Heavy Ball beyond Quadratics for a Class of Polyak-Łojasiewicz Functions when the Non-Convexity is Averaged-Out
Jun-Kun Wang
Chi-Heng Lin
Andre Wibisono
Bin Hu
32
20
0
22 Jun 2022
Gradient flow dynamics of shallow ReLU networks for square loss and orthogonal inputs
Etienne Boursier
Loucas Pillaud-Vivien
Nicolas Flammarion
ODL
24
58
0
02 Jun 2022
Transition to Linearity of General Neural Networks with Directed Acyclic Graph Architecture
Libin Zhu
Chaoyue Liu
M. Belkin
GNN
AI4CE
23
4
0
24 May 2022
Beyond Lipschitz: Sharp Generalization and Excess Risk Bounds for Full-Batch GD
Konstantinos E. Nikolakakis
Farzin Haddadpour
Amin Karbasi
Dionysios S. Kalogerias
43
17
0
26 Apr 2022