arXiv 2003.00307 (v2, latest)
Loss landscapes and optimization in over-parameterized non-linear systems and neural networks
29 February 2020
Chaoyue Liu
Libin Zhu
M. Belkin
ODL
Papers citing "Loss landscapes and optimization in over-parameterized non-linear systems and neural networks"
50 / 168 papers shown
Title
Flat Channels to Infinity in Neural Loss Landscapes
Flavio Martinelli
Alexander Van Meegen
Berfin Simsek
W. Gerstner
Johanni Brea
15
0
0
17 Jun 2025
Glocal Smoothness: Line Search can really help!
Curtis Fox
Aaron Mishkin
Sharan Vaswani
Mark Schmidt
44
2
0
14 Jun 2025
Sharper Convergence Rates for Nonconvex Optimisation via Reduction Mappings
Evan Markou
Thalaiyasingam Ajanthan
Stephen Gould
31
0
0
10 Jun 2025
Federated Instrumental Variable Analysis via Federated Generalized Method of Moments
Geetika
Somya Tyagi
Bapi Chatterjee
FedML
34
0
0
27 May 2025
A Theoretical Framework for Grokking: Interpolation followed by Riemannian Norm Minimisation
Etienne Boursier
Scott Pesme
Radu-Alexandru Dragomir
33
0
0
26 May 2025
POQD: Performance-Oriented Query Decomposer for Multi-vector retrieval
Yaoyang Liu
Junlin Li
Yinjun Wu
Zhen Chen
67
0
0
25 May 2025
Sample Complexity of Diffusion Model Training Without Empirical Risk Minimizer Access
Mudit Gaur
Prashant Trivedi
Sasidhar Kunapuli
Amrit Singh Bedi
Vaneet Aggarwal
37
0
0
23 May 2025
Understanding Pre-training and Fine-tuning from Loss Landscape Perspectives
Huanran Chen
Yinpeng Dong
Zeming Wei
Yao Huang
Yichi Zhang
Hang Su
Jun Zhu
MoMe
94
1
0
23 May 2025
Statistical Inference for Online Algorithms
Selina Carter
Arun K Kuchibhotla
62
1
0
22 May 2025
Gluon: Making Muon & Scion Great Again! (Bridging Theory and Practice of LMO-based Optimizers for LLMs)
Artem Riabinin
Egor Shulgin
Kaja Gruntkowska
Peter Richtárik
AI4CE
135
1
0
19 May 2025
A Local Polyak-Lojasiewicz and Descent Lemma of Gradient Descent For Overparametrized Linear Models
Ziqing Xu
Hancheng Min
Salma Tarmoun
Enrique Mallada
Rene Vidal
125
0
0
16 May 2025
Don't be lazy: CompleteP enables compute-efficient deep transformers
Nolan Dey
Bin Claire Zhang
Lorenzo Noci
Mufan Li
Blake Bordelon
Shane Bergsma
Cengiz Pehlevan
Boris Hanin
Joel Hestness
118
2
0
02 May 2025
LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection
Xinyue Zeng
Haohui Wang
Junhong Lin
Jun Wu
Tyler Cody
Dawei Zhou
433
0
0
01 May 2025
Client Selection in Federated Learning with Data Heterogeneity and Network Latencies
Harsh Vardhan
Xiaofan Yu
Tajana Rosing
A. Mazumdar
FedML
70
0
0
02 Apr 2025
Analysis of an Idealized Stochastic Polyak Method and its Application to Black-Box Model Distillation
Robert M. Gower
Guillaume Garrigos
Nicolas Loizou
Dimitris Oikonomou
Konstantin Mishchenko
Fabian Schaipp
83
1
0
02 Apr 2025
Analyzing the Role of Permutation Invariance in Linear Mode Connectivity
Keyao Zhan
Puheng Li
Lei Wu
MoMe
113
0
0
13 Mar 2025
MPO: An Efficient Post-Processing Framework for Mixing Diverse Preference Alignment
Tianze Wang
Dongnan Gui
Yifan Hu
Shuhang Lin
Linjun Zhang
91
1
0
25 Feb 2025
Convergence of Shallow ReLU Networks on Weakly Interacting Data
Léo Dana
Francis R. Bach
Loucas Pillaud-Vivien
MLT
95
2
0
24 Feb 2025
Faster WIND: Accelerating Iterative Best-of-N Distillation for LLM Alignment
Tong Yang
Jincheng Mei
H. Dai
Zixin Wen
Shicong Cen
Dale Schuurmans
Yuejie Chi
Bo Dai
120
4
0
20 Feb 2025
A Novel Unified Parametric Assumption for Nonconvex Optimization
Artem Riabinin
Ahmed Khaled
Peter Richtárik
78
0
0
17 Feb 2025
Curse of Dimensionality in Neural Network Optimization
Sanghoon Na
Haizhao Yang
89
0
0
07 Feb 2025
Coreset-Based Task Selection for Sample-Efficient Meta-Reinforcement Learning
Donglin Zhan
Leonardo F. Toso
James Anderson
221
3
0
04 Feb 2025
On Penalty-based Bilevel Gradient Descent Method
Han Shen
Quan-Wu Xiao
Tianyi Chen
134
59
0
08 Jan 2025
How to explain grokking
S. V. Kozyrev
AI4CE
100
0
0
17 Dec 2024
Just a Simple Transformation is Enough for Data Protection in Vertical Federated Learning
Andrei Semenov
Philip Zmushko
Alexander Pichugin
Aleksandr Beznosikov
133
0
0
16 Dec 2024
Effectively Leveraging Momentum Terms in Stochastic Line Search Frameworks for Fast Optimization of Finite-Sum Problems
Matteo Lapucci
Davide Pucci
ODL
59
0
0
11 Nov 2024
Pipeline Gradient-based Model Training on Analog In-memory Accelerators
Zhaoxian Wu
Quan-Wu Xiao
Tayfun Gokmen
H. Tsai
Kaoutar El Maghraoui
Tianyi Chen
69
1
0
19 Oct 2024
Loss Landscape Characterization of Neural Networks without Over-Parametrization
Rustem Islamov
Niccolò Ajroldi
Antonio Orvieto
Aurelien Lucchi
80
4
0
16 Oct 2024
Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods
Hossein Taheri
Christos Thrampoulidis
Arya Mazumdar
MLT
121
0
0
13 Oct 2024
Deep Transfer Learning: Model Framework and Error Analysis
Yuling Jiao
Huazhen Lin
Yuchen Luo
Jerry Zhijian Yang
131
1
0
12 Oct 2024
Rewind-to-Delete: Certified Machine Unlearning for Nonconvex Functions
Siqiao Mu
Diego Klabjan
MU
136
5
0
15 Sep 2024
Convergence Conditions for Stochastic Line Search Based Optimization of Over-parametrized Models
Matteo Lapucci
Davide Pucci
94
1
0
06 Aug 2024
MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning
Yupeng Chen
Senmiao Wang
Zhihang Lin
Yushun Zhang
Tian Ding
Ruoyu Sun
CLL
177
5
0
30 Jul 2024
Accelerated Stochastic Min-Max Optimization Based on Bias-corrected Momentum
H. Cai
Sulaiman A. Alghunaim
Ali H. Sayed
115
1
0
18 Jun 2024
Minimizing Energy Costs in Deep Learning Model Training: The Gaussian Sampling Approach
Challapalli Phanindra Revanth
Sumohana S. Channappayya
C Krishna Mohan
127
23
0
11 Jun 2024
Loss Gradient Gaussian Width based Generalization and Optimization Guarantees
A. Banerjee
Qiaobo Li
Yingxue Zhou
162
0
0
11 Jun 2024
Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes
Dan Qiao
Kaiqi Zhang
Esha Singh
Daniel Soudry
Yu-Xiang Wang
NoLa
89
4
0
10 Jun 2024
Polynomial-Augmented Neural Networks (PANNs) with Weak Orthogonality Constraints for Enhanced Function and PDE Approximation
Madison Cooley
Shandian Zhe
Robert M. Kirby
Varun Shankar
204
1
0
04 Jun 2024
Manifold Metric: A Loss Landscape Approach for Predicting Model Performance
Pranshu Malviya
Jerry Huang
A. Baratin
Quentin Fournier
Sarath Chandar
79
0
0
24 May 2024
Bounds for the smallest eigenvalue of the NTK for arbitrary spherical data of arbitrary dimension
Kedar Karhadkar
Michael Murray
Guido Montúfar
105
3
0
23 May 2024
Almost sure convergence rates of stochastic gradient methods under gradient domination
Simon Weissmann
Sara Klein
Waïss Azizian
Leif Döring
93
3
0
22 May 2024
Error Analysis of Three-Layer Neural Network Trained with PGD for Deep Ritz Method
Yuling Jiao
Yanming Lai
Yang Wang
AI4CE
45
1
0
19 May 2024
Minimisation of Polyak-Łojasewicz Functions Using Random Zeroth-Order Oracles
Amir Ali Farzin
Iman Shames
53
1
0
15 May 2024
Train Faster, Perform Better: Modular Adaptive Training in Over-Parameterized Models
Yubin Shi
Yixuan Chen
Mingzhi Dong
Xiaochen Yang
Dongsheng Li
...
Yingying Zhao
Fan Yang
Tun Lu
Ning Gu
L. Shang
MoMe
81
4
0
13 May 2024
Data-Efficient and Robust Task Selection for Meta-Learning
Donglin Zhan
James Anderson
OOD
98
2
0
11 May 2024
ε-Policy Gradient for Online Pricing
Lukasz Szpruch
Tanut Treetanthiploet
Yufei Zhang
OffRL
90
1
0
06 May 2024
Regularized Gradient Clipping Provably Trains Wide and Deep Neural Networks
Matteo Tucat
Anirbit Mukherjee
Procheta Sen
Mingfei Sun
Omar Rivasplata
MLT
89
1
0
12 Apr 2024
Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation
Aaron Mishkin
Mert Pilanci
Mark Schmidt
146
1
0
03 Apr 2024
Functional Bilevel Optimization for Machine Learning
Ieva Petrulionyte
Julien Mairal
Michael Arbel
101
5
0
29 Mar 2024
The Effectiveness of Local Updates for Decentralized Learning under Data Heterogeneity
Tongle Wu
Ying Sun
51
1
0
23 Mar 2024