Loss landscapes and optimization in over-parameterized non-linear systems and neural networks

29 February 2020
Chaoyue Liu
Libin Zhu
M. Belkin
    ODL

Papers citing "Loss landscapes and optimization in over-parameterized non-linear systems and neural networks"

50 / 168 papers shown
Flat Channels to Infinity in Neural Loss Landscapes
Flavio Martinelli
Alexander Van Meegen
Berfin Simsek
W. Gerstner
Johanni Brea
15
0
0
17 Jun 2025
Glocal Smoothness: Line Search can really help!
Curtis Fox
Aaron Mishkin
Sharan Vaswani
Mark Schmidt
44
2
0
14 Jun 2025
Sharper Convergence Rates for Nonconvex Optimisation via Reduction Mappings
Evan Markou
Thalaiyasingam Ajanthan
Stephen Gould
31
0
0
10 Jun 2025
Federated Instrumental Variable Analysis via Federated Generalized Method of Moments
Geetika
Somya Tyagi
Bapi Chatterjee
FedML
34
0
0
27 May 2025
A Theoretical Framework for Grokking: Interpolation followed by Riemannian Norm Minimisation
Etienne Boursier
Scott Pesme
Radu-Alexandru Dragomir
33
0
0
26 May 2025
POQD: Performance-Oriented Query Decomposer for Multi-vector retrieval
Yaoyang Liu
Junlin Li
Yinjun Wu
Zhen Chen
67
0
0
25 May 2025
Sample Complexity of Diffusion Model Training Without Empirical Risk Minimizer Access
Mudit Gaur
Prashant Trivedi
Sasidhar Kunapuli
Amrit Singh Bedi
Vaneet Aggarwal
37
0
0
23 May 2025
Understanding Pre-training and Fine-tuning from Loss Landscape Perspectives
Huanran Chen
Yinpeng Dong
Zeming Wei
Yao Huang
Yichi Zhang
Hang Su
Jun Zhu
MoMe
94
1
0
23 May 2025
Statistical Inference for Online Algorithms
Selina Carter
Arun K Kuchibhotla
62
1
0
22 May 2025
Gluon: Making Muon & Scion Great Again! (Bridging Theory and Practice of LMO-based Optimizers for LLMs)
Artem Riabinin
Egor Shulgin
Kaja Gruntkowska
Peter Richtárik
AI4CE
135
1
0
19 May 2025
A Local Polyak-Lojasiewicz and Descent Lemma of Gradient Descent For Overparametrized Linear Models
Ziqing Xu
Hancheng Min
Salma Tarmoun
Enrique Mallada
Rene Vidal
125
0
0
16 May 2025
Don't be lazy: CompleteP enables compute-efficient deep transformers
Nolan Dey
Bin Claire Zhang
Lorenzo Noci
Mufan Li
Blake Bordelon
Shane Bergsma
Cengiz Pehlevan
Boris Hanin
Joel Hestness
118
2
0
02 May 2025
LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection
Xinyue Zeng
Haohui Wang
Junhong Lin
Jun Wu
Tyler Cody
Dawei Zhou
433
0
0
01 May 2025
Client Selection in Federated Learning with Data Heterogeneity and Network Latencies
Harsh Vardhan
Xiaofan Yu
Tajana Rosing
A. Mazumdar
FedML
70
0
0
02 Apr 2025
Analysis of an Idealized Stochastic Polyak Method and its Application to Black-Box Model Distillation
Robert M. Gower
Guillaume Garrigos
Nicolas Loizou
Dimitris Oikonomou
Konstantin Mishchenko
Fabian Schaipp
83
1
0
02 Apr 2025
Analyzing the Role of Permutation Invariance in Linear Mode Connectivity
Keyao Zhan
Puheng Li
Lei Wu
MoMe
113
0
0
13 Mar 2025
MPO: An Efficient Post-Processing Framework for Mixing Diverse Preference Alignment
Tianze Wang
Dongnan Gui
Yifan Hu
Shuhang Lin
Linjun Zhang
91
1
0
25 Feb 2025
Convergence of Shallow ReLU Networks on Weakly Interacting Data
Léo Dana
Francis R. Bach
Loucas Pillaud-Vivien
MLT
95
2
0
24 Feb 2025
Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment
Tong Yang
Jincheng Mei
H. Dai
Zixin Wen
Shicong Cen
Dale Schuurmans
Yuejie Chi
Bo Dai
120
4
0
20 Feb 2025
A Novel Unified Parametric Assumption for Nonconvex Optimization
Artem Riabinin
Ahmed Khaled
Peter Richtárik
78
0
0
17 Feb 2025
Curse of Dimensionality in Neural Network Optimization
Sanghoon Na
Haizhao Yang
89
0
0
07 Feb 2025
Coreset-Based Task Selection for Sample-Efficient Meta-Reinforcement Learning
Donglin Zhan
Leonardo F. Toso
James Anderson
221
3
0
04 Feb 2025
On Penalty-based Bilevel Gradient Descent Method
Han Shen
Quan-Wu Xiao
Tianyi Chen
134
59
0
08 Jan 2025
How to explain grokking
S. V. Kozyrev
AI4CE
100
0
0
17 Dec 2024
Just a Simple Transformation is Enough for Data Protection in Vertical Federated Learning
Andrei Semenov
Philip Zmushko
Alexander Pichugin
Aleksandr Beznosikov
133
0
0
16 Dec 2024
Effectively Leveraging Momentum Terms in Stochastic Line Search Frameworks for Fast Optimization of Finite-Sum Problems
Matteo Lapucci
Davide Pucci
ODL
59
0
0
11 Nov 2024
Pipeline Gradient-based Model Training on Analog In-memory Accelerators
Zhaoxian Wu
Quan-Wu Xiao
Tayfun Gokmen
H. Tsai
Kaoutar El Maghraoui
Tianyi Chen
69
1
0
19 Oct 2024
Loss Landscape Characterization of Neural Networks without Over-Parametrization
Rustem Islamov
Niccolò Ajroldi
Antonio Orvieto
Aurelien Lucchi
80
4
0
16 Oct 2024
Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods
Hossein Taheri
Christos Thrampoulidis
Arya Mazumdar
MLT
121
0
0
13 Oct 2024
Deep Transfer Learning: Model Framework and Error Analysis
Yuling Jiao
Huazhen Lin
Yuchen Luo
Jerry Zhijian Yang
131
1
0
12 Oct 2024
Rewind-to-Delete: Certified Machine Unlearning for Nonconvex Functions
Siqiao Mu
Diego Klabjan
MU
136
5
0
15 Sep 2024
Convergence Conditions for Stochastic Line Search Based Optimization of Over-parametrized Models
Matteo Lapucci
Davide Pucci
94
1
0
06 Aug 2024
MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning
Yupeng Chen
Senmiao Wang
Zhihang Lin
Yushun Zhang
Tian Ding
Ruoyu Sun
CLL
177
5
0
30 Jul 2024
Accelerated Stochastic Min-Max Optimization Based on Bias-corrected Momentum
H. Cai
Sulaiman A. Alghunaim
Ali H. Sayed
115
1
0
18 Jun 2024
Minimizing Energy Costs in Deep Learning Model Training: The Gaussian Sampling Approach
Challapalli Phanindra Revanth
Sumohana S. Channappayya
C Krishna Mohan
127
23
0
11 Jun 2024
Loss Gradient Gaussian Width based Generalization and Optimization Guarantees
A. Banerjee
Qiaobo Li
Yingxue Zhou
162
0
0
11 Jun 2024
Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes
Dan Qiao
Kaiqi Zhang
Esha Singh
Daniel Soudry
Yu-Xiang Wang
NoLa
89
4
0
10 Jun 2024
Polynomial-Augmented Neural Networks (PANNs) with Weak Orthogonality Constraints for Enhanced Function and PDE Approximation
Madison Cooley
Shandian Zhe
Robert M. Kirby
Varun Shankar
204
1
0
04 Jun 2024
Manifold Metric: A Loss Landscape Approach for Predicting Model Performance
Pranshu Malviya
Jerry Huang
A. Baratin
Quentin Fournier
Sarath Chandar
79
0
0
24 May 2024
Bounds for the smallest eigenvalue of the NTK for arbitrary spherical data of arbitrary dimension
Kedar Karhadkar
Michael Murray
Guido Montúfar
105
3
0
23 May 2024
Almost sure convergence rates of stochastic gradient methods under gradient domination
Simon Weissmann
Sara Klein
Waïss Azizian
Leif Döring
93
3
0
22 May 2024
Error Analysis of Three-Layer Neural Network Trained with PGD for Deep Ritz Method
Yuling Jiao
Yanming Lai
Yang Wang
AI4CE
45
1
0
19 May 2024
Minimisation of Polyak-Łojasewicz Functions Using Random Zeroth-Order Oracles
Amir Ali Farzin
Iman Shames
53
1
0
15 May 2024
Train Faster, Perform Better: Modular Adaptive Training in Over-Parameterized Models
Yubin Shi
Yixuan Chen
Mingzhi Dong
Xiaochen Yang
Dongsheng Li
...
Yingying Zhao
Fan Yang
Tun Lu
Ning Gu
L. Shang
MoMe
81
4
0
13 May 2024
Data-Efficient and Robust Task Selection for Meta-Learning
Donglin Zhan
James Anderson
OOD
98
2
0
11 May 2024
$ε$-Policy Gradient for Online Pricing
Lukasz Szpruch
Tanut Treetanthiploet
Yufei Zhang
OffRL
90
1
0
06 May 2024
Regularized Gradient Clipping Provably Trains Wide and Deep Neural Networks
Matteo Tucat
Anirbit Mukherjee
Procheta Sen
Mingfei Sun
Omar Rivasplata
MLT
89
1
0
12 Apr 2024
Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation
Aaron Mishkin
Mert Pilanci
Mark Schmidt
146
1
0
03 Apr 2024
Functional Bilevel Optimization for Machine Learning
Ieva Petrulionyte
Julien Mairal
Michael Arbel
101
5
0
29 Mar 2024
The Effectiveness of Local Updates for Decentralized Learning under Data Heterogeneity
Tongle Wu
Ying Sun
51
1
0
23 Mar 2024