1606.04838
v3 (latest)
Optimization Methods for Large-Scale Machine Learning
15 June 2016
Léon Bottou
Frank E. Curtis
J. Nocedal
Papers citing "Optimization Methods for Large-Scale Machine Learning"
50 / 866 papers shown
Rethinking LLM Training through Information Geometry and Quantum Metrics
Riccardo Di Sipio
14
0
0
18 Jun 2025
ImprovDML: Improved Trade-off in Private Byzantine-Resilient Distributed Machine Learning
Bing Liu
Chengcheng Zhao
L. Chai
Peng Cheng
Yaonan Wang
FedML
22
0
0
18 Jun 2025
Adjusted Shuffling SARAH: Advancing Complexity Analysis via Dynamic Gradient Weighting
Duc Toan Nguyen
Trang H. Tran
Lam M. Nguyen
22
0
0
14 Jun 2025
Convergence of Momentum-Based Optimization Algorithms with Time-Varying Parameters
Mathukumalli Vidyasagar
73
0
0
13 Jun 2025
Complexity of normalized stochastic first-order methods with momentum under heavy-tailed noise
Chuan He
Zhaosong Lu
Defeng Sun
Zhanwang Deng
30
0
0
12 Jun 2025
NDCG-Consistent Softmax Approximation with Accelerated Convergence
Yuanhao Pu
Defu Lian
Xiaolong Chen
Xu Huang
Jin Chen
Enhong Chen
53
0
0
11 Jun 2025
Neural Tangent Kernel Analysis to Probe Convergence in Physics-informed Neural Solvers: PIKANs vs. PINNs
Salah A Faroughi
Farinaz Mostajeran
15
0
0
09 Jun 2025
A Stable Whitening Optimizer for Efficient Neural Network Training
Kevin Frans
Sergey Levine
Pieter Abbeel
35
0
0
08 Jun 2025
Purifying Shampoo: Investigating Shampoo's Heuristics by Decomposing its Preconditioner
Runa Eschenhagen
Aaron Defazio
Tsung-Hsien Lee
Richard Turner
Hao-Jun Michael Shi
93
0
0
04 Jun 2025
Deformable registration and generative modelling of aortic anatomies by auto-decoders and neural ODEs
Riccardo Tenderini
Luca Pegolotti
Fanwei Kong
S. Pagani
Francesco Regazzoni
Alison L. Marsden
S. Deparis
MedIm
AI4CE
49
0
0
01 Jun 2025
Stationary MMD Points for Cubature
Zonghao Chen
Toni Karvonen
Heishiro Kanagawa
F. Briol
Chris J. Oates
82
0
0
27 May 2025
Moment Expansions of the Energy Distance
Ian Langmore
17
0
0
27 May 2025
Dynamic Manifold Evolution Theory: Modeling and Stability Analysis of Latent Representations in Large Language Models
Yukun Zhang
Qi Dong
AI4CE
28
0
0
24 May 2025
Implicit Neural Shape Optimization for 3D High-Contrast Electrical Impedance Tomography
Junqing Chen
Haibo Liu
258
0
0
22 May 2025
Never Skip a Batch: Continuous Training of Temporal GNNs via Adaptive Pseudo-Supervision
Alexander Panyshev
Dmitry Vinichenko
Oleg Travkin
Roman Alferov
Alexey Zaytsev
69
0
0
18 May 2025
Dynamic Perturbed Adaptive Method for Infinite Task-Conflicting Time Series
Jiang You
Xiaozhen Wang
Arben Cela
AI4TS
82
0
0
17 May 2025
A stochastic gradient method for trilevel optimization
Tommaso Giovannelli
G. Kent
Luis Nunes Vicente
74
0
0
11 May 2025
Entropy-Guided Sampling of Flat Modes in Discrete Spaces
Pinaki Mohanty
Riddhiman Bhattacharya
Ruqi Zhang
440
0
0
05 May 2025
Online Functional Principal Component Analysis on a Multidimensional Domain
Muye Nanshan
Nan Zhang
Jiguo Cao
43
0
0
04 May 2025
Accelerating Deep Neural Network Training via Distributed Hybrid Order Optimization
Shunxian Gu
Chaoqun You
Bangbang Ren
Lailong Luo
Junxu Xia
Deke Guo
72
0
0
02 May 2025
OptimAI: Optimization from Natural Language Using LLM-Powered AI Agents
Raghav Thind
Youran Sun
Ling Liang
Haizhao Yang
LLMAG
215
0
0
23 Apr 2025
AlphaGrad: Non-Linear Gradient Normalization Optimizer
Soham Sane
ODL
147
0
0
22 Apr 2025
MetaMolGen: A Neural Graph Motif Generation Model for De Novo Molecular Design
Zimo Yan
Jie Zhang
Zheng Xie
Chang-rui Liu
Yang Liu
Yiping Song
115
0
0
22 Apr 2025
Mixed-Precision Conjugate Gradient Solvers with RL-Driven Precision Tuning
Xinye Chen
96
0
0
19 Apr 2025
A Piecewise Lyapunov Analysis of Sub-quadratic SGD: Applications to Robust and Quantile Regression
Yixuan Zhang
Dongyan
Yudong Chen
Qiaomin Xie
60
0
0
11 Apr 2025
Decentralized Federated Domain Generalization with Style Sharing: A Formal Modeling and Convergence Analysis
Shahryar Zehtabi
Dong-Jun Han
Seyyedali Hosseinalipour
Christopher G. Brinton
FedML
AI4CE
136
0
0
08 Apr 2025
Universal Collection of Euclidean Invariants between Pairs of Position-Orientations
Gijs Bellaard
B. Smets
R. Duits
131
0
0
04 Apr 2025
Approximate Agreement Algorithms for Byzantine Collaborative Learning
Tijana Milentijević
Mélanie Cambus
Darya Melnyk
Stefan Schmid
FedML
148
1
0
02 Apr 2025
SASSHA: Sharpness-aware Adaptive Second-order Optimization with Stable Hessian Approximation
Dahun Shin
Dongyeop Lee
Jinseok Chung
Namhoon Lee
ODL
AAML
517
0
0
25 Feb 2025
Convergence of Shallow ReLU Networks on Weakly Interacting Data
Léo Dana
Francis R. Bach
Loucas Pillaud-Vivien
MLT
95
2
0
24 Feb 2025
Verification and Validation for Trustworthy Scientific Machine Learning
John D. Jakeman
Lorena A. Barba
J. Martins
Thomas O'Leary-Roseberry
AI4CE
138
1
0
21 Feb 2025
Analog In-memory Training on General Non-ideal Resistive Elements: The Impact of Response Functions
Zhaoxian Wu
Quan Xian
Tayfun Gokmen
Omobayode Fagbohungbe
Tianyi Chen
172
0
0
17 Feb 2025
Preconditioned Inexact Stochastic ADMM for Deep Model
Shenglong Zhou
Ouya Wang
Ziyan Luo
Yongxu Zhu
Geoffrey Ye Li
88
0
0
15 Feb 2025
PBM-VFL: Vertical Federated Learning with Feature and Sample Privacy
Linh Tran
Timothy Castiglia
Stacy Patterson
Ana Milanova
FedML
109
0
0
23 Jan 2025
Celo: Training Versatile Learned Optimizers on a Compute Diet
A. Moudgil
Boris Knyazev
Guillaume Lajoie
Eugene Belilovsky
446
0
0
22 Jan 2025
Revisiting LocalSGD and SCAFFOLD: Improved Rates and Missing Analysis
Ruichen Luo
Sebastian U Stich
Samuel Horváth
Martin Takáč
142
0
0
08 Jan 2025
Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism
Tim Tsz-Kit Lau
Weijian Li
Chenwei Xu
Han Liu
Mladen Kolar
466
0
0
30 Dec 2024
Beyond adaptive gradient: Fast-Controlled Minibatch Algorithm for large-scale optimization
Corrado Coppola
Lorenzo Papa
Irene Amerini
L. Palagi
ODL
127
0
0
24 Nov 2024
Hierarchical mixtures of Unigram models for short text clustering: The role of Beta-Liouville priors
Massimo Bilancia
Samuele Magro
101
0
0
29 Oct 2024
From Gradient Clipping to Normalization for Heavy Tailed SGD
Florian Hübler
Ilyas Fatkhullin
Niao He
113
10
0
17 Oct 2024
Nonlinear Stochastic Gradient Descent and Heavy-tailed Noise: A Unified Framework and High-probability Guarantees
Aleksandar Armacki
Shuhua Yu
Pranay Sharma
Gauri Joshi
Dragana Bajović
D. Jakovetić
S. Kar
117
2
0
17 Oct 2024
Steering Large Language Models using Conceptors: Improving Addition-Based Activation Engineering
Joris Postmus
Steven Abreu
LLMSV
370
3
0
09 Oct 2024
Extended convexity and smoothness and their applications in deep learning
Binchuan Qi
Wei Gong
Li Li
117
0
0
08 Oct 2024
An Attention-Based Algorithm for Gravity Adaptation Zone Calibration
Chen Yu
59
0
0
06 Oct 2024
Asymmetry of the Relative Entropy in the Regularization of Empirical Risk Minimization
Francisco Daunas
I. Esnaola
S. Perlaza
H. Vincent Poor
106
3
0
02 Oct 2024
SetPINNs: Set-based Physics-informed Neural Networks
Mayank Nagda
Phil Ostheimer
Thomas Specht
Frank Rhein
Fabian Jirasek
Stephan Mandt
Marius Kloft
Sophie Fellenz
PINN
3DPC
201
1
0
30 Sep 2024
Robust Clustering on High-Dimensional Data with Stochastic Quantization
Anton Kozyriev
Vladimir Norkin
MQ
83
3
0
03 Sep 2024
Hierarchical Learning and Computing over Space-Ground Integrated Networks
Jingyang Zhu
Yuanming Shi
Yong Zhou
Chunxiao Jiang
Linling Kuang
97
2
0
26 Aug 2024
Predicting path-dependent processes by deep learning
Xudong Zheng
Yuecai Han
42
0
0
19 Aug 2024
Online-Score-Aided Federated Learning: Taming the Resource Constraints in Wireless Networks
Md Ferdous Pervej
Minseok Choi
A. Molisch
99
1
0
12 Aug 2024