1606.04838
Cited By
v1
v2
v3 (latest)
Optimization Methods for Large-Scale Machine Learning
15 June 2016
Léon Bottou
Frank E. Curtis
J. Nocedal
Papers citing
"Optimization Methods for Large-Scale Machine Learning"
50 / 866 papers shown
Title
AccUDNN: A GPU Memory Efficient Accelerator for Training Ultra-deep Neural Networks
Jinrong Guo
Wantao Liu
Wang Wang
Q. Lu
Songlin Hu
Jizhong Han
Ruixuan Li
62
9
0
21 Jan 2019
Stochastic Gradient Descent on a Tree: an Adaptive and Robust Approach to Stochastic Convex Optimization
Sattar Vakili
Sudeep Salgia
Qing Zhao
49
7
0
17 Jan 2019
Block-Randomized Stochastic Proximal Gradient for Low-Rank Tensor Factorization
Xiao Fu
Shahana Ibrahim
Hoi-To Wai
Cheng Gao
Kejun Huang
134
37
0
16 Jan 2019
Optimization Problems for Machine Learning: A Survey
Claudio Gambella
Bissan Ghaddar
Joe Naoum-Sawaya
AI4CE
142
181
0
16 Jan 2019
CROSSBOW: Scaling Deep Learning with Small Batch Sizes on Multi-GPU Servers
A. Koliousis
Pijika Watcharapichat
Matthias Weidlich
Kai Zou
Paolo Costa
Peter R. Pietzuch
65
70
0
08 Jan 2019
SGD Converges to Global Minimum in Deep Learning via Star-convex Path
Yi Zhou
Junjie Yang
Huishuai Zhang
Yingbin Liang
Vahid Tarokh
79
74
0
02 Jan 2019
Exact Guarantees on the Absence of Spurious Local Minima for Non-negative Rank-1 Robust Principal Component Analysis
Salar Fattahi
Somayeh Sojoudi
74
38
0
30 Dec 2018
On Lazy Training in Differentiable Programming
Lénaïc Chizat
Edouard Oyallon
Francis R. Bach
111
840
0
19 Dec 2018
An Empirical Model of Large-Batch Training
Sam McCandlish
Jared Kaplan
Dario Amodei
OpenAI Dota Team
76
280
0
14 Dec 2018
Gradient Descent Happens in a Tiny Subspace
Guy Gur-Ari
Daniel A. Roberts
Ethan Dyer
105
234
0
12 Dec 2018
Layer-Parallel Training of Deep Residual Neural Networks
Stefanie Günther
Lars Ruthotto
J. Schroder
E. Cyr
N. Gauger
90
90
0
11 Dec 2018
Universal Adversarial Training
A. Mendrik
Mahyar Najibi
Zheng Xu
John P. Dickerson
L. Davis
Tom Goldstein
AAML
OOD
102
190
0
27 Nov 2018
Forward Stability of ResNet and Its Variants
Linan Zhang
Hayden Schaeffer
121
48
0
24 Nov 2018
Parallel sequential Monte Carlo for stochastic gradient-free nonconvex optimization
Ömer Deniz Akyildiz
Dan Crisan
Joaquín Míguez
67
6
0
23 Nov 2018
A Sufficient Condition for Convergences of Adam and RMSProp
Fangyu Zou
Li Shen
Zequn Jie
Weizhong Zhang
Wei Liu
81
373
0
23 Nov 2018
New Convergence Aspects of Stochastic Gradient Algorithms
Lam M. Nguyen
Phuong Ha Nguyen
Peter Richtárik
K. Scheinberg
Martin Takáč
Marten van Dijk
141
66
0
10 Nov 2018
A Bayesian Perspective of Statistical Machine Learning for Big Data
R. Sambasivan
Sourish Das
S. Sahu
BDL
GP
61
20
0
09 Nov 2018
Double Adaptive Stochastic Gradient Optimization
Rajaditya Mukherjee
Jin Li
Shicheng Chu
Huamin Wang
ODL
53
0
0
06 Nov 2018
Non-Asymptotic Guarantees For Sampling by Stochastic Gradient Descent
Avetik G. Karagulyan
21
1
0
02 Nov 2018
A general system of differential equations to model first order adaptive algorithms
André Belotto da Silva
Maxime Gazeau
89
34
0
31 Oct 2018
SpiderBoost and Momentum: Faster Stochastic Variance Reduction Algorithms
Zhe Wang
Kaiyi Ji
Yi Zhou
Yingbin Liang
Vahid Tarokh
ODL
98
82
0
25 Oct 2018
Adaptive Communication Strategies to Achieve the Best Error-Runtime Trade-off in Local-Update SGD
Jianyu Wang
Gauri Joshi
FedML
110
232
0
19 Oct 2018
First-order and second-order variants of the gradient descent in a unified framework
Thomas Pierrot
Nicolas Perrin
Olivier Sigaud
ODL
69
7
0
18 Oct 2018
Fault Tolerance in Iterative-Convergent Machine Learning
Aurick Qiao
Bryon Aragam
Bingjing Zhang
Eric Xing
76
42
0
17 Oct 2018
Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks
Xiaodong Cui
Wei Zhang
Zoltán Tüske
M. Picheny
ODL
90
91
0
16 Oct 2018
Approximate Fisher Information Matrix to Characterise the Training of Deep Neural Networks
Zhibin Liao
Tom Drummond
Ian Reid
G. Carneiro
80
23
0
16 Oct 2018
Deep Reinforcement Learning
Yuxi Li
VLM
OffRL
194
144
0
15 Oct 2018
Tight Dimension Independent Lower Bound on the Expected Convergence Rate for Diminishing Step Sizes in SGD
Phuong Ha Nguyen
Lam M. Nguyen
Marten van Dijk
LRM
75
32
0
10 Oct 2018
Characterization of Convex Objective Functions and Optimal Expected Convergence Rates for SGD
Marten van Dijk
Lam M. Nguyen
Phuong Ha Nguyen
Dzung Phan
95
6
0
09 Oct 2018
Information Geometry of Orthogonal Initializations and Training
Piotr A. Sokól
Il-Su Park
AI4CE
136
17
0
09 Oct 2018
Accelerating Stochastic Gradient Descent Using Antithetic Sampling
Jingchang Liu
Linli Xu
49
2
0
07 Oct 2018
Continuous-time Models for Stochastic Optimization Algorithms
Antonio Orvieto
Aurelien Lucchi
119
32
0
05 Oct 2018
Combining Natural Gradient with Hessian Free Methods for Sequence Training
Adnan Haider
P. Woodland
ODL
48
4
0
03 Oct 2018
Large batch size training of neural networks with adversarial training and second-order information
Z. Yao
A. Gholami
Daiyaan Arfeen
Richard Liaw
Joseph E. Gonzalez
Kurt Keutzer
Michael W. Mahoney
ODL
96
42
0
02 Oct 2018
Mini-batch Serialization: CNN Training with Inter-layer Data Reuse
Sangkug Lym
Armand Behroozi
W. Wen
Ge Li
Yongkee Kwon
M. Erez
41
26
0
30 Sep 2018
A fast quasi-Newton-type method for large-scale stochastic optimisation
A. Wills
Carl Jidling
Thomas B. Schon
ODL
64
7
0
29 Sep 2018
Fluctuation-dissipation relations for stochastic gradient descent
Sho Yaida
121
75
0
28 Sep 2018
Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview
Yuejie Chi
Yue M. Lu
Yuxin Chen
75
427
0
25 Sep 2018
Predictive Collective Variable Discovery with Deep Bayesian Models
M. Schöberl
N. Zabaras
P. Koutsourelakis
66
34
0
18 Sep 2018
MotherNets: Rapid Deep Ensemble Learning
Abdul Wasay
Brian Hentschel
Yuze Liao
Sanyuan Chen
Stratos Idreos
58
35
0
12 Sep 2018
MDCN: Multi-Scale, Deep Inception Convolutional Neural Networks for Efficient Object Detection
Wenchi Ma
Yuanwei Wu
Zongbo Wang
Guanghui Wang
ObjD
73
25
0
06 Sep 2018
Compositional Stochastic Average Gradient for Machine Learning and Related Applications
Tsung-Yu Hsieh
Y. El-Manzalawy
Yiwei Sun
Vasant Honavar
44
1
0
04 Sep 2018
Distributed Nonconvex Constrained Optimization over Time-Varying Digraphs
G. Scutari
Ying Sun
100
176
0
04 Sep 2018
Sparsity in Deep Neural Networks - An Empirical Investigation with TensorQuant
D. Loroch
Franz-Josef Pfreundt
Norbert Wehn
J. Keuper
46
5
0
27 Aug 2018
Cooperative SGD: A unified Framework for the Design and Analysis of Communication-Efficient SGD Algorithms
Jianyu Wang
Gauri Joshi
196
350
0
22 Aug 2018
Backtracking gradient descent method for general C^1 functions, with applications to Deep Learning
T. Truong
T. H. Nguyen
73
10
0
15 Aug 2018
On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization
Xiangyi Chen
Sijia Liu
Ruoyu Sun
Mingyi Hong
101
324
0
08 Aug 2018
Particle Filtering Methods for Stochastic Optimization with Application to Large-Scale Empirical Risk Minimization
Bin Liu
66
11
0
23 Jul 2018
Training Neural Networks Using Features Replay
Zhouyuan Huo
Bin Gu
Heng-Chiao Huang
94
70
0
12 Jul 2018
SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator
Cong Fang
C. J. Li
Zhouchen Lin
Tong Zhang
147
580
0
04 Jul 2018