1606.04838
v3 (latest)
Optimization Methods for Large-Scale Machine Learning
15 June 2016
Léon Bottou
Frank E. Curtis
J. Nocedal
Papers citing
"Optimization Methods for Large-Scale Machine Learning"
50 / 866 papers shown
Quasi-Monte Carlo Variational Inference
Alexander K. Buchholz
F. Wenzel
Stephan Mandt
BDL
105
60
0
04 Jul 2018
Trust-Region Algorithms for Training Responses: Machine Learning Methods Using Indefinite Hessian Approximations
Jennifer B. Erway
J. Griffin
Roummel F. Marcia
Riadh Omheni
63
24
0
01 Jul 2018
Algorithms for solving optimization problems arising from deep neural net models: smooth problems
Vyacheslav Kungurtsev
Tomás Pevný
48
6
0
30 Jun 2018
Random Shuffling Beats SGD after Finite Epochs
Jeff Z. HaoChen
S. Sra
98
99
0
26 Jun 2018
Laplacian Smoothing Gradient Descent
Stanley Osher
Bao Wang
Penghang Yin
Xiyang Luo
Farzin Barekat
Minh Pham
A. Lin
ODL
113
43
0
17 Jun 2018
Stochastic Gradient Descent with Exponential Convergence Rates of Expected Classification Errors
Atsushi Nitanda
Taiji Suzuki
77
10
0
14 Jun 2018
Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam
Mohammad Emtiyaz Khan
Didrik Nielsen
Voot Tangkaratt
Wu Lin
Y. Gal
Akash Srivastava
ODL
200
271
0
13 Jun 2018
When Will Gradient Methods Converge to Max-margin Classifier under ReLU Models?
Tengyu Xu
Yi Zhou
Kaiyi Ji
Yingbin Liang
90
19
0
12 Jun 2018
Fast Approximate Natural Gradient Descent in a Kronecker-factored Eigenbasis
Thomas George
César Laurent
Xavier Bouthillier
Nicolas Ballas
Pascal Vincent
ODL
115
156
0
11 Jun 2018
A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation
Jalaj Bhandari
Daniel Russo
Raghav Singal
115
340
0
06 Jun 2018
AdaGrad stepsizes: Sharp convergence over nonconvex landscapes
Rachel A. Ward
Xiaoxia Wu
Léon Bottou
ODL
115
369
0
05 Jun 2018
Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate
Mor Shpigel Nacson
Nathan Srebro
Daniel Soudry
FedML
MLT
102
102
0
05 Jun 2018
Backdrop: Stochastic Backpropagation
Siavash Golkar
Kyle Cranmer
52
2
0
04 Jun 2018
Global linear convergence of Newton's method without strong-convexity or Lipschitz gradients
Sai Praneeth Karimireddy
Sebastian U. Stich
Martin Jaggi
86
52
0
01 Jun 2018
Accelerating Incremental Gradient Optimization with Curvature Information
Hoi-To Wai
Wei Shi
César A. Uribe
A. Nedić
Anna Scaglione
40
12
0
31 May 2018
DeepMiner: Discovering Interpretable Representations for Mammogram Classification and Explanation
Jimmy Wu
Bolei Zhou
D. Peck
S. Hsieh
V. Dialani
Lester W. Mackey
Genevieve Patterson
FAtt
MedIm
71
24
0
31 May 2018
Bayesian Learning with Wasserstein Barycenters
Julio D. Backhoff Veraguas
J. Fontbona
Gonzalo Rios
Felipe A. Tobar
64
31
0
28 May 2018
Statistical Optimality of Stochastic Gradient Descent on Hard Learning Problems through Multiple Passes
Loucas Pillaud-Vivien
Alessandro Rudi
Francis R. Bach
179
103
0
25 May 2018
Stochastic algorithms with descent guarantees for ICA
Pierre Ablin
Alexandre Gramfort
J. Cardoso
Francis R. Bach
CML
32
7
0
25 May 2018
LAG: Lazily Aggregated Gradient for Communication-Efficient Distributed Learning
Tianyi Chen
G. Giannakis
Tao Sun
W. Yin
60
299
0
25 May 2018
LMKL-Net: A Fast Localized Multiple Kernel Learning Solver via Deep Neural Networks
Ziming Zhang
ODL
26
1
0
22 May 2018
Stochastic modified equations for the asynchronous stochastic gradient descent
Jing An
Jian-wei Lu
Lexing Ying
77
79
0
21 May 2018
On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes
Xiaoyun Li
Francesco Orabona
89
299
0
21 May 2018
Parallel and Distributed Successive Convex Approximation Methods for Big-Data Optimization
G. Scutari
Ying Sun
105
64
0
17 May 2018
Decoupled Parallel Backpropagation with Convergence Guarantee
Zhouyuan Huo
Bin Gu
Qian Yang
Heng-Chiao Huang
98
97
0
27 Apr 2018
Revisiting Small Batch Training for Deep Neural Networks
Dominic Masters
Carlo Luschi
ODL
83
671
0
20 Apr 2018
Constant Step Size Stochastic Gradient Descent for Probabilistic Modeling
Dmitry Babichev
Francis R. Bach
62
9
0
16 Apr 2018
Sequence Training of DNN Acoustic Models With Natural Gradient
Adnan Haider
P. Woodland
41
7
0
06 Apr 2018
A Constant Step Stochastic Douglas-Rachford Algorithm with Application to Non Separable Regularizations
Adil Salim
Pascal Bianchi
W. Hachem
72
2
0
03 Apr 2018
Training Tips for the Transformer Model
Martin Popel
Ondrej Bojar
110
312
0
01 Apr 2018
Lower error bounds for the stochastic gradient descent optimization algorithm: Sharp convergence rates for slowly and fast decaying learning rates
Arnulf Jentzen
Philippe von Wurstemberger
101
31
0
22 Mar 2018
Group Normalization
Yuxin Wu
Kaiming He
261
3,686
0
22 Mar 2018
Efficient FPGA Implementation of Conjugate Gradient Methods for Laplacian System using HLS
Sahithi Rampalli
N. Sehgal
Ishita Bindlish
Tanya Tyagi
Pawan Kumar
33
4
0
10 Mar 2018
A Stochastic Semismooth Newton Method for Nonsmooth Nonconvex Optimization
Andre Milzarek
X. Xiao
Shicong Cen
Zaiwen Wen
M. Ulbrich
66
36
0
09 Mar 2018
WNGrad: Learn the Learning Rate in Gradient Descent
Xiaoxia Wu
Rachel A. Ward
Léon Bottou
70
87
0
07 Mar 2018
DAGs with NO TEARS: Continuous Optimization for Structure Learning
Xun Zheng
Bryon Aragam
Pradeep Ravikumar
Eric Xing
NoLa
CML
OffRL
113
953
0
04 Mar 2018
Slow and Stale Gradients Can Win the Race: Error-Runtime Trade-offs in Distributed SGD
Sanghamitra Dutta
Gauri Joshi
Soumyadip Ghosh
Parijat Dube
P. Nagpurkar
82
198
0
03 Mar 2018
Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
Tal Ben-Nun
Torsten Hoefler
GNN
87
713
0
26 Feb 2018
GPU Accelerated Sub-Sampled Newton's Method
Sudhir B. Kylasa
Farbod Roosta-Khorasani
Michael W. Mahoney
A. Grama
ODL
79
8
0
26 Feb 2018
Complex-valued Neural Networks with Non-parametric Activation Functions
Simone Scardapane
S. Van Vaerenbergh
Amir Hussain
A. Uncini
81
84
0
22 Feb 2018
Spurious Valleys in Two-layer Neural Network Optimization Landscapes
Luca Venturi
Afonso S. Bandeira
Joan Bruna
97
75
0
18 Feb 2018
Convergence of Online Mirror Descent
Yunwen Lei
Ding-Xuan Zhou
60
21
0
18 Feb 2018
Stochastic quasi-Newton with adaptive step lengths for large-scale problems
A. Wills
Thomas B. Schon
63
9
0
12 Feb 2018
SGD and Hogwild! Convergence Without the Bounded Gradients Assumption
Lam M. Nguyen
Phuong Ha Nguyen
Marten van Dijk
Peter Richtárik
K. Scheinberg
Martin Takáč
113
228
0
11 Feb 2018
Estimating Heterogeneous Consumer Preferences for Restaurants and Travel Time Using Mobile Location Data
Susan Athey
David M. Blei
Rob Donnelly
Francisco J. R. Ruiz
Tobias Schmidt
43
66
0
22 Jan 2018
When Does Stochastic Gradient Algorithm Work Well?
Lam M. Nguyen
Nam H. Nguyen
Dzung Phan
Jayant Kalagnanam
K. Scheinberg
86
15
0
18 Jan 2018
MXNET-MPI: Embedding MPI parallelism in Parameter Server Task Model for scaling Deep Learning
Amith R. Mamidala
Georgios Kollias
C. Ward
F. Artico
78
20
0
11 Jan 2018
Gradient-based Optimization for Regression in the Functional Tensor-Train Format
Alex A. Gorodetsky
J. Jakeman
76
34
0
03 Jan 2018
A Stochastic Trust Region Algorithm Based on Careful Step Normalization
Frank E. Curtis
K. Scheinberg
R. Shi
75
45
0
29 Dec 2017
Geometrical Insights for Implicit Generative Modeling
Léon Bottou
Martín Arjovsky
David Lopez-Paz
Maxime Oquab
75
50
0
21 Dec 2017