ResearchTrend.AI
Optimization Methods for Large-Scale Machine Learning

15 June 2016
Léon Bottou
Frank E. Curtis
J. Nocedal

Papers citing "Optimization Methods for Large-Scale Machine Learning"

50 / 866 papers shown
Quasi-Monte Carlo Variational Inference
Alexander K. Buchholz
F. Wenzel
Stephan Mandt
BDL
105
60
0
04 Jul 2018
Trust-Region Algorithms for Training Responses: Machine Learning Methods Using Indefinite Hessian Approximations
Jennifer B. Erway
J. Griffin
Roummel F. Marcia
Riadh Omheni
63
24
0
01 Jul 2018
Algorithms for solving optimization problems arising from deep neural net models: smooth problems
Vyacheslav Kungurtsev
Tomás Pevný
48
6
0
30 Jun 2018
Random Shuffling Beats SGD after Finite Epochs
Jeff Z. HaoChen
S. Sra
98
99
0
26 Jun 2018
Laplacian Smoothing Gradient Descent
Stanley Osher
Bao Wang
Penghang Yin
Xiyang Luo
Farzin Barekat
Minh Pham
A. Lin
ODL
113
43
0
17 Jun 2018
Stochastic Gradient Descent with Exponential Convergence Rates of Expected Classification Errors
Atsushi Nitanda
Taiji Suzuki
77
10
0
14 Jun 2018
Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam
Mohammad Emtiyaz Khan
Didrik Nielsen
Voot Tangkaratt
Wu Lin
Y. Gal
Akash Srivastava
ODL
200
271
0
13 Jun 2018
When Will Gradient Methods Converge to Max-margin Classifier under ReLU Models?
Tengyu Xu
Yi Zhou
Kaiyi Ji
Yingbin Liang
90
19
0
12 Jun 2018
Fast Approximate Natural Gradient Descent in a Kronecker-factored Eigenbasis
Thomas George
César Laurent
Xavier Bouthillier
Nicolas Ballas
Pascal Vincent
ODL
115
156
0
11 Jun 2018
A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation
Jalaj Bhandari
Daniel Russo
Raghav Singal
115
340
0
06 Jun 2018
AdaGrad stepsizes: Sharp convergence over nonconvex landscapes
Rachel A. Ward
Xiaoxia Wu
Léon Bottou
ODL
115
369
0
05 Jun 2018
Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate
Mor Shpigel Nacson
Nathan Srebro
Daniel Soudry
FedML, MLT
102
102
0
05 Jun 2018
Backdrop: Stochastic Backpropagation
Siavash Golkar
Kyle Cranmer
52
2
0
04 Jun 2018
Global linear convergence of Newton's method without strong-convexity or Lipschitz gradients
Sai Praneeth Karimireddy
Sebastian U. Stich
Martin Jaggi
86
52
0
01 Jun 2018
Accelerating Incremental Gradient Optimization with Curvature Information
Hoi-To Wai
Wei Shi
César A. Uribe
A. Nedić
Anna Scaglione
40
12
0
31 May 2018
DeepMiner: Discovering Interpretable Representations for Mammogram Classification and Explanation
Jimmy Wu
Bolei Zhou
D. Peck
S. Hsieh
V. Dialani
Lester W. Mackey
Genevieve Patterson
FAtt, MedIm
71
24
0
31 May 2018
Bayesian Learning with Wasserstein Barycenters
Julio D. Backhoff Veraguas
J. Fontbona
Gonzalo Rios
Felipe A. Tobar
64
31
0
28 May 2018
Statistical Optimality of Stochastic Gradient Descent on Hard Learning Problems through Multiple Passes
Loucas Pillaud-Vivien
Alessandro Rudi
Francis R. Bach
179
103
0
25 May 2018
Stochastic algorithms with descent guarantees for ICA
Pierre Ablin
Alexandre Gramfort
J. Cardoso
Francis R. Bach
CML
32
7
0
25 May 2018
LAG: Lazily Aggregated Gradient for Communication-Efficient Distributed Learning
Tianyi Chen
G. Giannakis
Tao Sun
W. Yin
60
299
0
25 May 2018
LMKL-Net: A Fast Localized Multiple Kernel Learning Solver via Deep Neural Networks
Ziming Zhang
ODL
23
1
0
22 May 2018
Stochastic modified equations for the asynchronous stochastic gradient descent
Jing An
Jian-wei Lu
Lexing Ying
77
79
0
21 May 2018
On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes
Xiaoyun Li
Francesco Orabona
89
299
0
21 May 2018
Parallel and Distributed Successive Convex Approximation Methods for Big-Data Optimization
G. Scutari
Ying Sun
105
64
0
17 May 2018
Decoupled Parallel Backpropagation with Convergence Guarantee
Zhouyuan Huo
Bin Gu
Qian Yang
Heng-Chiao Huang
98
97
0
27 Apr 2018
Revisiting Small Batch Training for Deep Neural Networks
Dominic Masters
Carlo Luschi
ODL
83
671
0
20 Apr 2018
Constant Step Size Stochastic Gradient Descent for Probabilistic Modeling
Dmitry Babichev
Francis R. Bach
62
9
0
16 Apr 2018
Sequence Training of DNN Acoustic Models With Natural Gradient
Adnan Haider
P. Woodland
41
7
0
06 Apr 2018
A Constant Step Stochastic Douglas-Rachford Algorithm with Application to Non Separable Regularizations
Adil Salim
Pascal Bianchi
W. Hachem
72
2
0
03 Apr 2018
Training Tips for the Transformer Model
Martin Popel
Ondrej Bojar
110
312
0
01 Apr 2018
Lower error bounds for the stochastic gradient descent optimization algorithm: Sharp convergence rates for slowly and fast decaying learning rates
Arnulf Jentzen
Philippe von Wurstemberger
101
31
0
22 Mar 2018
Group Normalization
Yuxin Wu
Kaiming He
261
3,686
0
22 Mar 2018
Efficient FPGA Implementation of Conjugate Gradient Methods for Laplacian System using HLS
Sahithi Rampalli
N. Sehgal
Ishita Bindlish
Tanya Tyagi
Pawan Kumar
33
4
0
10 Mar 2018
A Stochastic Semismooth Newton Method for Nonsmooth Nonconvex Optimization
Andre Milzarek
X. Xiao
Shicong Cen
Zaiwen Wen
M. Ulbrich
66
36
0
09 Mar 2018
WNGrad: Learn the Learning Rate in Gradient Descent
Xiaoxia Wu
Rachel A. Ward
Léon Bottou
70
87
0
07 Mar 2018
DAGs with NO TEARS: Continuous Optimization for Structure Learning
Xun Zheng
Bryon Aragam
Pradeep Ravikumar
Eric Xing
NoLa, CML, OffRL
113
953
0
04 Mar 2018
Slow and Stale Gradients Can Win the Race: Error-Runtime Trade-offs in Distributed SGD
Sanghamitra Dutta
Gauri Joshi
Soumyadip Ghosh
Parijat Dube
P. Nagpurkar
82
198
0
03 Mar 2018
Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
Tal Ben-Nun
Torsten Hoefler
GNN
87
713
0
26 Feb 2018
GPU Accelerated Sub-Sampled Newton's Method
Sudhir B. Kylasa
Farbod Roosta-Khorasani
Michael W. Mahoney
A. Grama
ODL
79
8
0
26 Feb 2018
Complex-valued Neural Networks with Non-parametric Activation Functions
Simone Scardapane
S. Van Vaerenbergh
Amir Hussain
A. Uncini
81
84
0
22 Feb 2018
Spurious Valleys in Two-layer Neural Network Optimization Landscapes
Luca Venturi
Afonso S. Bandeira
Joan Bruna
97
75
0
18 Feb 2018
Convergence of Online Mirror Descent
Yunwen Lei
Ding-Xuan Zhou
60
21
0
18 Feb 2018
Stochastic quasi-Newton with adaptive step lengths for large-scale problems
A. Wills
Thomas B. Schon
63
9
0
12 Feb 2018
SGD and Hogwild! Convergence Without the Bounded Gradients Assumption
Lam M. Nguyen
Phuong Ha Nguyen
Marten van Dijk
Peter Richtárik
K. Scheinberg
Martin Takáč
113
228
0
11 Feb 2018
Estimating Heterogeneous Consumer Preferences for Restaurants and Travel Time Using Mobile Location Data
Susan Athey
David M. Blei
Rob Donnelly
Francisco J. R. Ruiz
Tobias Schmidt
43
66
0
22 Jan 2018
When Does Stochastic Gradient Algorithm Work Well?
Lam M. Nguyen
Nam H. Nguyen
Dzung Phan
Jayant Kalagnanam
K. Scheinberg
86
15
0
18 Jan 2018
MXNET-MPI: Embedding MPI parallelism in Parameter Server Task Model for scaling Deep Learning
Amith R. Mamidala
Georgios Kollias
C. Ward
F. Artico
78
20
0
11 Jan 2018
Gradient-based Optimization for Regression in the Functional Tensor-Train Format
Alex A. Gorodetsky
J. Jakeman
76
34
0
03 Jan 2018
A Stochastic Trust Region Algorithm Based on Careful Step Normalization
Frank E. Curtis
K. Scheinberg
R. Shi
75
45
0
29 Dec 2017
Geometrical Insights for Implicit Generative Modeling
Léon Bottou
Martín Arjovsky
David Lopez-Paz
Maxime Oquab
75
50
0
21 Dec 2017