ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Optimization Methods for Large-Scale Machine Learning

15 June 2016
Léon Bottou
Frank E. Curtis
J. Nocedal
arXiv:1606.04838

Papers citing "Optimization Methods for Large-Scale Machine Learning"

50 / 866 papers shown
Snake: a Stochastic Proximal Gradient Algorithm for Regularized Problems over Large Graphs
Adil Salim
Pascal Bianchi
W. Hachem
65
17
0
19 Dec 2017
The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning
Siyuan Ma
Raef Bassily
M. Belkin
117
291
0
18 Dec 2017
Neumann Optimizer: A Practical Optimization Algorithm for Deep Neural Networks
Shankar Krishnan
Ying Xiao
Rif A. Saurous
ODL
45
20
0
08 Dec 2017
AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks
Aditya Devarakonda
Maxim Naumov
M. Garland
ODL
112
136
0
06 Dec 2017
A two-dimensional decomposition approach for matrix completion through gossip
Mukul Bhutani
Bamdev Mishra
26
0
0
21 Nov 2017
Convergent Block Coordinate Descent for Training Tikhonov Regularized Deep Neural Networks
Ziming Zhang
M. Brand
59
71
0
20 Nov 2017
BPGrad: Towards Global Optimality in Deep Learning via Branch and Pruning
Ziming Zhang
Yuanwei Wu
Guanghui Wang
ODL
65
28
0
19 Nov 2017
Accelerated Method for Stochastic Composition Optimization with Nonsmooth Regularization
Zhouyuan Huo
Bin Gu
Ji Liu
Heng-Chiao Huang
93
51
0
10 Nov 2017
SHOPPER: A Probabilistic Model of Consumer Choice with Substitutes and Complements
Francisco J. R. Ruiz
Susan Athey
David M. Blei
415
85
0
09 Nov 2017
Analysis of Biased Stochastic Gradient Descent Using Sequential Semidefinite Programs
Bin Hu
Peter M. Seiler
Laurent Lessard
121
40
0
03 Nov 2017
Don't Decay the Learning Rate, Increase the Batch Size
Samuel L. Smith
Pieter-Jan Kindermans
Chris Ying
Quoc V. Le
ODL
133
996
0
01 Nov 2017
Adaptive Sampling Strategies for Stochastic Optimization
Raghu Bollapragada
R. Byrd
J. Nocedal
54
116
0
30 Oct 2017
On the role of synaptic stochasticity in training low-precision neural networks
Carlo Baldassi
Federica Gerace
H. Kappen
Carlo Lucibello
Luca Saglietti
Enzo Tartaglione
R. Zecchina
55
23
0
26 Oct 2017
Avoiding Communication in Proximal Methods for Convex Optimization Problems
Saeed Soori
Aditya Devarakonda
J. Demmel
Mert Gurbuzbalaban
M. Dehnavi
34
7
0
24 Oct 2017
Smart "Predict, then Optimize"
Adam N. Elmachtoub
Paul Grigas
104
613
0
22 Oct 2017
AdaDNNs: Adaptive Ensemble of Deep Neural Networks for Scene Text Recognition
Chun Yang
Xu-Cheng Yin
Zejun Li
Jianwei Wu
Chunchao Guo
Hongfa Wang
Lei Xiao
44
10
0
10 Oct 2017
Training Feedforward Neural Networks with Standard Logistic Activations is Feasible
Emanuele Sansone
F. D. De Natale
29
4
0
03 Oct 2017
How regularization affects the critical points in linear networks
Amirhossein Taghvaei
Jin-Won Kim
P. Mehta
77
13
0
27 Sep 2017
Feedforward and Recurrent Neural Networks Backward Propagation and Hessian in Matrix Form
Maxim Naumov
82
9
0
16 Sep 2017
ClickBAIT: Click-based Accelerated Incremental Training of Convolutional Neural Networks
Ervin Teng
João Diogo Falcão
Bob Iannucci
62
14
0
15 Sep 2017
The Impact of Local Geometry and Batch Size on Stochastic Gradient Descent for Nonconvex Problems
V. Patel
MLT
73
8
0
14 Sep 2017
Second-Order Optimization for Non-Convex Machine Learning: An Empirical Study
Peng Xu
Farbod Roosta-Khorasani
Michael W. Mahoney
ODL
84
145
0
25 Aug 2017
Newton-Type Methods for Non-Convex Optimization Under Inexact Hessian Information
Peng Xu
Farbod Roosta-Khorasani
Michael W. Mahoney
133
214
0
23 Aug 2017
Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates
L. Smith
Nicholay Topin
AI4CE
117
518
0
23 Aug 2017
Regularizing and Optimizing LSTM Language Models
Stephen Merity
N. Keskar
R. Socher
178
1,098
0
07 Aug 2017
On the convergence properties of a $K$-step averaging stochastic gradient descent algorithm for nonconvex optimization
Fan Zhou
Guojing Cong
186
236
0
03 Aug 2017
A Robust Multi-Batch L-BFGS Method for Machine Learning
A. Berahas
Martin Takáč
AAML ODL
111
44
0
26 Jul 2017
Warped Riemannian metrics for location-scale models
Salem Said
Lionel Bombrun
Y. Berthoumieu
76
15
0
22 Jul 2017
Stochastic, Distributed and Federated Optimization for Machine Learning
Jakub Konečný
FedML
83
38
0
04 Jul 2017
Optimization Methods for Supervised Machine Learning: From Linear Models to Deep Learning
Frank E. Curtis
K. Scheinberg
100
45
0
30 Jun 2017
Efficiency of quantum versus classical annealing in non-convex learning problems
Carlo Baldassi
R. Zecchina
78
45
0
26 Jun 2017
Faster independent component analysis by preconditioning with Hessian approximations
Pierre Ablin
J. Cardoso
Alexandre Gramfort
CML
87
127
0
25 Jun 2017
Collaborative Deep Learning in Fixed Topology Networks
Zhanhong Jiang
Aditya Balu
Chinmay Hegde
Soumik Sarkar
FedML
82
181
0
23 Jun 2017
Improved Optimization of Finite Sums with Minibatch Stochastic Variance Reduced Proximal Iterations
Jialei Wang
Tong Zhang
80
12
0
21 Jun 2017
Gradient Diversity: a Key Ingredient for Scalable Distributed Learning
Dong Yin
A. Pananjady
Max Lam
Dimitris Papailiopoulos
Kannan Ramchandran
Peter L. Bartlett
89
11
0
18 Jun 2017
Stochastic Training of Neural Networks via Successive Convex Approximations
Simone Scardapane
Paolo Di Lorenzo
43
9
0
15 Jun 2017
Proximal Backpropagation
Thomas Frerix
Thomas Möllenhoff
Michael Möller
Zorah Lähner
66
31
0
14 Jun 2017
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
3DH
220
3,692
0
08 Jun 2017
Diminishing Batch Normalization
Yintai Ma
Diego Klabjan
49
15
0
22 May 2017
EE-Grad: Exploration and Exploitation for Cost-Efficient Mini-Batch SGD
Mehmet A. Donmez
Maxim Raginsky
A. Singer
FedML
16
0
0
19 May 2017
An Investigation of Newton-Sketch and Subsampled Newton Methods
A. Berahas
Raghu Bollapragada
J. Nocedal
104
114
0
17 May 2017
Efficient Parallel Methods for Deep Reinforcement Learning
Alfredo V. Clemente
Humberto Nicolás Castejón Martínez
A. Chandra
85
115
0
13 May 2017
Stable Architectures for Deep Neural Networks
E. Haber
Lars Ruthotto
174
736
0
09 May 2017
SEAGLE: Sparsity-Driven Image Reconstruction under Multiple Scattering
Hsiou-Yuan Liu
Dehong Liu
Hassan Mansour
P. Boufounos
Laura Waller
Ulugbek S. Kamilov
50
77
0
05 May 2017
Bandit Structured Prediction for Neural Sequence-to-Sequence Learning
Julia Kreutzer
Artem Sokolov
Stefan Riezler
85
49
0
21 Apr 2017
Deep Relaxation: partial differential equations for optimizing deep neural networks
Pratik Chaudhari
Adam M. Oberman
Stanley Osher
Stefano Soatto
G. Carlier
174
154
0
17 Apr 2017
Inference via low-dimensional couplings
Alessio Spantini
Daniele Bigoni
Youssef Marzouk
145
119
0
17 Mar 2017
Sharp Minima Can Generalize For Deep Nets
Laurent Dinh
Razvan Pascanu
Samy Bengio
Yoshua Bengio
ODL
147
774
0
15 Mar 2017
Riemannian stochastic quasi-Newton algorithm with variance reduction and its convergence analysis
Hiroyuki Kasai
Hiroyuki Sato
Bamdev Mishra
65
22
0
15 Mar 2017
SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient
Lam M. Nguyen
Jie Liu
K. Scheinberg
Martin Takáč
ODL
177
608
0
01 Mar 2017