Optimization Methods for Large-Scale Machine Learning

15 June 2016
Léon Bottou
Frank E. Curtis
J. Nocedal

Papers citing "Optimization Methods for Large-Scale Machine Learning"

50 / 866 papers shown
AccUDNN: A GPU Memory Efficient Accelerator for Training Ultra-deep Neural Networks
Jinrong Guo
Wantao Liu
Wang Wang
Q. Lu
Songlin Hu
Jizhong Han
Ruixuan Li
62
9
0
21 Jan 2019
Stochastic Gradient Descent on a Tree: an Adaptive and Robust Approach to Stochastic Convex Optimization
Sattar Vakili
Sudeep Salgia
Qing Zhao
49
7
0
17 Jan 2019
Block-Randomized Stochastic Proximal Gradient for Low-Rank Tensor Factorization
Xiao Fu
Shahana Ibrahim
Hoi-To Wai
Cheng Gao
Kejun Huang
134
37
0
16 Jan 2019
Optimization Problems for Machine Learning: A Survey
Claudio Gambella
Bissan Ghaddar
Joe Naoum-Sawaya
AI4CE
142
181
0
16 Jan 2019
CROSSBOW: Scaling Deep Learning with Small Batch Sizes on Multi-GPU Servers
A. Koliousis
Pijika Watcharapichat
Matthias Weidlich
Kai Zou
Paolo Costa
Peter R. Pietzuch
65
70
0
08 Jan 2019
SGD Converges to Global Minimum in Deep Learning via Star-convex Path
Yi Zhou
Junjie Yang
Huishuai Zhang
Yingbin Liang
Vahid Tarokh
79
74
0
02 Jan 2019
Exact Guarantees on the Absence of Spurious Local Minima for Non-negative Rank-1 Robust Principal Component Analysis
Salar Fattahi
Somayeh Sojoudi
74
38
0
30 Dec 2018
On Lazy Training in Differentiable Programming
Lénaïc Chizat
Edouard Oyallon
Francis R. Bach
111
840
0
19 Dec 2018
An Empirical Model of Large-Batch Training
Sam McCandlish
Jared Kaplan
Dario Amodei
OpenAI Dota Team
76
280
0
14 Dec 2018
Gradient Descent Happens in a Tiny Subspace
Guy Gur-Ari
Daniel A. Roberts
Ethan Dyer
105
234
0
12 Dec 2018
Layer-Parallel Training of Deep Residual Neural Networks
Stefanie Günther
Lars Ruthotto
J. Schroder
E. Cyr
N. Gauger
90
90
0
11 Dec 2018
Universal Adversarial Training
A. Mendrik
Mahyar Najibi
Zheng Xu
John P. Dickerson
L. Davis
Tom Goldstein
AAML, OOD
102
190
0
27 Nov 2018
Forward Stability of ResNet and Its Variants
Linan Zhang
Hayden Schaeffer
121
48
0
24 Nov 2018
Parallel sequential Monte Carlo for stochastic gradient-free nonconvex optimization
Ömer Deniz Akyildiz
Dan Crisan
Joaquín Míguez
67
6
0
23 Nov 2018
A Sufficient Condition for Convergences of Adam and RMSProp
Fangyu Zou
Li Shen
Zequn Jie
Weizhong Zhang
Wei Liu
81
373
0
23 Nov 2018
New Convergence Aspects of Stochastic Gradient Algorithms
Lam M. Nguyen
Phuong Ha Nguyen
Peter Richtárik
K. Scheinberg
Martin Takáč
Marten van Dijk
141
66
0
10 Nov 2018
A Bayesian Perspective of Statistical Machine Learning for Big Data
R. Sambasivan
Sourish Das
S. Sahu
BDL, GP
61
20
0
09 Nov 2018
Double Adaptive Stochastic Gradient Optimization
Rajaditya Mukherjee
Jin Li
Shicheng Chu
Huamin Wang
ODL
53
0
0
06 Nov 2018
Non-Asymptotic Guarantees For Sampling by Stochastic Gradient Descent
Avetik G. Karagulyan
21
1
0
02 Nov 2018
A general system of differential equations to model first order adaptive algorithms
André Belotto da Silva
Maxime Gazeau
89
34
0
31 Oct 2018
SpiderBoost and Momentum: Faster Stochastic Variance Reduction Algorithms
Zhe Wang
Kaiyi Ji
Yi Zhou
Yingbin Liang
Vahid Tarokh
ODL
98
82
0
25 Oct 2018
Adaptive Communication Strategies to Achieve the Best Error-Runtime Trade-off in Local-Update SGD
Jianyu Wang
Gauri Joshi
FedML
110
232
0
19 Oct 2018
First-order and second-order variants of the gradient descent in a unified framework
Thomas Pierrot
Nicolas Perrin
Olivier Sigaud
ODL
69
7
0
18 Oct 2018
Fault Tolerance in Iterative-Convergent Machine Learning
Aurick Qiao
Bryon Aragam
Bingjing Zhang
Eric Xing
76
42
0
17 Oct 2018
Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks
Xiaodong Cui
Wei Zhang
Zoltán Tüske
M. Picheny
ODL
90
91
0
16 Oct 2018
Approximate Fisher Information Matrix to Characterise the Training of Deep Neural Networks
Zhibin Liao
Tom Drummond
Ian Reid
G. Carneiro
80
23
0
16 Oct 2018
Deep Reinforcement Learning
Yuxi Li
VLM, OffRL
194
144
0
15 Oct 2018
Tight Dimension Independent Lower Bound on the Expected Convergence Rate for Diminishing Step Sizes in SGD
Phuong Ha Nguyen
Lam M. Nguyen
Marten van Dijk
LRM
75
32
0
10 Oct 2018
Characterization of Convex Objective Functions and Optimal Expected Convergence Rates for SGD
Marten van Dijk
Lam M. Nguyen
Phuong Ha Nguyen
Dzung Phan
95
6
0
09 Oct 2018
Information Geometry of Orthogonal Initializations and Training
Piotr A. Sokól
Il-Su Park
AI4CE
136
17
0
09 Oct 2018
Accelerating Stochastic Gradient Descent Using Antithetic Sampling
Jingchang Liu
Linli Xu
49
2
0
07 Oct 2018
Continuous-time Models for Stochastic Optimization Algorithms
Antonio Orvieto
Aurelien Lucchi
119
32
0
05 Oct 2018
Combining Natural Gradient with Hessian Free Methods for Sequence Training
Adnan Haider
P. Woodland
ODL
48
4
0
03 Oct 2018
Large batch size training of neural networks with adversarial training and second-order information
Z. Yao
A. Gholami
Daiyaan Arfeen
Richard Liaw
Joseph E. Gonzalez
Kurt Keutzer
Michael W. Mahoney
ODL
96
42
0
02 Oct 2018
Mini-batch Serialization: CNN Training with Inter-layer Data Reuse
Sangkug Lym
Armand Behroozi
W. Wen
Ge Li
Yongkee Kwon
M. Erez
41
26
0
30 Sep 2018
A fast quasi-Newton-type method for large-scale stochastic optimisation
A. Wills
Carl Jidling
Thomas B. Schon
ODL
64
7
0
29 Sep 2018
Fluctuation-dissipation relations for stochastic gradient descent
Sho Yaida
121
75
0
28 Sep 2018
Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview
Yuejie Chi
Yue M. Lu
Yuxin Chen
75
427
0
25 Sep 2018
Predictive Collective Variable Discovery with Deep Bayesian Models
M. Schöberl
N. Zabaras
P. Koutsourelakis
66
34
0
18 Sep 2018
MotherNets: Rapid Deep Ensemble Learning
Abdul Wasay
Brian Hentschel
Yuze Liao
Sanyuan Chen
Stratos Idreos
58
35
0
12 Sep 2018
MDCN: Multi-Scale, Deep Inception Convolutional Neural Networks for Efficient Object Detection
Wenchi Ma
Yuanwei Wu
Zongbo Wang
Guanghui Wang
ObjD
73
25
0
06 Sep 2018
Compositional Stochastic Average Gradient for Machine Learning and Related Applications
Tsung-Yu Hsieh
Y. El-Manzalawy
Yiwei Sun
Vasant Honavar
44
1
0
04 Sep 2018
Distributed Nonconvex Constrained Optimization over Time-Varying Digraphs
G. Scutari
Ying Sun
100
176
0
04 Sep 2018
Sparsity in Deep Neural Networks - An Empirical Investigation with TensorQuant
D. Loroch
Franz-Josef Pfreundt
Norbert Wehn
J. Keuper
46
5
0
27 Aug 2018
Cooperative SGD: A unified Framework for the Design and Analysis of Communication-Efficient SGD Algorithms
Jianyu Wang
Gauri Joshi
196
350
0
22 Aug 2018
Backtracking gradient descent method for general $C^1$ functions, with applications to Deep Learning
T. Truong
T. H. Nguyen
73
10
0
15 Aug 2018
On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization
Xiangyi Chen
Sijia Liu
Ruoyu Sun
Mingyi Hong
101
324
0
08 Aug 2018
Particle Filtering Methods for Stochastic Optimization with Application to Large-Scale Empirical Risk Minimization
Bin Liu
66
11
0
23 Jul 2018
Training Neural Networks Using Features Replay
Zhouyuan Huo
Bin Gu
Heng-Chiao Huang
94
70
0
12 Jul 2018
SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator
Cong Fang
C. J. Li
Zhouchen Lin
Tong Zhang
147
580
0
04 Jul 2018