AutoLRS: Automatic Learning-Rate Schedule by Bayesian Optimization on the Fly

22 May 2021
Yuchen Jin
Dinesh Manocha
Liangyu Zhao
Yibo Zhu
Chuanxiong Guo
Marco Canini
Arvind Krishnamurthy

Papers citing "AutoLRS: Automatic Learning-Rate Schedule by Bayesian Optimization on the Fly"

41 / 41 papers shown
Provably Efficient Online Hyperparameter Optimization with Population-Based Bandits
Jack Parker-Holder
Vu Nguyen
Stephen J. Roberts
OffRL
103
85
0
06 Feb 2020
Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers
Zeyuan Allen-Zhu
Yuanzhi Li
Yingyu Liang
MLT
175
772
0
12 Nov 2018
A Convergence Theory for Deep Learning via Over-Parameterization
Zeyuan Allen-Zhu
Yuanzhi Li
Zhao Song
AI4CE
ODL
234
1,461
0
09 Nov 2018
A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation
Akhilesh Deepak Gotmare
N. Keskar
Caiming Xiong
R. Socher
ODL
61
276
0
29 Oct 2018
A System for Massively Parallel Hyperparameter Tuning
Liam Li
Kevin Jamieson
Afshin Rostamizadeh
Ekaterina Gonina
Moritz Hardt
Benjamin Recht
Ameet Talwalkar
60
379
0
13 Oct 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.6K
94,511
0
11 Oct 2018
Towards Automated Deep Learning: Efficient Joint Neural Architecture and Hyperparameter Search
Arber Zela
Aaron Klein
Stefan Falkner
Frank Hutter
67
161
0
18 Jul 2018
BOHB: Robust and Efficient Hyperparameter Optimization at Scale
Stefan Falkner
Aaron Klein
Frank Hutter
BDL
193
1,093
0
04 Jul 2018
Neural Network Acceptability Judgments
Alex Warstadt
Amanpreet Singh
Samuel R. Bowman
219
1,406
0
31 May 2018
Step Size Matters in Deep Learning
Kamil Nar
S. Shankar Sastry
23
42
0
22 May 2018
A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay
L. Smith
271
1,028
0
26 Mar 2018
Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches
Yeming Wen
Paul Vicol
Jimmy Ba
Dustin Tran
Roger C. Grosse
BDL
43
310
0
12 Mar 2018
Understanding Short-Horizon Bias in Stochastic Meta-Optimization
Yuhuai Wu
Mengye Ren
Renjie Liao
Roger C. Grosse
82
137
0
06 Mar 2018
Visualizing the Loss Landscape of Neural Nets
Hao Li
Zheng Xu
Gavin Taylor
Christoph Studer
Tom Goldstein
240
1,885
0
28 Dec 2017
Population Based Training of Neural Networks
Max Jaderberg
Valentin Dalibard
Simon Osindero
Wojciech M. Czarnecki
Jeff Donahue
...
Tim Green
Iain Dunning
Karen Simonyan
Chrisantha Fernando
Koray Kavukcuoglu
69
740
0
27 Nov 2017
Three Factors Influencing Minima in SGD
Stanislaw Jastrzebski
Zachary Kenton
Devansh Arpit
Nicolas Ballas
Asja Fischer
Yoshua Bengio
Amos Storkey
76
463
0
13 Nov 2017
Mixed Precision Training
Paulius Micikevicius
Sharan Narang
Jonah Alben
G. Diamos
Erich Elsen
...
Boris Ginsburg
Michael Houston
Oleksii Kuchaiev
Ganesh Venkatesh
Hao Wu
149
1,792
0
10 Oct 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
644
130,942
0
12 Jun 2017
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
3DH
120
3,675
0
08 Jun 2017
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
Adina Williams
Nikita Nangia
Samuel R. Bowman
507
4,473
0
18 Apr 2017
Online Learning Rate Adaptation with Hypergradient Descent
A. G. Baydin
R. Cornish
David Martínez-Rubio
Mark Schmidt
Frank Wood
ODL
69
247
0
14 Mar 2017
Forward and Reverse Gradient-Based Hyperparameter Optimization
Luca Franceschi
Michele Donini
P. Frasconi
Massimiliano Pontil
207
416
0
06 Mar 2017
How to Escape Saddle Points Efficiently
Chi Jin
Rong Ge
Praneeth Netrapalli
Sham Kakade
Michael I. Jordan
ODL
209
835
0
02 Mar 2017
SGDR: Stochastic Gradient Descent with Warm Restarts
I. Loshchilov
Frank Hutter
ODL
288
8,091
0
13 Aug 2016
SQuAD: 100,000+ Questions for Machine Comprehension of Text
Pranav Rajpurkar
Jian Zhang
Konstantin Lopyrev
Percy Liang
RALM
241
8,113
0
16 Jun 2016
Deep Learning without Poor Local Minima
Kenji Kawaguchi
ODL
209
922
0
23 May 2016
Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization
Lisha Li
Kevin Jamieson
Giulia DeSalvo
Afshin Rostamizadeh
Ameet Talwalkar
213
2,321
0
21 Mar 2016
Identity Mappings in Deep Residual Networks
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
342
10,172
0
16 Mar 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
2.1K
193,426
0
10 Dec 2015
Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books
Yukun Zhu
Ryan Kiros
R. Zemel
Ruslan Salakhutdinov
R. Urtasun
Antonio Torralba
Sanja Fidler
118
2,544
0
22 Jun 2015
Cyclical Learning Rates for Training Neural Networks
L. Smith
ODL
183
2,517
0
03 Jun 2015
Non-stochastic Best Arm Identification and Hyperparameter Optimization
Kevin Jamieson
Ameet Talwalkar
189
579
0
27 Feb 2015
Probabilistic Line Searches for Stochastic Optimization
Maren Mahsereci
Philipp Hennig
ODL
62
126
0
10 Feb 2015
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
1.6K
149,842
0
22 Dec 2014
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan
Andrew Zisserman
FAtt
MDE
1.5K
100,213
0
04 Sep 2014
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky
Jia Deng
Hao Su
J. Krause
S. Satheesh
...
A. Karpathy
A. Khosla
Michael S. Bernstein
Alexander C. Berg
Li Fei-Fei
VLM
ObjD
1.6K
39,472
0
01 Sep 2014
Freeze-Thaw Bayesian Optimization
Kevin Swersky
Jasper Snoek
Ryan P. Adams
77
269
0
16 Jun 2014
One weird trick for parallelizing convolutional neural networks
A. Krizhevsky
GNN
88
1,298
0
23 Apr 2014
ADADELTA: An Adaptive Learning Rate Method
Matthew D. Zeiler
ODL
132
6,623
0
22 Dec 2012
Practical Bayesian Optimization of Machine Learning Algorithms
Jasper Snoek
Hugo Larochelle
Ryan P. Adams
333
7,923
0
13 Jun 2012
No More Pesky Learning Rates
Tom Schaul
Sixin Zhang
Yann LeCun
123
477
0
06 Jun 2012