DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule

8 February 2023

Papers citing "DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule"

A Hessian-informed hyperparameter optimization for differential learning rate Shiyun Xu Zhiqi Bu Yiliang Zhang Ian Barnett 87 1 0 12 Jan 2025
Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate Zhiqi Bu Xiaomeng Jin Bhanukiran Vinzamuri Anil Ramakrishna Kai-Wei Chang Volkan Cevher Mingyi Hong MU 113 10 0 29 Oct 2024
Parameter-free Regret in High Probability with Heavy Tails Jiujia Zhang Ashok Cutkosky 46 20 0 25 Oct 2022
Making SGD Parameter-Free Y. Carmon Oliver Hinder 77 45 0 04 May 2022
Parameter-free Mirror Descent Andrew Jacobsen Ashok Cutkosky 59 34 0 26 Feb 2022
The Power of Adaptivity in SGD: Self-Tuning Step Sizes with Unbounded Gradients and Affine Variance Matthew Faw Isidoros Tziotis Constantine Caramanis Aryan Mokhtari Sanjay Shakkottai Rachel A. Ward 55 59 0 11 Feb 2022
A ConvNet for the 2020s Zhuang Liu Hanzi Mao Chao-Yuan Wu Christoph Feichtenhofer Trevor Darrell Saining Xie ViT 159 5,167 0 10 Jan 2022
Datasets: A Community Library for Natural Language Processing Quentin Lhoest Albert Villanova del Moral Yacine Jernite A. Thakur Patrick von Platen ... Thibault Goehringer Victor Mustar François Lagunas Alexander M. Rush Thomas Wolf 210 610 0 07 Sep 2021
Large-Scale Methods for Distributionally Robust Optimization Daniel Levy Y. Carmon John C. Duchi Aaron Sidford 73 217 0 12 Oct 2020
Array Programming with NumPy Charles R. Harris K. Millman S. Walt R. Gommers Pauli Virtanen ... Tyler Reddy Warren Weckesser Hameer Abbasi C. Gohlke T. Oliphant 147 14,953 0 18 Jun 2020
Better Parameter-free Stochastic Optimization with ODE Updates for Coin-Betting K. Chen John Langford Francesco Orabona 31 21 0 12 Jun 2020
Lipschitz and Comparator-Norm Adaptivity in Online Learning Zakaria Mhammedi Wouter M. Koolen 61 56 0 27 Feb 2020
Online Learning with Imperfect Hints Aditya Bhaskara Ashok Cutkosky Ravi Kumar Manish Purohit 101 58 0 11 Feb 2020
On the distance between two neural networks and the stability of learning Jeremy Bernstein Arash Vahdat Yisong Yue Ming-Yu Liu ODL 229 58 0 09 Feb 2020
A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark Xiaohua Zhai J. Puigcerver Alexander Kolesnikov P. Ruyssen C. Riquelme ... Michael Tschannen Marcin Michalski Olivier Bousquet Sylvain Gelly N. Houlsby SSL 79 439 0 01 Oct 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy M. Lewis Luke Zettlemoyer Veselin Stoyanov AIMat 612 24,431 0 26 Jul 2019
Unlabeled Data Improves Adversarial Robustness Y. Carmon Aditi Raghunathan Ludwig Schmidt Percy Liang John C. Duchi 121 752 0 31 May 2019
Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates Sharan Vaswani Aaron Mishkin I. Laradji Mark Schmidt Gauthier Gidel Simon Lacoste-Julien ODL 81 209 0 24 May 2019
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems Alex Wang Yada Pruksachatkun Nikita Nangia Amanpreet Singh Julian Michael Felix Hill Omer Levy Samuel R. Bowman ELM 256 2,309 0 02 May 2019
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes Yang You Jing Li Sashank J. Reddi Jonathan Hseu Sanjiv Kumar Srinadh Bhojanapalli Xiaodan Song J. Demmel Kurt Keutzer Cho-Jui Hsieh ODL 230 996 0 01 Apr 2019
SGD Converges to Global Minimum in Deep Learning via Star-convex Path Yi Zhou Junjie Yang Huishuai Zhang Yingbin Liang Vahid Tarokh 52 74 0 02 Jan 2019
Neural Network Acceptability Judgments Alex Warstadt Amanpreet Singh Samuel R. Bowman 230 1,407 0 31 May 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Alex Wang Amanpreet Singh Julian Michael Felix Hill Omer Levy Samuel R. Bowman ELM 1.1K 7,154 0 20 Apr 2018
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost Noam M. Shazeer Mitchell Stern ODL 76 1,047 0 11 Apr 2018
Shampoo: Preconditioned Stochastic Tensor Optimization Vineet Gupta Tomer Koren Y. Singer ODL 82 219 0 26 Feb 2018
Large Batch Training of Convolutional Networks Yang You Igor Gitman Boris Ginsburg ODL 128 848 0 13 Aug 2017
A Unified Approach to Adaptive Regularization in Online and Stochastic Optimization Vineet Gupta Tomer Koren Y. Singer 30 22 0 20 Jun 2017
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference Adina Williams Nikita Nangia Samuel R. Bowman 517 4,476 0 18 Apr 2017
Remote Sensing Image Scene Classification: Benchmark and State of the Art Gong Cheng Junwei Han Xiaoqiang Lu 101 2,255 0 01 Mar 2017
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning Justin Johnson B. Hariharan Laurens van der Maaten Li Fei-Fei C. L. Zitnick Ross B. Girshick CoGe 295 2,375 0 20 Dec 2016
Densely Connected Convolutional Networks Gao Huang Zhuang Liu Laurens van der Maaten Kilian Q. Weinberger PINN 3DV 766 36,794 0 25 Aug 2016
SQuAD: 100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar Jian Zhang Konstantin Lopyrev Percy Liang RALM 274 8,127 0 16 Jun 2016
Wide Residual Networks Sergey Zagoruyko N. Komodakis 334 7,984 0 23 May 2016
Going Deeper with Convolutions Christian Szegedy Wei Liu Yangqing Jia P. Sermanet Scott E. Reed Dragomir Anguelov D. Erhan Vincent Vanhoucke Andrew Rabinovich 457 43,649 0 17 Sep 2014
Describing Textures in the Wild Mircea Cimpoi Subhransu Maji Iasonas Kokkinos S. Mohamed Andrea Vedaldi 3DV 116 2,671 0 14 Nov 2013
Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes Ohad Shamir Tong Zhang 146 574 0 08 Dec 2012
No-Regret Algorithms for Unconstrained Online Convex Optimization Matthew J. Streeter H. B. McMahan ODL 78 89 0 09 Nov 2012
No More Pesky Learning Rates Tom Schaul Sixin Zhang Yann LeCun 137 478 0 06 Jun 2012
Information-theoretic lower bounds on the oracle complexity of stochastic convex optimization Alekh Agarwal Peter L. Bartlett Pradeep Ravikumar Martin J. Wainwright 190 250 0 03 Sep 2010