Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule
Nikhil Iyer, V. Thejas, Nipun Kwatra, Ramachandran Ramjee, Muthian Sivathanu
arXiv:2003.03977 (9 March 2020)
Papers citing "Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule" (10 of 10 papers shown):

Sentence-Level or Token-Level? A Comprehensive Study on Knowledge Distillation
Jingxuan Wei, Linzhuang Sun, Yichong Leng, Xu Tan, Bihui Yu, Ruifeng Guo (23 Apr 2024)

Large Learning Rates Improve Generalization: But How Large Are We Talking About?
E. Lobacheva, Eduard Pockonechnyy, M. Kodryan, Dmitry Vetrov (19 Nov 2023)

No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner (12 Jul 2023)

Relaxed Attention for Transformer Models
Timo Lohrenz, Björn Möller, Zhengyang Li, Tim Fingscheidt (20 Sep 2022)

Distance Learner: Incorporating Manifold Prior to Model Training
Aditya Chetan, Nipun Kwatra (14 Jul 2022)

Efficient Multi-Purpose Cross-Attention Based Image Alignment Block for Edge Devices
Bahri Batuhan Bilecen, Alparslan Fisne, Mustafa Ayazoglu (01 Jun 2022)

What Happens after SGD Reaches Zero Loss? --A Mathematical Framework
Zhiyuan Li, Tianhao Wang, Sanjeev Arora (13 Oct 2021)

Ranger21: a synergistic deep learning optimizer
Less Wright, Nestor Demeure (25 Jun 2021)

A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay
L. Smith (26 Mar 2018)

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang (15 Sep 2016)