1810.13243
Cited By
A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation
29 October 2018
Akhilesh Deepak Gotmare
N. Keskar
Caiming Xiong
R. Socher
ODL
Papers citing
"A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation"
49 / 49 papers shown
Title
Increasing Both Batch Size and Learning Rate Accelerates Stochastic Gradient Descent
Hikaru Umeda
Hideaki Iiduka
67
2
0
17 Feb 2025
Unleashing the Potential of Large Language Models as Prompt Optimizers: Analogical Analysis with Gradient-based Model Optimizers
Xinyu Tang
Xiaolei Wang
Wayne Xin Zhao
Siyuan Lu
Yaliang Li
Ji-Rong Wen
LRM
56
14
0
28 Jan 2025
Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning
Amin Karimi Monsefi
Mengxi Zhou
Nastaran Karimi Monsefi
Ser-Nam Lim
Wei-Lun Chao
R. Ramnath
46
1
0
16 Sep 2024
Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis
Stefan Horoi
Albert Manuel Orozco Camacho
Eugene Belilovsky
Guy Wolf
FedML
MoMe
32
9
0
07 Jul 2024
CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models
Haoxiang Shi
Jiaan Wang
Jiarong Xu
Cen Wang
Tetsuya Sakai
LMTD
28
0
0
20 May 2024
Beyond MOS: Subjective Image Quality Score Preprocessing Method Based on Perceptual Similarity
Lei Wang
Desen Yuan
49
2
0
30 Apr 2024
ThermoPore: Predicting Part Porosity Based on Thermal Images Using Deep Learning
P. Pak
Francis Ogoke
Andrew Polonsky
Anthony Garland
D. Bolintineanu
Dan R. Moser
Michael J. Heiden
A. Farimani
23
4
0
23 Apr 2024
Stochastic Subnetwork Annealing: A Regularization Technique for Fine Tuning Pruned Subnetworks
Tim Whitaker
Darrell Whitley
33
0
0
16 Jan 2024
Learning with Noisy Low-Cost MOS for Image Quality Assessment via Dual-Bias Calibration
Lei Wang
Qingbo Wu
Desen Yuan
K. Ngan
Hongliang Li
Fanman Meng
Linfeng Xu
31
5
0
27 Nov 2023
MUSCLE: Multi-task Self-supervised Continual Learning to Pre-train Deep Models for X-ray Images of Multiple Body Parts
Weibin Liao
Haoyi Xiong
Qingzhong Wang
Yan Mo
Xuhong Li
Yi Liu
Zeyu Chen
Siyu Huang
Dejing Dou
CLL
14
22
0
03 Oct 2023
Nonconvex Stochastic Bregman Proximal Gradient Method with Application to Deep Learning
Kuan-Fu Ding
Jingyang Li
Kim-Chuan Toh
33
8
0
26 Jun 2023
Multi-pass Training and Cross-information Fusion for Low-resource End-to-end Accented Speech Recognition
Xuefei Wang
Yanhua Long
Yijie Li
Haoran Wei
27
4
0
20 Jun 2023
Hybrid Graph: A Unified Graph Representation with Datasets and Benchmarks for Complex Graphs
Zehui Li
Xiangyu Zhao
Mingzhu Shen
Guy-Bart Stan
Pietro Lio
Yiren Zhao
20
1
0
08 Jun 2023
Inductive biases in deep learning models for weather prediction
Jannik Thümmel
Matthias Karlbauer
S. Otte
C. Zarfl
Georg Martius
...
Thomas Scholten
Ulrich Friedrich
V. Wulfmeyer
B. Goswami
Martin Volker Butz
AI4CE
43
5
0
06 Apr 2023
Improving Transformer Performance for French Clinical Notes Classification Using Mixture of Experts on a Limited Dataset
Thanh-Dung Le
P. Jouvet
R. Noumeir
MoE
MedIm
72
5
0
22 Mar 2023
The Multiscale Surface Vision Transformer
Simon Dahan
Logan Z. J. Williams
Daniel Rueckert
E. C. Robinson
MedIm
ViT
10
2
0
21 Mar 2023
Integrating Earth Observation Data into Causal Inference: Challenges and Opportunities
Connor Jerzak
Fredrik D. Johansson
Adel Daoud
CML
41
11
0
30 Jan 2023
Bike Frames: Understanding the Implicit Portrayal of Cyclists in the News
Xingmeng Zhao
Dan Schumacher
Sashank Nalluri
Xavier Walton
Suhana Shrestha
Anthony Rios
26
2
0
15 Jan 2023
Empirical study of the modulus as activation function in computer vision applications
Iván Vallés-Pérez
E. Soria-Olivas
M. Martínez-Sober
Antonio J. Serrano
Joan Vila-Francés
J. Gómez-Sanchís
19
15
0
15 Jan 2023
Self-Validated Physics-Embedding Network: A General Framework for Inverse Modelling
Ruiyuan Kang
D. Kyritsis
P. Liatsis
AI4CE
PINN
16
5
0
12 Oct 2022
WaveMix: A Resource-efficient Neural Network for Image Analysis
Pranav Jeevan
Kavitha Viswanathan
S. AnanduA
A. Sethi
20
20
0
28 May 2022
PSO-Convolutional Neural Networks with Heterogeneous Learning Rate
N. H. Phong
A. Santos
B. Ribeiro
24
8
0
20 May 2022
Generalized Knowledge Distillation via Relationship Matching
Han-Jia Ye
Su Lu
De-Chuan Zhan
FedML
22
20
0
04 May 2022
CowClip: Reducing CTR Prediction Model Training Time from 12 hours to 10 minutes on 1 GPU
Zangwei Zheng
Peng Xu
Xuan Zou
Da Tang
Zhen Li
...
Xiangzhuo Ding
Fuzhao Xue
Ziheng Qing
Youlong Cheng
Yang You
VLM
44
7
0
13 Apr 2022
Solving ImageNet: a Unified Scheme for Training any Backbone to Top Results
T. Ridnik
Hussam Lawen
Emanuel Ben-Baruch
Asaf Noy
38
11
0
07 Apr 2022
RSTT: Real-time Spatial Temporal Transformer for Space-Time Video Super-Resolution
Z. Geng
Luming Liang
Tianyu Ding
Ilya Zharkov
29
69
0
27 Mar 2022
Ensemble Knowledge Guided Sub-network Search and Fine-tuning for Filter Pruning
Seunghyun Lee
B. Song
19
8
0
05 Mar 2022
Raman Spectrum Matching with Contrastive Representation Learning
Bo-wen Li
Mikkel N. Schmidt
T. S. Alstrøm
28
10
0
25 Feb 2022
On the Origins of the Block Structure Phenomenon in Neural Network Representations
Thao Nguyen
M. Raghu
Simon Kornblith
25
14
0
15 Feb 2022
Exact Solutions of a Deep Linear Network
Liu Ziyin
Botao Li
Xiangmin Meng
ODL
19
21
0
10 Feb 2022
Optimal learning rate schedules in high-dimensional non-convex optimization problems
Stéphane d'Ascoli
Maria Refinetti
Giulio Biroli
16
7
0
09 Feb 2022
When Do Flat Minima Optimizers Work?
Jean Kaddour
Linqing Liu
Ricardo M. A. Silva
Matt J. Kusner
ODL
24
58
0
01 Feb 2022
Forward Compatible Training for Large-Scale Embedding Retrieval Systems
Vivek Ramanujan
Pavan Kumar Anasosalu Vasu
Ali Farhadi
Oncel Tuzel
Hadi Pouransari
VLM
32
16
0
06 Dec 2021
Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect
Yuqing Wang
Minshuo Chen
T. Zhao
Molei Tao
AI4CE
57
40
0
07 Oct 2021
Boost Neural Networks by Checkpoints
Feng Wang
Gu-Yeon Wei
Qiao Liu
Jinxiang Ou
Xian Wei
Hairong Lv
FedML
UQCV
24
10
0
03 Oct 2021
Self-Supervised Feature Learning of 1D Convolutional Neural Networks with Contrastive Loss for Eating Detection Using an In-Ear Microphone
Vasileios Papapanagiotou
Christos Diou
A. Delopoulos
SSL
21
6
0
02 Aug 2021
R-Drop: Regularized Dropout for Neural Networks
Xiaobo Liang
Lijun Wu
Juntao Li
Yue Wang
Qi Meng
Tao Qin
Wei Chen
Hao Fei
Tie-Yan Liu
47
424
0
28 Jun 2021
Transportation Density Reduction Caused by City Lockdowns Across the World during the COVID-19 Epidemic: From the View of High-resolution Remote Sensing Imagery
Chen Wu
Sihan Zhu
Jiaqi Yang
Meiqi Hu
Bo Du
Lefei Zhang
Chengxi Han
Meng Lan
21
9
0
02 Mar 2021
DeepReDuce: ReLU Reduction for Fast Private Inference
N. Jha
Zahra Ghodsi
S. Garg
Brandon Reagen
39
90
0
02 Mar 2021
An Investigation of Traffic Density Changes inside Wuhan during the COVID-19 Epidemic with GF-2 Time-Series Images
Chen Wu
Yinong Guo
Haonan Guo
J. Yuan
Lixiang Ru
Hongruixuan Chen
Bo Du
Liangpei Zhang
11
16
0
26 Jun 2020
Students Need More Attention: BERT-based Attention Model for Small Data with Application to Automatic Patient Message Triage
Shijing Si
Rui Wang
Jedrek Wosik
Hao Zhang
D. Dov
Guoyin Wang
Ricardo Henao
Lawrence Carin
20
24
0
22 Jun 2020
On the Loss Landscape of Adversarial Training: Identifying Challenges and How to Overcome Them
Chen Liu
Mathieu Salzmann
Tao R. Lin
Ryota Tomioka
Sabine Süsstrunk
AAML
24
81
0
15 Jun 2020
Transient Non-Stationarity and Generalisation in Deep Reinforcement Learning
Maximilian Igl
Gregory Farquhar
Jelena Luketina
Wendelin Boehmer
Shimon Whiteson
27
83
0
10 Jun 2020
Self-Distillation Amplifies Regularization in Hilbert Space
H. Mobahi
Mehrdad Farajtabar
Peter L. Bartlett
19
226
0
13 Feb 2020
Optimization for deep learning: theory and algorithms
Ruoyu Sun
ODL
19
168
0
19 Dec 2019
An Adaptive and Momental Bound Method for Stochastic Learning
Jianbang Ding
Xuancheng Ren
Ruixuan Luo
Xu Sun
ODL
11
46
0
27 Oct 2019
On the adequacy of untuned warmup for adaptive optimization
Jerry Ma
Denis Yarats
56
70
0
09 Oct 2019
Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML
Aniruddh Raghu
M. Raghu
Samy Bengio
Oriol Vinyals
186
640
0
19 Sep 2019
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
308
2,890
0
15 Sep 2016