A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation

29 October 2018

Akhilesh Deepak Gotmare

Papers citing "A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation"


Increasing Both Batch Size and Learning Rate Accelerates Stochastic Gradient Descent Hikaru Umeda Hideaki Iiduka 67 2 0 17 Feb 2025
Unleashing the Potential of Large Language Models as Prompt Optimizers: Analogical Analysis with Gradient-based Model Optimizers Xinyu Tang Xiaolei Wang Wayne Xin Zhao Siyuan Lu Yaliang Li Ji-Rong Wen LRM 56 14 0 28 Jan 2025
Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning Amin Karimi Monsefi Mengxi Zhou Nastaran Karimi Monsefi Ser-Nam Lim Wei-Lun Chao R. Ramnath 46 1 0 16 Sep 2024
Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis Stefan Horoi Albert Manuel Orozco Camacho Eugene Belilovsky Guy Wolf FedML MoMe 32 9 0 07 Jul 2024
CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models Haoxiang Shi Jiaan Wang Jiarong Xu Cen Wang Tetsuya Sakai LMTD 28 0 0 20 May 2024
Beyond MOS: Subjective Image Quality Score Preprocessing Method Based on Perceptual Similarity Lei Wang Desen Yuan 49 2 0 30 Apr 2024
ThermoPore: Predicting Part Porosity Based on Thermal Images Using Deep Learning P. Pak Francis Ogoke Andrew Polonsky Anthony Garland D. Bolintineanu Dan R. Moser Michael J. Heiden A. Farimani 23 4 0 23 Apr 2024
Stochastic Subnetwork Annealing: A Regularization Technique for Fine Tuning Pruned Subnetworks Tim Whitaker Darrell Whitley 33 0 0 16 Jan 2024
Learning with Noisy Low-Cost MOS for Image Quality Assessment via Dual-Bias Calibration Lei Wang Qingbo Wu Desen Yuan K. Ngan Hongliang Li Fanman Meng Linfeng Xu 31 5 0 27 Nov 2023
MUSCLE: Multi-task Self-supervised Continual Learning to Pre-train Deep Models for X-ray Images of Multiple Body Parts Weibin Liao Haoyi Xiong Qingzhong Wang Yan Mo Xuhong Li Yi Liu Zeyu Chen Siyu Huang Dejing Dou CLL 14 22 0 03 Oct 2023
Nonconvex Stochastic Bregman Proximal Gradient Method with Application to Deep Learning Kuan-Fu Ding Jingyang Li Kim-Chuan Toh 33 8 0 26 Jun 2023
Multi-pass Training and Cross-information Fusion for Low-resource End-to-end Accented Speech Recognition Xuefei Wang Yanhua Long Yijie Li Haoran Wei 27 4 0 20 Jun 2023
Hybrid Graph: A Unified Graph Representation with Datasets and Benchmarks for Complex Graphs Zehui Li Xiangyu Zhao Mingzhu Shen Guy-Bart Stan Pietro Lio Yiren Zhao 20 1 0 08 Jun 2023
Inductive biases in deep learning models for weather prediction Jannik Thümmel Matthias Karlbauer S. Otte C. Zarfl Georg Martius ... Thomas Scholten Ulrich Friedrich V. Wulfmeyer B. Goswami Martin Volker Butz AI4CE 43 5 0 06 Apr 2023
Improving Transformer Performance for French Clinical Notes Classification Using Mixture of Experts on a Limited Dataset Thanh-Dung Le P. Jouvet R. Noumeir MoE MedIm 72 5 0 22 Mar 2023
The Multiscale Surface Vision Transformer Simon Dahan Logan Z. J. Williams Daniel Rueckert E. C. Robinson MedIm ViT 10 2 0 21 Mar 2023
Integrating Earth Observation Data into Causal Inference: Challenges and Opportunities Connor Jerzak Fredrik D. Johansson Adel Daoud CML 41 11 0 30 Jan 2023
Bike Frames: Understanding the Implicit Portrayal of Cyclists in the News Xingmeng Zhao Dan Schumacher Sashank Nalluri Xavier Walton Suhana Shrestha Anthony Rios 26 2 0 15 Jan 2023
Empirical study of the modulus as activation function in computer vision applications Iván Vallés-Pérez E. Soria-Olivas M. Martínez-Sober Antonio J. Serrano Joan Vila-Francés J. Gómez-Sanchís 19 15 0 15 Jan 2023
Self-Validated Physics-Embedding Network: A General Framework for Inverse Modelling Ruiyuan Kang D. Kyritsis P. Liatsis AI4CE PINN 16 5 0 12 Oct 2022
WaveMix: A Resource-efficient Neural Network for Image Analysis Pranav Jeevan Kavitha Viswanathan S. AnanduA A. Sethi 20 20 0 28 May 2022
PSO-Convolutional Neural Networks with Heterogeneous Learning Rate N. H. Phong A. Santos B. Ribeiro 24 8 0 20 May 2022
Generalized Knowledge Distillation via Relationship Matching Han-Jia Ye Su Lu De-Chuan Zhan FedML 22 20 0 04 May 2022
CowClip: Reducing CTR Prediction Model Training Time from 12 hours to 10 minutes on 1 GPU Zangwei Zheng Peng Xu Xuan Zou Da Tang Zhen Li ... Xiangzhuo Ding Fuzhao Xue Ziheng Qing Youlong Cheng Yang You VLM 44 7 0 13 Apr 2022
Solving ImageNet: a Unified Scheme for Training any Backbone to Top Results T. Ridnik Hussam Lawen Emanuel Ben-Baruch Asaf Noy 38 11 0 07 Apr 2022
RSTT: Real-time Spatial Temporal Transformer for Space-Time Video Super-Resolution Z. Geng Luming Liang Tianyu Ding Ilya Zharkov 29 69 0 27 Mar 2022
Ensemble Knowledge Guided Sub-network Search and Fine-tuning for Filter Pruning Seunghyun Lee B. Song 19 8 0 05 Mar 2022
Raman Spectrum Matching with Contrastive Representation Learning Bo-wen Li Mikkel N. Schmidt T. S. Alstrøm 28 10 0 25 Feb 2022
On the Origins of the Block Structure Phenomenon in Neural Network Representations Thao Nguyen M. Raghu Simon Kornblith 25 14 0 15 Feb 2022
Exact Solutions of a Deep Linear Network Liu Ziyin Botao Li Xiangmin Meng ODL 19 21 0 10 Feb 2022
Optimal learning rate schedules in high-dimensional non-convex optimization problems Stéphane d'Ascoli Maria Refinetti Giulio Biroli 16 7 0 09 Feb 2022
When Do Flat Minima Optimizers Work? Jean Kaddour Linqing Liu Ricardo M. A. Silva Matt J. Kusner ODL 24 58 0 01 Feb 2022
Forward Compatible Training for Large-Scale Embedding Retrieval Systems Vivek Ramanujan Pavan Kumar Anasosalu Vasu Ali Farhadi Oncel Tuzel Hadi Pouransari VLM 32 16 0 06 Dec 2021
Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect Yuqing Wang Minshuo Chen T. Zhao Molei Tao AI4CE 57 40 0 07 Oct 2021
Boost Neural Networks by Checkpoints Feng Wang Gu-Yeon Wei Qiao Liu Jinxiang Ou Xian Wei Hairong Lv FedML UQCV 24 10 0 03 Oct 2021
Self-Supervised Feature Learning of 1D Convolutional Neural Networks with Contrastive Loss for Eating Detection Using an In-Ear Microphone Vasileios Papapanagiotou Christos Diou A. Delopoulos SSL 21 6 0 02 Aug 2021
R-Drop: Regularized Dropout for Neural Networks Xiaobo Liang Lijun Wu Juntao Li Yue Wang Qi Meng Tao Qin Wei Chen Hao Fei Tie-Yan Liu 47 424 0 28 Jun 2021
Transportation Density Reduction Caused by City Lockdowns Across the World during the COVID-19 Epidemic: From the View of High-resolution Remote Sensing Imagery Chen Wu Sihan Zhu Jiaqi Yang Meiqi Hu Bo Du Liangpei Zhang Lefei Zhang Chengxi Han Meng Lan 21 9 0 02 Mar 2021
DeepReDuce: ReLU Reduction for Fast Private Inference N. Jha Zahra Ghodsi S. Garg Brandon Reagen 39 90 0 02 Mar 2021
An Investigation of Traffic Density Changes inside Wuhan during the COVID-19 Epidemic with GF-2 Time-Series Images Chen Wu Yinong Guo Haonan Guo J. Yuan Lixiang Ru Hongruixuan Chen Bo Du Liangpei Zhang 11 16 0 26 Jun 2020
Students Need More Attention: BERT-based Attention Model for Small Data with Application to Automatic Patient Message Triage Shijing Si Rui Wang Jedrek Wosik Hao Zhang D. Dov Guoyin Wang Ricardo Henao Lawrence Carin 20 24 0 22 Jun 2020
On the Loss Landscape of Adversarial Training: Identifying Challenges and How to Overcome Them Chen Liu Mathieu Salzmann Tao R. Lin Ryota Tomioka Sabine Süsstrunk AAML 24 81 0 15 Jun 2020
Transient Non-Stationarity and Generalisation in Deep Reinforcement Learning Maximilian Igl Gregory Farquhar Jelena Luketina Wendelin Boehmer Shimon Whiteson 27 83 0 10 Jun 2020
Self-Distillation Amplifies Regularization in Hilbert Space H. Mobahi Mehrdad Farajtabar Peter L. Bartlett 19 226 0 13 Feb 2020
Optimization for deep learning: theory and algorithms Ruoyu Sun ODL 19 168 0 19 Dec 2019
An Adaptive and Momental Bound Method for Stochastic Learning Jianbang Ding Xuancheng Ren Ruixuan Luo Xu Sun ODL 11 46 0 27 Oct 2019
On the adequacy of untuned warmup for adaptive optimization Jerry Ma Denis Yarats 56 70 0 09 Oct 2019
Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML Aniruddh Raghu M. Raghu Samy Bengio Oriol Vinyals 186 640 0 19 Sep 2019
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima N. Keskar Dheevatsa Mudigere J. Nocedal M. Smelyanskiy P. T. P. Tang ODL 308 2,890 0 15 Sep 2016