ResearchTrend.AI
Decoupled Weight Decay Regularization


14 November 2017
I. Loshchilov
Frank Hutter
    OffRL

Papers citing "Decoupled Weight Decay Regularization"

19 / 369 papers shown
Demon: Improved Neural Network Training with Momentum Decay
John Chen
Cameron R. Wolfe
Zhaoqi Li
Anastasios Kyrillidis
ODL
24
15
0
11 Oct 2019
Meta-Learning Deep Energy-Based Memory Models
Sergey Bartunov
Jack W. Rae
Simon Osindero
Timothy Lillicrap
32
34
0
07 Oct 2019
A Deep Learning Based Attack for The Chaos-based Image Encryption
Chen He
Kan Ming
Yongwei Wang
Z. J. Wang
AAML
11
16
0
29 Jul 2019
Lookahead Optimizer: k steps forward, 1 step back
Michael Ruogu Zhang
James Lucas
Geoffrey E. Hinton
Jimmy Ba
ODL
42
719
0
19 Jul 2019
Fetal Pose Estimation in Volumetric MRI using a 3D Convolution Neural Network
Junshen Xu
Molin Zhang
Esra Abaci Turk
Larry Zhang
P. E. Grant
K. Ying
Polina Golland
E. Adalsteinsson
3DH
10
31
0
10 Jul 2019
S3: A Spectral-Spatial Structure Loss for Pan-Sharpening Networks
Jae-Seok Choi
Yongwoo Kim
Munchurl Kim
8
15
0
13 Jun 2019
Latent Weights Do Not Exist: Rethinking Binarized Neural Network Optimization
K. Helwegen
James Widdicombe
Lukas Geiger
Zechun Liu
K. Cheng
Roeland Nusselder
MQ
27
110
0
05 Jun 2019
Stochastic Gradients for Large-Scale Tensor Decomposition
T. Kolda
David Hong
28
55
0
04 Jun 2019
Learning Raw Image Denoising with Bayer Pattern Unification and Bayer Preserving Augmentation
Jiaming Liu
Chihao Wu
Yuzhi Wang
Qin Xu
Yuqian Zhou
...
Chuan Wang
Shaofan Cai
Yifan Ding
Haoqiang Fan
Jue Wang
31
68
0
29 Apr 2019
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
Yang You
Jing Li
Sashank J. Reddi
Jonathan Hseu
Sanjiv Kumar
Srinadh Bhojanapalli
Xiaodan Song
J. Demmel
Kurt Keutzer
Cho-Jui Hsieh
ODL
28
978
0
01 Apr 2019
To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks
Matthew E. Peters
Sebastian Ruder
Noah A. Smith
24
433
0
14 Mar 2019
CIA-Net: Robust Nuclei Instance Segmentation with Contour-aware Information Aggregation
Yanning Zhou
O. F. Onder
Qi Dou
E. Tsougenis
Hao Chen
Pheng-Ann Heng
10
196
0
13 Mar 2019
Quasi-hyperbolic momentum and Adam for deep learning
Jerry Ma
Denis Yarats
ODL
84
129
0
16 Oct 2018
Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks
Jinghui Chen
Dongruo Zhou
Yiqi Tang
Ziyan Yang
Yuan Cao
Quanquan Gu
ODL
19
193
0
18 Jun 2018
Do Better ImageNet Models Transfer Better?
Simon Kornblith
Jonathon Shlens
Quoc V. Le
OOD
MLT
82
1,309
0
23 May 2018
SFace: An Efficient Network for Face Detection in Large Scale Variations
Jianfeng Wang
Ye Yuan
Boxun Li
Gang Yu
Sun Jian
CVBM
10
22
0
18 Apr 2018
signSGD: Compressed Optimisation for Non-Convex Problems
Jeremy Bernstein
Yu-Xiang Wang
Kamyar Azizzadenesheli
Anima Anandkumar
FedML
ODL
44
1,020
0
13 Feb 2018
Improving Generalization Performance by Switching from Adam to SGD
N. Keskar
R. Socher
ODL
32
520
0
20 Dec 2017
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
308
2,890
0
15 Sep 2016