arXiv: 2207.01847
PoF: Post-Training of Feature Extractor for Improving Generalization
5 July 2022
Ikuro Sato, Ryota Yamada, Masayuki Tanaka, Nakamasa Inoue, Rei Kawakami
Papers citing "PoF: Post-Training of Feature Extractor for Improving Generalization" (29 papers shown)
KAISA: An Adaptive Second-Order Optimizer Framework for Deep Neural Networks
J. G. Pauloski, Qi Huang, Lei Huang, Shivaram Venkataraman, Kyle Chard, Ian Foster, Zhao-jie Zhang (04 Jul 2021)

ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks
Jungmin Kwon, Jeongseop Kim, Hyunseong Park, I. Choi (23 Feb 2021)

Regularizing Neural Networks via Adversarial Model Perturbation
Yaowei Zheng, Richong Zhang, Yongyi Mao (10 Oct 2020)

Sharpness-Aware Minimization for Efficiently Improving Generalization
Pierre Foret, Ariel Kleiner, H. Mobahi, Behnam Neyshabur (03 Oct 2020)

Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited
Wesley J. Maddox, Gregory W. Benton, A. Wilson (04 Mar 2020)

Fantastic Generalization Measures and Where to Find Them
Yiding Jiang, Behnam Neyshabur, H. Mobahi, Dilip Krishnan, Samy Bengio (04 Dec 2019)

Breaking Inter-Layer Co-Adaptation by Classifier Anonymization
Ikuro Sato, Kohta Ishikawa, Guoqing Liu, Masayuki Tanaka (04 Jun 2019)

Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks
Jinghui Chen, Dongruo Zhou, Yiqi Tang, Ziyan Yang, Yuan Cao, Quanquan Gu (18 Jun 2018)

Averaging Weights Leads to Wider Optima and Better Generalization
Pavel Izmailov, Dmitrii Podoprikhin, T. Garipov, Dmitry Vetrov, A. Wilson (14 Mar 2018)

Energy-entropy competition and the effectiveness of stochastic gradient descent in machine learning
Yao Zhang, Andrew M. Saxe, Madhu S. Advani, A. Lee (05 Mar 2018)

Hessian-based Analysis of Large Batch Training and Robustness to Adversaries
Z. Yao, A. Gholami, Qi Lei, Kurt Keutzer, Michael W. Mahoney (22 Feb 2018)

Visualizing the Loss Landscape of Neural Nets
Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, Tom Goldstein (28 Dec 2017)

Improving Generalization Performance by Switching from Adam to SGD
N. Keskar, R. Socher (20 Dec 2017)

Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks
Pratik Chaudhari, Stefano Soatto (30 Oct 2017)

Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms
Han Xiao, Kashif Rasul, Roland Vollgraf (25 Aug 2017)

Towards Understanding Generalization of Deep Learning: Perspective of Loss Landscapes
Lei Wu, Zhanxing Zhu, E. Weinan (30 Jun 2017)

Practical Gauss-Newton Optimisation for Deep Learning
Aleksandar Botev, H. Ritter, David Barber (12 Jun 2017)

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Priya Goyal, Piotr Dollár, Ross B. Girshick, P. Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He (08 Jun 2017)

The Marginal Value of Adaptive Gradient Methods in Machine Learning
Ashia Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, Benjamin Recht (23 May 2017)

Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data
Gintare Karolina Dziugaite, Daniel M. Roy (31 Mar 2017)

Sharp Minima Can Generalize For Deep Nets
Laurent Dinh, Razvan Pascanu, Samy Bengio, Yoshua Bengio (15 Mar 2017)

Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond
Levent Sagun, Léon Bottou, Yann LeCun (22 Nov 2016)

Entropy-SGD: Biasing Gradient Descent Into Wide Valleys
Pratik Chaudhari, A. Choromańska, Stefano Soatto, Yann LeCun, Carlo Baldassi, C. Borgs, J. Chayes, Levent Sagun, R. Zecchina (06 Nov 2016)

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang (15 Sep 2016)

Wide Residual Networks
Sergey Zagoruyko, N. Komodakis (23 May 2016)

A Kronecker-factored approximate Fisher matrix for convolution layers
Roger C. Grosse, James Martens (03 Feb 2016)

Optimizing Neural Networks with Kronecker-factored Approximate Curvature
James Martens, Roger C. Grosse (19 Mar 2015)

Qualitatively characterizing neural network optimization problems
Ian Goodfellow, Oriol Vinyals, Andrew M. Saxe (19 Dec 2014)

Improving neural networks by preventing co-adaptation of feature detectors
Geoffrey E. Hinton, Nitish Srivastava, A. Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov (03 Jul 2012)