Are All Layers Created Equal? (arXiv:1902.01996)
6 February 2019
Chiyuan Zhang, Samy Bengio, Y. Singer
Papers citing "Are All Layers Created Equal?" (39 of 39 papers shown)
Adapting Newton's Method to Neural Networks through a Summary of Higher-Order Derivatives
  Pierre Wolinski · ODL · 0 citations · 06 Dec 2023

Reset It and Forget It: Relearning Last-Layer Weights Improves Continual and Transfer Learning
  Lapo Frati, Neil Traft, Jeff Clune, Nick Cheney · CLL · 0 citations · 12 Oct 2023

Iterative Magnitude Pruning as a Renormalisation Group: A Study in The Context of The Lottery Ticket Hypothesis
  Abu-Al Hassan · 0 citations · 06 Aug 2023

Layer-wise Linear Mode Connectivity
  Linara Adilova, Maksym Andriushchenko, Michael Kamp, Asja Fischer, Martin Jaggi · FedML, FAtt, MoMe · 15 citations · 13 Jul 2023

No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
  Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner · 41 citations · 12 Jul 2023

Understanding plasticity in neural networks
  Clare Lyle, Zeyu Zheng, Evgenii Nikishin, Bernardo Avila-Pires, Razvan Pascanu, Will Dabney · AI4CE · 97 citations · 02 Mar 2023

On the Lipschitz Constant of Deep Networks and Double Descent
  Matteo Gamba, Hossein Azizpour, Marten Bjorkman · 7 citations · 28 Jan 2023

Leveraging Unlabeled Data to Track Memorization
  Mahsa Forouzesh, Hanie Sedghi, Patrick Thiran · NoLa, TDI · 4 citations · 08 Dec 2022

Learning to Generate Image Embeddings with User-level Differential Privacy
  Zheng Xu, Maxwell D. Collins, Yuxiao Wang, Liviu Panait, Sewoong Oh, S. Augenstein, Ting Liu, Florian Schroff, H. B. McMahan · FedML · 29 citations · 20 Nov 2022

A Law of Data Separation in Deep Learning
  Hangfeng He, Weijie J. Su · OOD · 36 citations · 31 Oct 2022

Surgical Fine-Tuning Improves Adaptation to Distribution Shifts
  Yoonho Lee, Annie S. Chen, Fahim Tajwar, Ananya Kumar, Huaxiu Yao, Percy Liang, Chelsea Finn · OOD · 197 citations · 20 Oct 2022

Git Re-Basin: Merging Models modulo Permutation Symmetries
  Samuel K. Ainsworth, J. Hayase, S. Srinivasa · MoMe · 314 citations · 11 Sep 2022

Can pruning improve certified robustness of neural networks?
  Zhangheng Li, Tianlong Chen, Linyi Li, Bo-wen Li, Zhangyang Wang · AAML · 11 citations · 15 Jun 2022

A Model or 603 Exemplars: Towards Memory-Efficient Class-Incremental Learning
  Da-Wei Zhou, Qiwen Wang, Han-Jia Ye, De-Chuan Zhan · 122 citations · 26 May 2022

The Primacy Bias in Deep Reinforcement Learning
  Evgenii Nikishin, Max Schwarzer, P. D'Oro, Pierre-Luc Bacon, Aaron C. Courville · OnRL · 180 citations · 16 May 2022

Token Dropping for Efficient BERT Pretraining
  Le Hou, Richard Yuanzhe Pang, Dinesh Manocha, Yuexin Wu, Xinying Song, Xiaodan Song, Denny Zhou · 42 citations · 24 Mar 2022

FxP-QNet: A Post-Training Quantizer for the Design of Mixed Low-Precision DNNs with Dynamic Fixed-Point Representation
  Ahmad Shawahna, S. M. Sait, A. El-Maleh, Irfan Ahmad · MQ · 6 citations · 22 Mar 2022

What Makes Transfer Learning Work For Medical Images: Feature Reuse & Other Factors
  Christos Matsoukas, Johan Fredin Haslum, Moein Sorkhei, Magnus P Soderberg, Kevin Smith · VLM, OOD, MedIm · 85 citations · 02 Mar 2022

DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization
  Aviral Kumar, Rishabh Agarwal, Tengyu Ma, Aaron Courville, George Tucker, Sergey Levine · OffRL · 65 citations · 09 Dec 2021

Compare Where It Matters: Using Layer-Wise Regularization To Improve Federated Learning on Heterogeneous Data
  Ha Min Son, M. Kim, T. Chung · FedML · 9 citations · 01 Dec 2021

Visualizing the Emergence of Intermediate Visual Patterns in DNNs
  Mingjie Li, Shaobo Wang, Quanshi Zhang · 11 citations · 05 Nov 2021

Partial Variable Training for Efficient On-Device Federated Learning
  Tien-Ju Yang, Dhruv Guliani, F. Beaufays, Giovanni Motta · FedML · 25 citations · 11 Oct 2021

Exploring Heterogeneous Characteristics of Layers in ASR Models for More Efficient Training
  Lillian Zhou, Dhruv Guliani, Andreas Kabel, Giovanni Motta, F. Beaufays · 1 citation · 08 Oct 2021

Enabling On-Device Training of Speech Recognition Models with Federated Dropout
  Dhruv Guliani, Lillian Zhou, Changwan Ryu, Tien-Ju Yang, Harry Zhang, Yong Xiao, F. Beaufays, Giovanni Motta · FedML · 16 citations · 07 Oct 2021

Efficient and Private Federated Learning with Partially Trainable Networks
  Hakim Sidahmed, Zheng Xu, Ankush Garg, Yuan Cao, Mingqing Chen · FedML · 13 citations · 06 Oct 2021

What can linear interpolation of neural network loss landscapes tell us?
  Tiffany J. Vlaar, Jonathan Frankle · MoMe · 27 citations · 30 Jun 2021

Randomness In Neural Network Training: Characterizing The Impact of Tooling
  Donglin Zhuang, Xingyao Zhang, Shuaiwen Leon Song, Sara Hooker · 75 citations · 22 Jun 2021

MLP-Mixer: An all-MLP Architecture for Vision
  Ilya O. Tolstikhin, N. Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, ..., Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy · 2,603 citations · 04 May 2021

Experiments with Rich Regime Training for Deep Learning
  Xinyan Li, A. Banerjee · 2 citations · 26 Feb 2021

Reservoir Transformers
  Sheng Shen, Alexei Baevski, Ari S. Morcos, Kurt Keutzer, Michael Auli, Douwe Kiela · 17 citations · 30 Dec 2020

FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training
  Y. Fu, Haoran You, Yang Katie Zhao, Yue Wang, Chaojian Li, K. Gopalakrishnan, Zhangyang Wang, Yingyan Lin · MQ · 32 citations · 24 Dec 2020

A Deeper Look at the Hessian Eigenspectrum of Deep Neural Networks and its Applications to Regularization
  Adepu Ravi Sankar, Yash Khasbage, Rahul Vigneswaran, V. Balasubramanian · 42 citations · 07 Dec 2020

Anatomy of Catastrophic Forgetting: Hidden Representations and Task Semantics
  V. Ramasesh, Ethan Dyer, M. Raghu · CLL · 173 citations · 14 Jul 2020

HourNAS: Extremely Fast Neural Architecture Search Through an Hourglass Lens
  Zhaohui Yang, Yunhe Wang, Xinghao Chen, Jianyuan Guo, Wei Zhang, Chao Xu, Chunjing Xu, Dacheng Tao, Chang Xu · 17 citations · 29 May 2020

Predicting Neural Network Accuracy from Weights
  Thomas Unterthiner, Daniel Keysers, Sylvain Gelly, Olivier Bousquet, Ilya O. Tolstikhin · 101 citations · 26 Feb 2020

Layerwise Noise Maximisation to Train Low-Energy Deep Neural Networks
  Sébastien Henwood, François Leduc-Primeau, Yvon Savaria · 10 citations · 23 Dec 2019

Fast Hardware-Aware Neural Architecture Search
  Li Lyna Zhang, Yuqing Yang, Yuhang Jiang, Wenwu Zhu, Yunxin Liu · 3DV · 0 citations · 25 Oct 2019

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
  N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang · ODL · 2,890 citations · 15 Sep 2016

Benefits of depth in neural networks
  Matus Telgarsky · 602 citations · 14 Feb 2016