Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2205.12694
Cited By
v1
v2 (latest)
Train Flat, Then Compress: Sharpness-Aware Minimization Learns More Compressible Models
25 May 2022
Clara Na
Sanket Vaibhav Mehta
Emma Strubell
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Train Flat, Then Compress: Sharpness-Aware Minimization Learns More Compressible Models"
50 / 66 papers shown
Title
Unmasking the Lottery Ticket Hypothesis: What's Encoded in a Winning Ticket's Mask?
Mansheej Paul
F. Chen
Brett W. Larsen
Jonathan Frankle
Surya Ganguli
Gintare Karolina Dziugaite
UQCV
101
38
0
06 Oct 2022
Measuring the Carbon Intensity of AI in Cloud Instances
Jesse Dodge
Taylor Prewitt
Rémi Tachet des Combes
Erika Odmark
Roy Schwartz
Emma Strubell
A. Luccioni
Noah A. Smith
Nicole DeCario
Will Buchanan
71
192
0
10 Jun 2022
Sharpness-Aware Training for Free
Jiawei Du
Daquan Zhou
Jiashi Feng
Vincent Y. F. Tan
Qiufeng Wang
AAML
81
96
0
27 May 2022
Structured Pruning Learns Compact and Accurate Models
Mengzhou Xia
Zexuan Zhong
Danqi Chen
VLM
61
186
0
01 Apr 2022
Low-Pass Filtering SGD for Recovering Flat Optima in the Deep Learning Optimization Landscape
Devansh Bisla
Jing Wang
A. Choromańska
77
37
0
20 Jan 2022
An Empirical Investigation of the Role of Pre-training in Lifelong Learning
Sanket Vaibhav Mehta
Darshan Patil
Sarath Chandar
Emma Strubell
CLL
116
144
0
16 Dec 2021
Sharpness-Aware Minimization Improves Language Model Generalization
Dara Bahri
H. Mobahi
Yi Tay
156
103
0
16 Oct 2021
Efficient Sharpness-aware Minimization for Improved Training of Neural Networks
Jiawei Du
Hanshu Yan
Jiashi Feng
Qiufeng Wang
Liangli Zhen
Rick Siow Mong Goh
Vincent Y. F. Tan
AAML
147
135
0
07 Oct 2021
The Low-Resource Double Bind: An Empirical Study of Pruning for Low-Resource Machine Translation
Orevaoghene Ahia
Julia Kreutzer
Sara Hooker
164
55
0
06 Oct 2021
Understanding and Overcoming the Challenges of Efficient Transformer Quantization
Yelysei Bondarenko
Markus Nagel
Tijmen Blankevoort
MQ
71
143
0
27 Sep 2021
Block Pruning For Faster Transformers
François Lagunas
Ella Charlaix
Victor Sanh
Alexander M. Rush
VLM
61
223
0
10 Sep 2021
Beyond Preserved Accuracy: Evaluating Loyalty and Robustness of BERT Compression
Canwen Xu
Wangchunshu Zhou
Tao Ge
Kelvin J. Xu
Julian McAuley
Furu Wei
53
42
0
07 Sep 2021
Label Noise SGD Provably Prefers Flat Global Minimizers
Alexandru Damian
Tengyu Ma
Jason D. Lee
NoLa
107
120
0
11 Jun 2021
Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization
Chen Liang
Simiao Zuo
Minshuo Chen
Haoming Jiang
Xiaodong Liu
Pengcheng He
T. Zhao
Weizhu Chen
45
69
0
25 May 2021
Multi-Prize Lottery Ticket Hypothesis: Finding Accurate Binary Neural Networks by Pruning A Randomly Weighted Network
James Diffenderfer
B. Kailkhura
MQ
69
76
0
17 Mar 2021
Robustness to Pruning Predicts Generalization in Deep Neural Networks
Lorenz Kuhn
Clare Lyle
Aidan Gomez
Jonas Rothfuss
Y. Gal
83
14
0
10 Mar 2021
Lost in Pruning: The Effects of Pruning Neural Networks beyond Test Accuracy
Lucas Liebenwein
Cenk Baykal
Brandon Carter
David K Gifford
Daniela Rus
AAML
55
73
0
04 Mar 2021
ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks
Jungmin Kwon
Jeongseop Kim
Hyunseong Park
I. Choi
98
290
0
23 Feb 2021
I-BERT: Integer-only BERT Quantization
Sehoon Kim
A. Gholami
Z. Yao
Michael W. Mahoney
Kurt Keutzer
MQ
158
351
0
05 Jan 2021
EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets
Xiaohan Chen
Yu Cheng
Shuohang Wang
Zhe Gan
Zhangyang Wang
Jingjing Liu
81
100
0
31 Dec 2020
BinaryBERT: Pushing the Limit of BERT Quantization
Haoli Bai
Wei Zhang
Lu Hou
Lifeng Shang
Jing Jin
Xin Jiang
Qun Liu
Michael Lyu
Irwin King
MQ
209
227
0
31 Dec 2020
Parameter-Efficient Transfer Learning with Diff Pruning
Demi Guo
Alexander M. Rush
Yoon Kim
82
405
0
14 Dec 2020
The De-democratization of AI: Deep Learning and the Compute Divide in Artificial Intelligence Research
N. Ahmed
Muntasir Wahed
69
111
0
22 Oct 2020
Characterising Bias in Compressed Models
Sara Hooker
Nyalleng Moorosi
Gregory Clark
Samy Bengio
Emily L. Denton
67
185
0
06 Oct 2020
Sharpness-Aware Minimization for Efficiently Improving Generalization
Pierre Foret
Ariel Kleiner
H. Mobahi
Behnam Neyshabur
AAML
192
1,350
0
03 Oct 2020
What is being transferred in transfer learning?
Behnam Neyshabur
Hanie Sedghi
Chiyuan Zhang
106
527
0
26 Aug 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
823
42,332
0
28 May 2020
Movement Pruning: Adaptive Sparsity by Fine-Tuning
Victor Sanh
Thomas Wolf
Alexander M. Rush
71
486
0
15 May 2020
Beyond Accuracy: Behavioral Testing of NLP models with CheckList
Marco Tulio Ribeiro
Tongshuang Wu
Carlos Guestrin
Sameer Singh
ELM
208
1,107
0
08 May 2020
MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
Zhiqing Sun
Hongkun Yu
Xiaodan Song
Renjie Liu
Yiming Yang
Denny Zhou
MQ
109
817
0
06 Apr 2020
What is the State of Neural Network Pruning?
Davis W. Blalock
Jose Javier Gonzalez Ortiz
Jonathan Frankle
John Guttag
267
1,052
0
06 Mar 2020
Comparing Rewinding and Fine-tuning in Neural Network Pruning
Alex Renda
Jonathan Frankle
Michael Carbin
275
388
0
05 Mar 2020
Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
Zhuohan Li
Eric Wallace
Sheng Shen
Kevin Lin
Kurt Keutzer
Dan Klein
Joseph E. Gonzalez
102
151
0
26 Feb 2020
Linear Mode Connectivity and the Lottery Ticket Hypothesis
Jonathan Frankle
Gintare Karolina Dziugaite
Daniel M. Roy
Michael Carbin
MoMe
156
619
0
11 Dec 2019
Fantastic Generalization Measures and Where to Find Them
Yiding Jiang
Behnam Neyshabur
H. Mobahi
Dilip Krishnan
Samy Bengio
AI4CE
139
610
0
04 Dec 2019
Understanding Knowledge Distillation in Non-autoregressive Machine Translation
Chunting Zhou
Graham Neubig
Jiatao Gu
64
221
0
07 Nov 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
445
20,298
0
23 Oct 2019
Q8BERT: Quantized 8Bit BERT
Ofir Zafrir
Guy Boudoukh
Peter Izsak
Moshe Wasserblat
MQ
81
505
0
14 Oct 2019
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh
Lysandre Debut
Julien Chaumond
Thomas Wolf
234
7,520
0
02 Oct 2019
TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao
Yichun Yin
Lifeng Shang
Xin Jiang
Xiao Chen
Linlin Li
F. Wang
Qun Liu
VLM
109
1,860
0
23 Sep 2019
Patient Knowledge Distillation for BERT Model Compression
S. Sun
Yu Cheng
Zhe Gan
Jingjing Liu
134
843
0
25 Aug 2019
Visualizing and Understanding the Effectiveness of BERT
Y. Hao
Li Dong
Furu Wei
Ke Xu
138
185
0
15 Aug 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
665
24,528
0
26 Jul 2019
The Generalization-Stability Tradeoff In Neural Network Pruning
Brian Bartoldson
Ari S. Morcos
Adrian Barbu
G. Erlebacher
77
76
0
09 Jun 2019
Energy and Policy Considerations for Deep Learning in NLP
Emma Strubell
Ananya Ganesh
Andrew McCallum
73
2,660
0
05 Jun 2019
Efficient 8-Bit Quantization of Transformer Neural Machine Language Translation Model
Aishwarya Bhandare
Vamsi Sripathi
Deepthi Karkada
Vivek V. Menon
Sun Choi
Kushal Datta
V. Saletore
MQ
69
132
0
03 Jun 2019
Are Sixteen Heads Really Better than One?
Paul Michel
Omer Levy
Graham Neubig
MoE
103
1,062
0
25 May 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.8K
95,114
0
11 Oct 2018
Neural Network Acceptability Judgments
Alex Warstadt
Amanpreet Singh
Samuel R. Bowman
233
1,411
0
31 May 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
1.1K
7,182
0
20 Apr 2018
1
2
Next