Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2008.06808
Cited By
Finding Fast Transformers: One-Shot Neural Architecture Search by Component Composition
15 August 2020
Henry Tsai
Jayden Ooi
Chun-Sung Ferng
Hyung Won Chung
Jason Riesa
ViT
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Finding Fast Transformers: One-Shot Neural Architecture Search by Component Composition"
11 / 11 papers shown
Title
Training with Quantization Noise for Extreme Model Compression
Angela Fan
Pierre Stock
Benjamin Graham
Edouard Grave
Remi Gribonval
Hervé Jégou
Armand Joulin
MQ
68
245
0
15 Apr 2020
XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization
Junjie Hu
Sebastian Ruder
Aditya Siddhant
Graham Neubig
Orhan Firat
Melvin Johnson
ELM
156
970
0
24 Mar 2020
PowerNorm: Rethinking Batch Normalization in Transformers
Sheng Shen
Z. Yao
A. Gholami
Michael W. Mahoney
Kurt Keutzer
BDL
73
16
0
17 Mar 2020
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Zhenzhong Lan
Mingda Chen
Sebastian Goodman
Kevin Gimpel
Piyush Sharma
Radu Soricut
SSL
AIMat
314
6,441
0
26 Sep 2019
Small and Practical BERT Models for Sequence Labeling
Henry Tsai
Jason Riesa
Melvin Johnson
N. Arivazhagan
Xin Li
Amelia Archer
VLM
49
121
0
31 Aug 2019
BAM! Born-Again Multi-Task Networks for Natural Language Understanding
Kevin Clark
Minh-Thang Luong
Urvashi Khandelwal
Christopher D. Manning
Quoc V. Le
53
229
0
10 Jul 2019
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
Yang You
Jing Li
Sashank J. Reddi
Jonathan Hseu
Sanjiv Kumar
Srinadh Bhojanapalli
Xiaodan Song
J. Demmel
Kurt Keutzer
Cho-Jui Hsieh
ODL
200
993
0
01 Apr 2019
SNAS: Stochastic Neural Architecture Search
Sirui Xie
Hehui Zheng
Chunxiao Liu
Liang Lin
80
935
0
24 Dec 2018
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
Taku Kudo
John Richardson
175
3,514
0
19 Aug 2018
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
Benoit Jacob
S. Kligys
Bo Chen
Menglong Zhu
Matthew Tang
Andrew G. Howard
Hartwig Adam
Dmitry Kalenichenko
MQ
136
3,111
0
15 Dec 2017
To prune, or not to prune: exploring the efficacy of pruning for model compression
Michael Zhu
Suyog Gupta
163
1,273
0
05 Oct 2017
1