ResearchTrend.AI
You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient

4 June 2021
Shaokun Zhang
Xiawu Zheng
Chenyi Yang
Yuchao Li
Yan Wang
Chia-Wen Lin
Mengdi Wang
Shen Li
Jun Yang
Rongrong Ji
    MQ

Papers citing "You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient"

29 / 29 papers shown
Learning Deep Morphological Networks with Neural Architecture Search
Yufei Hu
Nacim Belkhir
Jesús Angulo
Angela Yao
Gianni Franchi
AI4CE
32
21
0
14 Jun 2021
HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
Hanrui Wang
Zhanghao Wu
Zhijian Liu
Han Cai
Ligeng Zhu
Chuang Gan
Song Han
68
259
0
28 May 2020
DynaBERT: Dynamic BERT with Adaptive Width and Depth
Lu Hou
Zhiqi Huang
Lifeng Shang
Xin Jiang
Xiao Chen
Qun Liu
MQ
65
322
0
08 Apr 2020
BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models
Jiahui Yu
Pengchong Jin
Hanxiao Liu
Gabriel Bender
Pieter-Jan Kindermans
Mingxing Tan
Thomas Huang
Xiaodan Song
Ruoming Pang
Quoc V. Le
54
302
0
24 Mar 2020
Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning
Mitchell A. Gordon
Kevin Duh
Nicholas Andrews
VLM
37
339
0
19 Feb 2020
BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
Canwen Xu
Wangchunshu Zhou
Tao Ge
Furu Wei
Ming Zhou
252
199
0
07 Feb 2020
AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search
Daoyuan Chen
Yaliang Li
Minghui Qiu
Zhen Wang
Bofang Li
Bolin Ding
Hongbo Deng
Jun Huang
Wei Lin
Jingren Zhou
MQ
44
104
0
13 Jan 2020
Structured Pruning of a BERT-based Question Answering Model
J. Scott McCarley
Rishav Chakravarti
Avirup Sil
36
53
0
14 Oct 2019
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh
Lysandre Debut
Julien Chaumond
Thomas Wolf
132
7,437
0
02 Oct 2019
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Zhenzhong Lan
Mingda Chen
Sebastian Goodman
Kevin Gimpel
Piyush Sharma
Radu Soricut
SSL
AIMat
268
6,420
0
26 Sep 2019
Reducing Transformer Depth on Demand with Structured Dropout
Angela Fan
Edouard Grave
Armand Joulin
93
588
0
25 Sep 2019
TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao
Yichun Yin
Lifeng Shang
Xin Jiang
Xiao Chen
Linlin Li
F. Wang
Qun Liu
VLM
62
1,847
0
23 Sep 2019
Once-for-All: Train One Network and Specialize it for Efficient Deployment
Han Cai
Chuang Gan
Tianzhe Wang
Zhekai Zhang
Song Han
OOD
78
1,267
0
26 Aug 2019
Patient Knowledge Distillation for BERT Model Compression
S. Sun
Yu Cheng
Zhe Gan
Jingjing Liu
101
833
0
25 Aug 2019
A Tensorized Transformer for Language Modeling
Xindian Ma
Peng Zhang
Shuai Zhang
Nan Duan
Yuexian Hou
D. Song
M. Zhou
42
165
0
24 Jun 2019
Are Sixteen Heads Really Better than One?
Paul Michel
Omer Levy
Graham Neubig
MoE
79
1,051
0
25 May 2019
Adaptive Stochastic Natural Gradient Method for One-Shot Neural Architecture Search
Youhei Akimoto
Shinichi Shirakawa
Nozomu Yoshinari
Kento Uchida
Shota Saito
K. Nishida
45
86
0
21 May 2019
Multinomial Distribution Learning for Effective Neural Architecture Search
Xiawu Zheng
Rongrong Ji
Lang Tang
Baochang Zhang
Jianzhuang Liu
Q. Tian
42
89
0
18 May 2019
End-to-End Open-Domain Question Answering with BERTserini
Wei Yang
Yuqing Xie
Aileen Lin
Xingyu Li
Luchen Tan
Kun Xiong
Ming Li
Jimmy J. Lin
RALM
88
495
0
05 Feb 2019
Passage Re-ranking with BERT
Rodrigo Nogueira
Kyunghyun Cho
OOD
107
1,086
0
13 Jan 2019
SNAS: Stochastic Neural Architecture Search
Sirui Xie
Hehui Zheng
Chunxiao Liu
Liang Lin
59
932
0
24 Dec 2018
ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
Han Cai
Ligeng Zhu
Song Han
78
1,865
0
02 Dec 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
966
93,936
0
11 Oct 2018
MnasNet: Platform-Aware Neural Architecture Search for Mobile
Mingxing Tan
Bo Chen
Ruoming Pang
Vijay Vasudevan
Mark Sandler
Andrew G. Howard
Quoc V. Le
MQ
102
2,995
0
31 Jul 2018
DARTS: Differentiable Architecture Search
Hanxiao Liu
Karen Simonyan
Yiming Yang
167
4,326
0
24 Jun 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
658
7,080
0
20 Apr 2018
Dynamic Optimization of Neural Network Structures Using Probabilistic Modeling
Shinichi Shirakawa
Yasushi Iwata
Youhei Akimoto
53
25
0
23 Jan 2018
Progressive Neural Architecture Search
Chenxi Liu
Barret Zoph
Maxim Neumann
Jonathon Shlens
Wei Hua
Li Li
Li Fei-Fei
Alan Yuille
Jonathan Huang
Kevin Patrick Murphy
74
1,986
0
02 Dec 2017
Neural Architecture Search with Reinforcement Learning
Barret Zoph
Quoc V. Le
383
5,362
0
05 Nov 2016