2012.15701
BinaryBERT: Pushing the Limit of BERT Quantization
31 December 2020
Haoli Bai
Wei Zhang
Lu Hou
Lifeng Shang
Jing Jin
Xin Jiang
Qun Liu
Michael Lyu
Irwin King
MQ
Papers citing
"BinaryBERT: Pushing the Limit of BERT Quantization"
50 / 54 papers shown
COBRA: Algorithm-Architecture Co-optimized Binary Transformer Accelerator for Edge Inference
Ye Qiao
Zhiheng Cheng
Yian Wang
Yifan Zhang
Yunzhe Deng
Sitao Huang
103
0
0
22 Apr 2025
GSQ-Tuning: Group-Shared Exponents Integer in Fully Quantized Training for LLMs On-Device Fine-tuning
Sifan Zhou
Shuo Wang
Zhihang Yuan
Mingjia Shi
Yuzhang Shang
Dawei Yang
MQ
ALM
115
0
0
18 Feb 2025
HadamRNN: Binary and Sparse Ternary Orthogonal RNNs
Armand Foucault
Franck Mamalet
François Malgouyres
MQ
116
0
0
28 Jan 2025
FlatQuant: Flatness Matters for LLM Quantization
Yuxuan Sun
Ruikang Liu
Haoli Bai
Han Bao
Kang Zhao
...
Lu Hou
Chun Yuan
Xin Jiang
Wen Liu
Jun Yao
MQ
91
4
0
12 Oct 2024
MoDeGPT: Modular Decomposition for Large Language Model Compression
Chi-Heng Lin
Shangqian Gao
James Seale Smith
Abhishek Patel
Shikhar Tuli
Yilin Shen
Hongxia Jin
Yen-Chang Hsu
90
10
0
19 Aug 2024
TernaryBERT: Distillation-aware Ultra-low Bit BERT
Wei Zhang
Lu Hou
Yichun Yin
Lifeng Shang
Xiao Chen
Xin Jiang
Qun Liu
MQ
56
209
0
27 Sep 2020
HMQ: Hardware Friendly Mixed Precision Quantization Block for CNNs
H. Habi
Roy H. Jennings
Arnon Netzer
MQ
34
65
0
20 Jul 2020
BERT Loses Patience: Fast and Robust Inference with Early Exit
Wangchunshu Zhou
Canwen Xu
Tao Ge
Julian McAuley
Ke Xu
Furu Wei
24
337
0
07 Jun 2020
GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference
Ali Hadi Zadeh
Isak Edo
Omar Mohamed Awad
Andreas Moshovos
MQ
42
186
0
08 May 2020
DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference
Ji Xin
Raphael Tang
Jaejun Lee
Yaoliang Yu
Jimmy J. Lin
24
370
0
27 Apr 2020
Training with Quantization Noise for Extreme Model Compression
Angela Fan
Pierre Stock
Benjamin Graham
Edouard Grave
Remi Gribonval
Hervé Jégou
Armand Joulin
MQ
38
244
0
15 Apr 2020
DynaBERT: Dynamic BERT with Adaptive Width and Depth
Lu Hou
Zhiqi Huang
Lifeng Shang
Xin Jiang
Xiao Chen
Qun Liu
MQ
40
322
0
08 Apr 2020
FastBERT: a Self-distilling BERT with Adaptive Inference Time
Weijie Liu
Peng Zhou
Zhe Zhao
Zhiruo Wang
Haotang Deng
Qi Ju
66
356
0
05 Apr 2020
Training Binary Neural Networks with Real-to-Binary Convolutions
Brais Martínez
Jing Yang
Adrian Bulat
Georgios Tzimiropoulos
MQ
29
227
0
25 Mar 2020
Efficient Bitwidth Search for Practical Mixed Precision Neural Network
Yuhang Li
Wei Wang
Haoli Bai
Ruihao Gong
Xin Dong
F. Yu
MQ
21
20
0
17 Mar 2020
ReActNet: Towards Precise Binary Neural Network with Generalized Activation Functions
Zechun Liu
Zhiqiang Shen
Marios Savvides
Kwang-Ting Cheng
MQ
60
350
0
07 Mar 2020
BATS: Binary ArchitecTure Search
Adrian Bulat
Brais Martínez
Georgios Tzimiropoulos
MQ
52
68
0
03 Mar 2020
Learning Architectures for Binary Networks
Dahyun Kim
Kunal Pratap Singh
Jonghyun Choi
MQ
35
44
0
17 Feb 2020
BinaryDuo: Reducing Gradient Mismatch in Binary Activation Network by Coupling Binary Activations
Hyungjun Kim
Kyungsu Kim
Jinseok Kim
Jae-Joon Kim
MQ
39
48
0
16 Feb 2020
Few Shot Network Compression via Cross Distillation
Haoli Bai
Jiaxiang Wu
Irwin King
Michael Lyu
FedML
28
60
0
21 Nov 2019
Loss Aware Post-training Quantization
Yury Nahshan
Brian Chmiel
Chaim Baskin
Evgenii Zheltonozhskii
Ron Banner
A. Bronstein
A. Mendelson
MQ
47
165
0
17 Nov 2019
Q8BERT: Quantized 8Bit BERT
Ofir Zafrir
Guy Boudoukh
Peter Izsak
Moshe Wasserblat
MQ
46
502
0
14 Oct 2019
Splitting Steepest Descent for Growing Neural Architectures
Qiang Liu
Lemeng Wu
Dilin Wang
31
61
0
06 Oct 2019
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh
Lysandre Debut
Julien Chaumond
Thomas Wolf
54
7,386
0
02 Oct 2019
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Zhenzhong Lan
Mingda Chen
Sebastian Goodman
Kevin Gimpel
Piyush Sharma
Radu Soricut
SSL
AIMat
204
6,420
0
26 Sep 2019
Reducing Transformer Depth on Demand with Structured Dropout
Angela Fan
Edouard Grave
Armand Joulin
76
586
0
25 Sep 2019
TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao
Yichun Yin
Lifeng Shang
Xin Jiang
Xiao Chen
Linlin Li
F. Wang
Qun Liu
VLM
30
1,838
0
23 Sep 2019
Patient Knowledge Distillation for BERT Model Compression
S. Sun
Yu Cheng
Zhe Gan
Jingjing Liu
88
833
0
25 Aug 2019
Visualizing and Understanding the Effectiveness of BERT
Y. Hao
Li Dong
Furu Wei
Ke Xu
48
183
0
15 Aug 2019
A Tensorized Transformer for Language Modeling
Xindian Ma
Peng Zhang
Shuai Zhang
Nan Duan
Yuexian Hou
D. Song
M. Zhou
27
165
0
24 Jun 2019
Are Sixteen Heads Really Better than One?
Paul Michel
Omer Levy
Graham Neubig
MoE
39
1,049
0
25 May 2019
HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision
Zhen Dong
Z. Yao
A. Gholami
Michael W. Mahoney
Kurt Keutzer
MQ
43
521
0
29 Apr 2019
Learned Step Size Quantization
S. K. Esser
J. McKinstry
Deepika Bablani
R. Appuswamy
D. Modha
MQ
42
792
0
21 Feb 2019
Improving Neural Network Quantization without Retraining using Outlier Channel Splitting
Ritchie Zhao
Yuwei Hu
Jordan Dotzel
Christopher De Sa
Zhiru Zhang
OODD
MQ
62
306
0
28 Jan 2019
Mixed Precision Quantization of ConvNets via Differentiable Neural Architecture Search
Bichen Wu
Yanghan Wang
Peizhao Zhang
Yuandong Tian
Peter Vajda
Kurt Keutzer
MQ
42
272
0
30 Nov 2018
HAQ: Hardware-Aware Automated Quantization with Mixed Precision
Kuan Wang
Zhijian Liu
Yujun Lin
Ji Lin
Song Han
MQ
82
876
0
21 Nov 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
577
93,936
0
11 Oct 2018
ProxQuant: Quantized Neural Networks via Proximal Operators
Yu Bai
Yu Wang
Edo Liberty
MQ
36
117
0
01 Oct 2018
Bi-Real Net: Enhancing the Performance of 1-bit CNNs With Improved Representational Capability and Advanced Training Algorithm
Zechun Liu
Baoyuan Wu
Wenhan Luo
Xin Yang
Wen Liu
K. Cheng
MQ
62
551
0
01 Aug 2018
LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks
Dongqing Zhang
Jiaolong Yang
Dongqiangzi Ye
G. Hua
MQ
32
701
0
26 Jul 2018
Universal Transformers
Mostafa Dehghani
Stephan Gouws
Oriol Vinyals
Jakob Uszkoreit
Lukasz Kaiser
56
752
0
10 Jul 2018
DARTS: Differentiable Architecture Search
Hanxiao Liu
Karen Simonyan
Yiming Yang
136
4,326
0
24 Jun 2018
Know What You Don't Know: Unanswerable Questions for SQuAD
Pranav Rajpurkar
Robin Jia
Percy Liang
RALM
ELM
119
2,818
0
11 Jun 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
421
7,080
0
20 Apr 2018
Loss-aware Weight Quantization of Deep Networks
Lu Hou
James T. Kwok
MQ
44
127
0
23 Feb 2018
Visualizing the Loss Landscape of Neural Nets
Hao Li
Zheng Xu
Gavin Taylor
Christoph Studer
Tom Goldstein
216
1,873
0
28 Dec 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
223
129,831
0
12 Jun 2017
Loss-aware Binarization of Deep Networks
Lu Hou
Quanming Yao
James T. Kwok
MQ
46
220
0
05 Nov 2016
DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
Shuchang Zhou
Yuxin Wu
Zekun Ni
Xinyu Zhou
He Wen
Yuheng Zou
MQ
87
2,080
0
20 Jun 2016
SQuAD: 100,000+ Questions for Machine Comprehension of Text
Pranav Rajpurkar
Jian Zhang
Konstantin Lopyrev
Percy Liang
RALM
77
8,067
0
16 Jun 2016