BinaryBERT: Pushing the Limit of BERT Quantization (arXiv:2012.15701)

31 December 2020
Haoli Bai, Wei Zhang, Lu Hou, Lifeng Shang, Jing Jin, Xin Jiang, Qun Liu, Michael Lyu, Irwin King
MQ

Papers citing "BinaryBERT: Pushing the Limit of BERT Quantization" (50 / 54 papers shown)
COBRA: Algorithm-Architecture Co-optimized Binary Transformer Accelerator for Edge Inference
  Ye Qiao, Zhiheng Cheng, Yian Wang, Yifan Zhang, Yunzhe Deng, Sitao Huang
  103 · 0 · 0 · 22 Apr 2025

GSQ-Tuning: Group-Shared Exponents Integer in Fully Quantized Training for LLMs On-Device Fine-tuning
  Sifan Zhou, Shuo Wang, Zhihang Yuan, Mingjia Shi, Yuzhang Shang, Dawei Yang
  MQ, ALM · 115 · 0 · 0 · 18 Feb 2025

HadamRNN: Binary and Sparse Ternary Orthogonal RNNs
  Armand Foucault, Franck Mamalet, François Malgouyres
  MQ · 116 · 0 · 0 · 28 Jan 2025

FlatQuant: Flatness Matters for LLM Quantization
  Yuxuan Sun, Ruikang Liu, Haoli Bai, Han Bao, Kang Zhao, ..., Lu Hou, Chun Yuan, Xin Jiang, Wen Liu, Jun Yao
  MQ · 91 · 4 · 0 · 12 Oct 2024

MoDeGPT: Modular Decomposition for Large Language Model Compression
  Chi-Heng Lin, Shangqian Gao, James Seale Smith, Abhishek Patel, Shikhar Tuli, Yilin Shen, Hongxia Jin, Yen-Chang Hsu
  90 · 10 · 0 · 19 Aug 2024

TernaryBERT: Distillation-aware Ultra-low Bit BERT
  Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu
  MQ · 56 · 209 · 0 · 27 Sep 2020

HMQ: Hardware Friendly Mixed Precision Quantization Block for CNNs
  H. Habi, Roy H. Jennings, Arnon Netzer
  MQ · 34 · 65 · 0 · 20 Jul 2020

BERT Loses Patience: Fast and Robust Inference with Early Exit
  Wangchunshu Zhou, Canwen Xu, Tao Ge, Julian McAuley, Ke Xu, Furu Wei
  24 · 337 · 0 · 07 Jun 2020

GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference
  Ali Hadi Zadeh, Isak Edo, Omar Mohamed Awad, Andreas Moshovos
  MQ · 42 · 186 · 0 · 08 May 2020

DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference
  Ji Xin, Raphael Tang, Jaejun Lee, Yaoliang Yu, Jimmy J. Lin
  24 · 370 · 0 · 27 Apr 2020

Training with Quantization Noise for Extreme Model Compression
  Angela Fan, Pierre Stock, Benjamin Graham, Edouard Grave, Remi Gribonval, Hervé Jégou, Armand Joulin
  MQ · 38 · 244 · 0 · 15 Apr 2020

DynaBERT: Dynamic BERT with Adaptive Width and Depth
  Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu
  MQ · 40 · 322 · 0 · 08 Apr 2020

FastBERT: a Self-distilling BERT with Adaptive Inference Time
  Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, Qi Ju
  66 · 356 · 0 · 05 Apr 2020

Training Binary Neural Networks with Real-to-Binary Convolutions
  Brais Martínez, Jing Yang, Adrian Bulat, Georgios Tzimiropoulos
  MQ · 29 · 227 · 0 · 25 Mar 2020

Efficient Bitwidth Search for Practical Mixed Precision Neural Network
  Yuhang Li, Wei Wang, Haoli Bai, Ruihao Gong, Xin Dong, F. Yu
  MQ · 21 · 20 · 0 · 17 Mar 2020

ReActNet: Towards Precise Binary Neural Network with Generalized Activation Functions
  Zechun Liu, Zhiqiang Shen, Marios Savvides, Kwang-Ting Cheng
  MQ · 60 · 350 · 0 · 07 Mar 2020

BATS: Binary ArchitecTure Search
  Adrian Bulat, Brais Martínez, Georgios Tzimiropoulos
  MQ · 52 · 68 · 0 · 03 Mar 2020

Learning Architectures for Binary Networks
  Dahyun Kim, Kunal Pratap Singh, Jonghyun Choi
  MQ · 35 · 44 · 0 · 17 Feb 2020

BinaryDuo: Reducing Gradient Mismatch in Binary Activation Network by Coupling Binary Activations
  Hyungjun Kim, Kyungsu Kim, Jinseok Kim, Jae-Joon Kim
  MQ · 39 · 48 · 0 · 16 Feb 2020

Few Shot Network Compression via Cross Distillation
  Haoli Bai, Jiaxiang Wu, Irwin King, Michael Lyu
  FedML · 28 · 60 · 0 · 21 Nov 2019

Loss Aware Post-training Quantization
  Yury Nahshan, Brian Chmiel, Chaim Baskin, Evgenii Zheltonozhskii, Ron Banner, A. Bronstein, A. Mendelson
  MQ · 47 · 165 · 0 · 17 Nov 2019

Q8BERT: Quantized 8Bit BERT
  Ofir Zafrir, Guy Boudoukh, Peter Izsak, Moshe Wasserblat
  MQ · 46 · 502 · 0 · 14 Oct 2019

Splitting Steepest Descent for Growing Neural Architectures
  Qiang Liu, Lemeng Wu, Dilin Wang
  31 · 61 · 0 · 06 Oct 2019

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf
  54 · 7,386 · 0 · 02 Oct 2019

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
  Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
  SSL, AIMat · 204 · 6,420 · 0 · 26 Sep 2019

Reducing Transformer Depth on Demand with Structured Dropout
  Angela Fan, Edouard Grave, Armand Joulin
  76 · 586 · 0 · 25 Sep 2019

TinyBERT: Distilling BERT for Natural Language Understanding
  Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu
  VLM · 30 · 1,838 · 0 · 23 Sep 2019

Patient Knowledge Distillation for BERT Model Compression
  S. Sun, Yu Cheng, Zhe Gan, Jingjing Liu
  88 · 833 · 0 · 25 Aug 2019

Visualizing and Understanding the Effectiveness of BERT
  Y. Hao, Li Dong, Furu Wei, Ke Xu
  48 · 183 · 0 · 15 Aug 2019

A Tensorized Transformer for Language Modeling
  Xindian Ma, Peng Zhang, Shuai Zhang, Nan Duan, Yuexian Hou, D. Song, M. Zhou
  27 · 165 · 0 · 24 Jun 2019

Are Sixteen Heads Really Better than One?
  Paul Michel, Omer Levy, Graham Neubig
  MoE · 39 · 1,049 · 0 · 25 May 2019

HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision
  Zhen Dong, Z. Yao, A. Gholami, Michael W. Mahoney, Kurt Keutzer
  MQ · 43 · 521 · 0 · 29 Apr 2019

Learned Step Size Quantization
  S. K. Esser, J. McKinstry, Deepika Bablani, R. Appuswamy, D. Modha
  MQ · 42 · 792 · 0 · 21 Feb 2019

Improving Neural Network Quantization without Retraining using Outlier Channel Splitting
  Ritchie Zhao, Yuwei Hu, Jordan Dotzel, Christopher De Sa, Zhiru Zhang
  OODD, MQ · 62 · 306 · 0 · 28 Jan 2019

Mixed Precision Quantization of ConvNets via Differentiable Neural Architecture Search
  Bichen Wu, Yanghan Wang, Peizhao Zhang, Yuandong Tian, Peter Vajda, Kurt Keutzer
  MQ · 42 · 272 · 0 · 30 Nov 2018

HAQ: Hardware-Aware Automated Quantization with Mixed Precision
  Kuan-Chieh Wang, Zhijian Liu, Chengyue Wu, Ji Lin, Song Han
  MQ · 82 · 876 · 0 · 21 Nov 2018

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
  VLM, SSL, SSeg · 577 · 93,936 · 0 · 11 Oct 2018

ProxQuant: Quantized Neural Networks via Proximal Operators
  Yu Bai, Yu Wang, Edo Liberty
  MQ · 36 · 117 · 0 · 01 Oct 2018

Bi-Real Net: Enhancing the Performance of 1-bit CNNs With Improved Representational Capability and Advanced Training Algorithm
  Zechun Liu, Baoyuan Wu, Wenhan Luo, Xin Yang, Wen Liu, K. Cheng
  MQ · 62 · 551 · 0 · 01 Aug 2018

LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks
  Dongqing Zhang, Jiaolong Yang, Dongqiangzi Ye, G. Hua
  MQ · 32 · 701 · 0 · 26 Jul 2018

Universal Transformers
  Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, Lukasz Kaiser
  56 · 752 · 0 · 10 Jul 2018

DARTS: Differentiable Architecture Search
  Hanxiao Liu, Karen Simonyan, Yiming Yang
  136 · 4,326 · 0 · 24 Jun 2018

Know What You Don't Know: Unanswerable Questions for SQuAD
  Pranav Rajpurkar, Robin Jia, Percy Liang
  RALM, ELM · 119 · 2,818 · 0 · 11 Jun 2018

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
  Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
  ELM · 421 · 7,080 · 0 · 20 Apr 2018

Loss-aware Weight Quantization of Deep Networks
  Lu Hou, James T. Kwok
  MQ · 44 · 127 · 0 · 23 Feb 2018

Visualizing the Loss Landscape of Neural Nets
  Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, Tom Goldstein
  216 · 1,873 · 0 · 28 Dec 2017

Attention Is All You Need
  Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin
  3DV · 223 · 129,831 · 0 · 12 Jun 2017

Loss-aware Binarization of Deep Networks
  Lu Hou, Quanming Yao, James T. Kwok
  MQ · 46 · 220 · 0 · 05 Nov 2016

DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
  Shuchang Zhou, Yuxin Wu, Zekun Ni, Xinyu Zhou, He Wen, Yuheng Zou
  MQ · 87 · 2,080 · 0 · 20 Jun 2016

SQuAD: 100,000+ Questions for Machine Comprehension of Text
  Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang
  RALM · 77 · 8,067 · 0 · 16 Jun 2016