ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Q8BERT: Quantized 8Bit BERT

14 October 2019
Ofir Zafrir, Guy Boudoukh, Peter Izsak, Moshe Wasserblat
[MQ]

Papers citing "Q8BERT: Quantized 8Bit BERT"

50 / 304 papers shown
  • Distilling Large Language Models into Tiny and Effective Students using pQRNN. P. Kaliamoorthi, Aditya Siddhant, Edward Li, Melvin Johnson. 21 Jan 2021. [MQ]
  • KDLSQ-BERT: A Quantized Bert Combining Knowledge Distillation with Learned Step Size Quantization. Jing Jin, Cai Liang, Tiancheng Wu, Li Zou, Zhiliang Gan. 15 Jan 2021. [MQ]
  • I-BERT: Integer-only BERT Quantization. Sehoon Kim, A. Gholami, Z. Yao, Michael W. Mahoney, Kurt Keutzer. 05 Jan 2021. [MQ]
  • Subformer: Exploring Weight Sharing for Parameter Efficiency in Generative Transformers. Machel Reid, Edison Marrese-Taylor, Y. Matsuo. 01 Jan 2021. [MoE]
  • EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets. Xiaohan Chen, Yu Cheng, Shuohang Wang, Zhe Gan, Zhangyang Wang, Jingjing Liu. 31 Dec 2020.
  • BinaryBERT: Pushing the Limit of BERT Quantization. Haoli Bai, Wei Zhang, Lu Hou, Lifeng Shang, Jing Jin, Xin Jiang, Qun Liu, Michael Lyu, Irwin King. 31 Dec 2020. [MQ]
  • CascadeBERT: Accelerating Inference of Pre-trained Language Models via Calibrated Complete Models Cascade. Lei Li, Yankai Lin, Deli Chen, Shuhuai Ren, Peng Li, Jie Zhou, Xu Sun. 29 Dec 2020.
  • A Survey on Visual Transformer. Kai Han, Yunhe Wang, Hanting Chen, Xinghao Chen, Jianyuan Guo, ..., Chunjing Xu, Yixing Xu, Zhaohui Yang, Yiman Zhang, Dacheng Tao. 23 Dec 2020. [ViT]
  • Improving Task-Agnostic BERT Distillation with Layer Mapping Search. Xiaoqi Jiao, Huating Chang, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu. 11 Dec 2020.
  • EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference. Thierry Tambe, Coleman Hooper, Lillian Pentecost, Tianyu Jia, En-Yu Yang, ..., Victor Sanh, P. Whatmough, Alexander M. Rush, David Brooks, Gu-Yeon Wei. 28 Nov 2020.
  • Empirical Evaluation of Deep Learning Model Compression Techniques on the WaveNet Vocoder. Sam Davis, Giuseppe Coccia, Sam Gooch, Julian Mack. 20 Nov 2020.
  • Don't Read Too Much into It: Adaptive Computation for Open-Domain Question Answering. Yuxiang Wu, Sebastian Riedel, Pasquale Minervini, Pontus Stenetorp. 10 Nov 2020.
  • Know What You Don't Need: Single-Shot Meta-Pruning for Attention Heads. Zhengyan Zhang, Fanchao Qi, Zhiyuan Liu, Qun Liu, Maosong Sun. 07 Nov 2020. [VLM]
  • FastFormers: Highly Efficient Transformer Models for Natural Language Understanding. Young Jin Kim, Hany Awadalla. 26 Oct 2020. [AI4CE]
  • An Investigation on Different Underlying Quantization Schemes for Pre-trained Language Models. Zihan Zhao, Yuncong Liu, Lu Chen, Qi Liu, Rao Ma, Kai Yu. 14 Oct 2020. [MQ]
  • Weight Squeezing: Reparameterization for Knowledge Transfer and Model Compression. Artem Chumachenko, Daniil Gavrilov, Nikita Balagansky, Pavel Kalaidin. 14 Oct 2020.
  • Adversarial Self-Supervised Data-Free Distillation for Text Classification. Xinyin Ma, Yongliang Shen, Gongfan Fang, Chen Chen, Chenghao Jia, Weiming Lu. 10 Oct 2020.
  • Deep Learning Meets Projective Clustering. Alaa Maalouf, Harry Lang, Daniela Rus, Dan Feldman. 08 Oct 2020.
  • AxFormer: Accuracy-driven Approximation of Transformers for Faster, Smaller and more Accurate NLP Models. Amrit Nagarajan, Sanchari Sen, Jacob R. Stevens, A. Raghunathan. 07 Oct 2020.
  • Pruning Redundant Mappings in Transformer Models via Spectral-Normalized Identity Prior. Zi Lin, Jeremiah Zhe Liu, Ziao Yang, Nan Hua, Dan Roth. 05 Oct 2020.
  • Which *BERT? A Survey Organizing Contextualized Encoders. Patrick Xia, Shijie Wu, Benjamin Van Durme. 02 Oct 2020.
  • Contrastive Distillation on Intermediate Representations for Language Model Compression. S. Sun, Zhe Gan, Yu Cheng, Yuwei Fang, Shuohang Wang, Jingjing Liu. 29 Sep 2020. [VLM]
  • TernaryBERT: Distillation-aware Ultra-low Bit BERT. Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu. 27 Sep 2020. [MQ]
  • It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners. Timo Schick, Hinrich Schütze. 15 Sep 2020.
  • Compressed Deep Networks: Goodbye SVD, Hello Robust Low-Rank Approximation. M. Tukan, Alaa Maalouf, Matan Weksler, Dan Feldman. 11 Sep 2020.
  • Finding Fast Transformers: One-Shot Neural Architecture Search by Component Composition. Henry Tsai, Jayden Ooi, Chun-Sung Ferng, Hyung Won Chung, Jason Riesa. 15 Aug 2020. [ViT]
  • ConvBERT: Improving BERT with Span-based Dynamic Convolution. Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan. 06 Aug 2020.
  • A Survey on Text Classification: From Shallow to Deep Learning. Qian Li, Hao Peng, Jianxin Li, Congyin Xia, Renyu Yang, Lichao Sun, Philip S. Yu, Lifang He. 02 Aug 2020. [VLM]
  • Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention. Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret. 29 Jun 2020.
  • SqueezeBERT: What can computer vision teach NLP about efficient neural networks? F. Iandola, Albert Eaton Shaw, Ravi Krishna, Kurt Keutzer. 19 Jun 2020. [VLM]
  • Accelerating Natural Language Understanding in Task-Oriented Dialog. Ojas Ahuja, Shrey Desai. 05 Jun 2020. [VLM]
  • Movement Pruning: Adaptive Sparsity by Fine-Tuning. Victor Sanh, Thomas Wolf, Alexander M. Rush. 15 May 2020.
  • Distilling Knowledge from Pre-trained Language Models via Text Smoothing. Xing Wu, Yebin Liu, Xiangyang Zhou, Dianhai Yu. 08 May 2020.
  • GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference. Ali Hadi Zadeh, Isak Edo, Omar Mohamed Awad, Andreas Moshovos. 08 May 2020. [MQ]
  • General Purpose Text Embeddings from Pre-trained Language Models for Scalable Inference. Jingfei Du, Myle Ott, Haoran Li, Xing Zhou, Veselin Stoyanov. 29 Apr 2020. [AI4CE]
  • ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT. Omar Khattab, Matei A. Zaharia. 27 Apr 2020.
  • Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation. Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev, Paulius Micikevicius. 20 Apr 2020. [MQ]
  • The Right Tool for the Job: Matching Model and Instance Complexities. Roy Schwartz, Gabriel Stanovsky, Swabha Swayamdipta, Jesse Dodge, Noah A. Smith. 16 Apr 2020.
  • DynaBERT: Dynamic BERT with Adaptive Width and Depth. Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu. 08 Apr 2020. [MQ]
  • On the Effect of Dropping Layers of Pre-trained Transformer Models. Hassan Sajjad, Fahim Dalvi, Nadir Durrani, Preslav Nakov. 08 Apr 2020.
  • Pre-trained Models for Natural Language Processing: A Survey. Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, Xuanjing Huang. 18 Mar 2020. [LM&MA, VLM]
  • Efficient Intent Detection with Dual Sentence Encoders. I. Casanueva, Tadas Temčinas, D. Gerz, Matthew Henderson, Ivan Vulić. 10 Mar 2020. [VLM]
  • DC-BERT: Decoupling Question and Document for Efficient Contextual Encoding. Yuyu Zhang, Ping Nie, Xiubo Geng, Arun Ramamurthy, Le Song, Daxin Jiang. 28 Feb 2020.
  • A Primer in BERTology: What we know about how BERT works. Anna Rogers, Olga Kovaleva, Anna Rumshisky. 27 Feb 2020. [OffRL]
  • Compressing Large-Scale Transformer-Based Models: A Case Study on BERT. Prakhar Ganesh, Yao Chen, Xin Lou, Mohammad Ali Khan, Yin Yang, Hassan Sajjad, Preslav Nakov, Deming Chen, Marianne Winslett. 27 Feb 2020. [AI4CE]
  • PoWER-BERT: Accelerating BERT Inference via Progressive Word-vector Elimination. Saurabh Goyal, Anamitra R. Choudhury, Saurabh ManishRaje, Venkatesan T. Chakaravarthy, Yogish Sabharwal, Ashish Verma. 24 Jan 2020.
  • AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search. Daoyuan Chen, Yaliang Li, Minghui Qiu, Zhen Wang, Bofang Li, Bolin Ding, Hongbo Deng, Jun Huang, Wei Lin, Jingren Zhou. 13 Jan 2020. [MQ]
  • ConveRT: Efficient and Accurate Conversational Representations from Transformers. Matthew Henderson, I. Casanueva, Nikola Mrkšić, Pei-hao Su, Tsung-Hsien Wen, Ivan Vulić. 09 Nov 2019.
  • A Simplified Fully Quantized Transformer for End-to-end Speech Recognition. Alex Bie, Bharat Venkitesh, João Monteiro, Md. Akmal Haidar, Mehdi Rezagholizadeh. 09 Nov 2019. [MQ]
  • Emergent Properties of Finetuned Language Representation Models. Alexandre Matton, Luke de Oliveira. 23 Oct 2019. [SSL]