
Q8BERT: Quantized 8Bit BERT

arXiv:1910.06188 · 14 October 2019
Ofir Zafrir, Guy Boudoukh, Peter Izsak, Moshe Wasserblat
Tags: MQ
Links: ArXiv · PDF · HTML
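
For orientation, since many of the citing papers below build directly on it: Q8BERT fine-tunes BERT with quantization-aware training, fake-quantizing weights and activations to 8-bit integers via symmetric linear quantization and passing gradients through the rounding with the straight-through estimator. The snippet below is a minimal PyTorch sketch of that scheme under those assumptions, not the authors' code; the function and class names are illustrative.

```python
import torch

def fake_quantize(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Symmetric linear quantization as described in Q8BERT:
    quantize(x, s) = clamp(round(x * s), -127, 127), then dequantize.
    The scale s is derived from the tensor's max absolute value
    (the paper uses an EMA of this statistic for activations).
    """
    qmax = 2 ** (num_bits - 1) - 1                 # 127 for 8 bits
    scale = qmax / x.abs().max().clamp(min=1e-8)   # avoid division by zero
    q = torch.clamp(torch.round(x * scale), -qmax, qmax)
    return q / scale                               # dequantized "fake" int8 values

class FakeQuantSTE(torch.autograd.Function):
    """Straight-through estimator: quantize in the forward pass,
    pass the gradient through unchanged in the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return fake_quantize(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output

# Usage sketch: fake-quantize a weight tensor during fine-tuning.
w = torch.randn(768, 768, requires_grad=True)
loss = FakeQuantSTE.apply(w).sum()
loss.backward()                                    # gradient flows via STE
```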

Papers citing "Q8BERT: Quantized 8Bit BERT" (showing 50 of 304)
Can depth-adaptive BERT perform better on binary classification tasks
Jing Fan, Xin Zhang, Sheng Zhang, Yan Pan, Lixiang Guo
MQ · 22 Nov 2021 · 20 / 0 / 0

Efficient Softmax Approximation for Deep Neural Networks with Attention Mechanism
Ihor Vasyltsov, Wooseok Chang
21 Nov 2021 · 33 / 12 / 0

Prune Once for All: Sparse Pre-Trained Language Models
Ofir Zafrir, Ariel Larey, Guy Boudoukh, Haihao Shen, Moshe Wasserblat
VLM · 10 Nov 2021 · 34 / 82 / 0

NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework
Xingcheng Yao, Yanan Zheng, Xiaocong Yang, Zhilin Yang
07 Nov 2021 · 37 / 44 / 0

Magic Pyramid: Accelerating Inference with Early Exiting and Token Pruning
Xuanli He, I. Keivanloo, Yi Xu, Xiang He, Belinda Zeng, Santosh Rajagopalan, Trishul Chilimbi
30 Oct 2021 · 21 / 18 / 0

BERMo: What can BERT learn from ELMo?
Sangamesh Kodge, Kaushik Roy
18 Oct 2021 · 38 / 3 / 0

Energon: Towards Efficient Acceleration of Transformers Using Dynamic Sparse Attention
Zhe Zhou, Junling Liu, Zhenyu Gu, Guangyu Sun
18 Oct 2021 · 64 / 43 / 0

SuperShaper: Task-Agnostic Super Pre-training of BERT Models with Variable Hidden Dimensions
Vinod Ganesan, Gowtham Ramesh, Pratyush Kumar
10 Oct 2021 · 39 / 9 / 0

MoEfication: Transformer Feed-forward Layers are Mixtures of Experts
Zhengyan Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou
MoE · 05 Oct 2021 · 29 / 118 / 0

Towards Efficient Post-training Quantization of Pre-trained Language Models
Haoli Bai, Lu Hou, Lifeng Shang, Xin Jiang, Irwin King, M. Lyu
MQ · 30 Sep 2021 · 82 / 47 / 0

Understanding and Overcoming the Challenges of Efficient Transformer Quantization
Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort
MQ · 27 Sep 2021 · 25 / 133 / 0

Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers
Yi Tay, Mostafa Dehghani, J. Rao, W. Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler
22 Sep 2021 · 206 / 111 / 0

RAIL-KD: RAndom Intermediate Layer Mapping for Knowledge Distillation
Md. Akmal Haidar, Nithin Anchuri, Mehdi Rezagholizadeh, Abbas Ghaddar, Philippe Langlais, Pascal Poupart
21 Sep 2021 · 33 / 22 / 0

Beyond Preserved Accuracy: Evaluating Loyalty and Robustness of BERT Compression
Canwen Xu, Wangchunshu Zhou, Tao Ge, Kelvin J. Xu, Julian McAuley, Furu Wei
07 Sep 2021 · 21 / 41 / 0

AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing
Katikapalli Subramanyam Kalyan, A. Rajasekharan, S. Sangeetha
VLM · LM&MA · 12 Aug 2021 · 31 / 261 / 0

Decoupled Transformer for Scalable Inference in Open-domain Question Answering
Haytham ElFadeel, Stanislav Peshterliev
05 Aug 2021 · 40 / 1 / 0

Self-supervised Answer Retrieval on Clinical Notes
Paul Grundmann, Sebastian Arnold, Alexander Löser
RALM · MedIm · 02 Aug 2021 · 19 / 2 / 0

Multi-stage Pre-training over Simplified Multimodal Pre-training Models
Tongtong Liu, Fangxiang Feng, Xiaojie Wang
22 Jul 2021 · 21 / 14 / 0

FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks
Sheng-Chun Kao, Suvinay Subramanian, Gaurav Agrawal, Amir Yazdanbakhsh, T. Krishna
13 Jul 2021 · 51 / 58 / 0

Image Complexity Guided Network Compression for Biomedical Image Segmentation
Suraj Mishra, Danny Chen, X. Sharon
06 Jul 2021 · 37 / 6 / 0

Learned Token Pruning for Transformers
Sehoon Kim, Sheng Shen, D. Thorsley, A. Gholami, Woosuk Kwon, Joseph Hassoun, Kurt Keutzer
02 Jul 2021 · 17 / 146 / 0

Elbert: Fast Albert with Confidence-Window Based Early Exit
Keli Xie, Siyuan Lu, Meiqi Wang, Zhongfeng Wang
01 Jul 2021 · 22 / 20 / 0

Improving the Efficiency of Transformers for Resource-Constrained Devices
Hamid Tabani, Ajay Balasubramaniam, Shabbir Marzban, Elahe Arani, Bahram Zonooz
30 Jun 2021 · 46 / 20 / 0

Post-Training Quantization for Vision Transformer
Zhenhua Liu, Yunhe Wang, Kai Han, Siwei Ma, Wen Gao
ViT · MQ · 27 Jun 2021 · 61 / 327 / 0

Pre-Trained Models: Past, Present and Future
Xu Han, Zhengyan Zhang, Ning Ding, Yuxian Gu, Xiao Liu, ..., Jie Tang, Ji-Rong Wen, Jinhui Yuan, Wayne Xin Zhao, Jun Zhu
AIFin · MQ · AI4MH · 14 Jun 2021 · 58 / 818 / 0

FastSeq: Make Sequence Generation Faster
Yu Yan, Fei Hu, Jiusheng Chen, Nikhil Bhendawade, Ting Ye, Yeyun Gong, Nan Duan, Desheng Cui, Bingyu Chi, Ruifei Zhang
VLM · 08 Jun 2021 · 24 / 15 / 0

Enabling Lightweight Fine-tuning for Pre-trained Language Model Compression based on Matrix Product Operators
Peiyu Liu, Ze-Feng Gao, Wayne Xin Zhao, Z. Xie, Zhong-Yi Lu, Ji-Rong Wen
04 Jun 2021 · 23 / 29 / 0

On the Distribution, Sparsity, and Inference-time Quantization of Attention Values in Transformers
Tianchu Ji, Shraddhan Jain, M. Ferdman, Peter Milder, H. Andrew Schwartz, Niranjan Balasubramanian
MQ · 02 Jun 2021 · 58 / 15 / 0

DoT: An efficient Double Transformer for NLP tasks with tables
Syrine Krichene, Thomas Müller, Julian Martin Eisenschlos
01 Jun 2021 · 20 / 14 / 0

Gender Bias Amplification During Speed-Quality Optimization in Neural Machine Translation
Adithya Renduchintala, Denise Díaz, Kenneth Heafield, Xian Li, Mona T. Diab
01 Jun 2021 · 23 / 41 / 0

LEAP: Learnable Pruning for Transformer-based Models
Z. Yao, Xiaoxia Wu, Linjian Ma, Sheng Shen, Kurt Keutzer, Michael W. Mahoney, Yuxiong He
30 May 2021 · 30 / 7 / 0

NAS-BERT: Task-Agnostic and Adaptive-Size BERT Compression with Neural Architecture Search
Jin Xu, Xu Tan, Renqian Luo, Kaitao Song, Jian Li, Tao Qin, Tie-Yan Liu
MQ · 30 May 2021 · 23 / 78 / 0

Low-Precision Hardware Architectures Meet Recommendation Model Inference at Scale
Zhaoxia Deng, Jongsoo Park, P. T. P. Tang, Haixin Liu, ..., S. Nadathur, Changkyu Kim, Maxim Naumov, S. Naghshineh, M. Smelyanskiy
26 May 2021 · 29 / 11 / 0

BERT Busters: Outlier Dimensions that Disrupt Transformers
Olga Kovaleva, Saurabh Kulshreshtha, Anna Rogers, Anna Rumshisky
14 May 2021 · 27 / 85 / 0

Teaching a Massive Open Online Course on Natural Language Processing
Ekaterina Artemova, M. Apishev, V. Sarkisyan, Sergey Aksenov, D. Kirjanov, O. Serikov
VLM · 26 Apr 2021 · 19 / 4 / 0

Rethinking Network Pruning -- under the Pre-train and Fine-tune Paradigm
Dongkuan Xu, Ian En-Hsu Yen, Jinxi Zhao, Zhibin Xiao
VLM · AAML · 18 Apr 2021 · 31 / 56 / 0

Vision Transformer Pruning
Mingjian Zhu, Yehui Tang, Kai Han
ViT · 17 Apr 2021 · 19 / 90 / 0

Designing a Minimal Retrieve-and-Read System for Open-Domain Question Answering
Sohee Yang, Minjoon Seo
RALM · 15 Apr 2021 · 22 / 8 / 0

HBert + BiasCorp -- Fighting Racism on the Web
Olawale Onabola, Zhuang Ma, Yang Xie, Benjamin Akera, A. Ibraheem, Jia Xue, Dianbo Liu, Yoshua Bengio
06 Apr 2021 · 34 / 6 / 0

Integer-only Zero-shot Quantization for Efficient Speech Recognition
Sehoon Kim, A. Gholami, Z. Yao, Nicholas Lee, Patrick Wang, Aniruddha Nrusimha, Bohan Zhai, Tianren Gao, Michael W. Mahoney, Kurt Keutzer
MQ · 31 Mar 2021 · 25 / 23 / 0

A Practical Survey on Faster and Lighter Transformers
Quentin Fournier, G. Caron, Daniel Aloise
26 Mar 2021 · 19 / 93 / 0

RCT: Resource Constrained Training for Edge AI
Tian Huang, Yaoyu Zhang, Ming Yan, Qiufeng Wang, Rick Siow Mong Goh
26 Mar 2021 · 38 / 8 / 0

Finetuning Pretrained Transformers into RNNs
Jungo Kasai, Hao Peng, Yizhe Zhang, Dani Yogatama, Gabriel Ilharco, Nikolaos Pappas, Yi Mao, Weizhu Chen, Noah A. Smith
24 Mar 2021 · 46 / 63 / 0

The NLP Cookbook: Modern Recipes for Transformer based Deep Learning Architectures
Sushant Singh, A. Mahmood
AI4TS · 23 Mar 2021 · 60 / 94 / 0

Hardware Acceleration of Fully Quantized BERT for Efficient Natural Language Processing
Zejian Liu, Gang Li, Jian Cheng
MQ · 04 Mar 2021 · 10 / 60 / 0

Improved Customer Transaction Classification using Semi-Supervised Knowledge Distillation
Rohan Sukumaran
15 Feb 2021 · 22 / 2 / 0

Confounding Tradeoffs for Neural Network Quantization
Sahaj Garg, Anirudh Jain, Joe Lou, Mitchell Nahmias
MQ · 12 Feb 2021 · 29 / 17 / 0

VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference
Steve Dai, Rangharajan Venkatesan, Haoxing Ren, B. Zimmer, W. Dally, Brucek Khailany
MQ · 08 Feb 2021 · 35 / 68 / 0

Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention
Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, G. Fung, Yin Li, Vikas Singh
07 Feb 2021 · 47 / 508 / 0

AutoFreeze: Automatically Freezing Model Blocks to Accelerate Fine-tuning
Yuhan Liu, Saurabh Agarwal, Shivaram Venkataraman
OffRL · 02 Feb 2021 · 22 / 54 / 0