Q8BERT: Quantized 8Bit BERT
14 October 2019
Ofir Zafrir, Guy Boudoukh, Peter Izsak, Moshe Wasserblat
Topics: MQ

Papers citing "Q8BERT: Quantized 8Bit BERT" (50 of 304 shown)
Processing Natural Language on Embedded Devices: How Well Do Transformer Models Perform?
Souvik Sarkar, Mohammad Fakhruddin Babar, Md. Mahadi Hassan, M. Hasan, Shubhra (Santu) Karmaker
23 Apr 2023

Transformer-based models and hardware acceleration analysis in autonomous driving: A survey
J. Zhong, Zheng Liu, Xiangshan Chen
Topics: ViT
21 Apr 2023

Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling
Xiuying Wei, Yunchen Zhang, Yuhang Li, Xiangguo Zhang, Ruihao Gong, Jian Ren, Zhengang Li
Topics: MQ
18 Apr 2023

Sim-T: Simplify the Transformer Network by Multiplexing Technique for Speech Recognition
Guangyong Wei, Zhikui Duan, Shiren Li, Guangguang Yang, Xinmei Yu, Junhua Li
11 Apr 2023

To Asymmetry and Beyond: Structured Pruning of Sequence to Sequence Models for Improved Inference Efficiency
Daniel Fernando Campos, Chengxiang Zhai
05 Apr 2023

Blockwise Compression of Transformer-based Models without Retraining
Gaochen Dong, W. Chen
04 Apr 2023
Quick Dense Retrievers Consume KALE: Post Training Kullback Leibler Alignment of Embeddings for Asymmetrical dual encoders
Daniel Fernando Campos, Alessandro Magnani, Chengxiang Zhai
31 Mar 2023

FP8 versus INT8 for efficient deep learning inference
M. V. Baalen, Andrey Kuzmin, Suparna S. Nair, Yuwei Ren, E. Mahurin, ..., Sundar Subramanian, Sanghyuk Lee, Markus Nagel, Joseph B. Soriaga, Tijmen Blankevoort
Topics: MQ
31 Mar 2023

oBERTa: Improving Sparse Transfer Learning via improved initialization, distillation, and pruning regimes
Daniel Fernando Campos, Alexandre Marques, Mark Kurtz, Chengxiang Zhai
Topics: VLM, AAML
30 Mar 2023

Scaled Quantization for the Vision Transformer
Yangyang Chang, G. E. Sobelman
Topics: MQ
23 Mar 2023

Language Model Behavior: A Comprehensive Survey
Tyler A. Chang, Benjamin Bergen
Topics: VLM, LRM, LM&MA
20 Mar 2023
SmartBERT: A Promotion of Dynamic Early Exiting Mechanism for Accelerating BERT Inference
Boren Hu, Yun Zhu, Jiacheng Li, Siliang Tang
16 Mar 2023

Block-wise Bit-Compression of Transformer-based Models
Gaochen Dong, W. Chen
16 Mar 2023

ZeroQuant-V2: Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation
Z. Yao, Xiaoxia Wu, Cheng-rong Li, Stephen Youn, Yuxiong He
Topics: MQ
15 Mar 2023

Rediscovering Hashed Random Projections for Efficient Quantization of Contextualized Sentence Embeddings
Ulf A. Hamster, Ji-Ung Lee, Alexander Geyken, Iryna Gurevych
13 Mar 2023

Dynamic Stashing Quantization for Efficient Transformer Training
Guofu Yang, Daniel Lo, Robert D. Mullins, Yiren Zhao
Topics: MQ
09 Mar 2023

Gradient-Free Structured Pruning with Unlabeled Data
Azade Nova, H. Dai, Dale Schuurmans
Topics: SyDa
07 Mar 2023

Knowledge-Enhanced Semi-Supervised Federated Learning for Aggregating Heterogeneous Lightweight Clients in IoT
Jiaqi Wang, Shenglai Zeng, Zewei Long, Yaqing Wang, Houping Xiao, Fenglong Ma
05 Mar 2023
BPT: Binary Point Cloud Transformer for Place Recognition
Zhixing Hou, Yuzhang Shang, Tian Gao, Yan Yan
Topics: MQ, ViT
02 Mar 2023

Fast Attention Requires Bounded Entries
Josh Alman, Zhao Song
26 Feb 2023

MUX-PLMs: Data Multiplexing for High-throughput Language Models
Vishvak Murahari, Ameet Deshpande, Carlos E. Jimenez, Izhak Shafran, Mingqiu Wang, Yuan Cao, Karthik Narasimhan
Topics: MoE
24 Feb 2023

Teacher Intervention: Improving Convergence of Quantization Aware Training for Ultra-Low Precision Transformers
Minsoo Kim, Kyuhong Shim, Seongmin Park, Wonyong Sung, Jungwook Choi
Topics: MQ
23 Feb 2023

Optical Transformers
Maxwell G. Anderson, Shifan Ma, Tianyu Wang, Logan G. Wright, Peter L. McMahon
20 Feb 2023

Speculative Decoding with Big Little Decoder
Sehoon Kim, K. Mangalam, Suhong Moon, Jitendra Malik, Michael W. Mahoney, A. Gholami, Kurt Keutzer
Topics: MoE
15 Feb 2023
Towards Optimal Compression: Joint Pruning and Quantization
Ben Zandonati, Glenn Bucagu, Adrian Alan Pol, M. Pierini, Olya Sirkin, Tal Kopetz
Topics: MQ
15 Feb 2023

Binarized Neural Machine Translation
Yichi Zhang, Ankush Garg, Yuan Cao, Lukasz Lew, Behrooz Ghorbani, Zhiru Zhang, Orhan Firat
Topics: MQ
09 Feb 2023

Efficient and Effective Methods for Mixed Precision Neural Network Quantization for Faster, Energy-efficient Inference
Deepika Bablani, J. McKinstry, S. K. Esser, R. Appuswamy, D. Modha
Topics: MQ
30 Jan 2023

Vision Transformer Computation and Resilience for Dynamic Inference
Kavya Sreedhar, Jason Clemons, Rangharajan Venkatesan, S. Keckler, M. Horowitz
06 Dec 2022

Quadapter: Adapter for GPT-2 Quantization
Minseop Park, J. You, Markus Nagel, Simyung Chang
Topics: MQ
30 Nov 2022

Compressing Cross-Lingual Multi-Task Models at Qualtrics
Daniel Fernando Campos, Daniel J. Perry, S. Joshi, Yashmeet Gambhir, Wei Du, Zhengzheng Xing, Aaron Colak
29 Nov 2022

Understanding and Improving Knowledge Distillation for Quantization-Aware Training of Large Transformer Encoders
Minsoo Kim, Sihwa Lee, S. Hong, Duhyeuk Chang, Jungwook Choi
Topics: MQ
20 Nov 2022
Compressing Transformer-based self-supervised models for speech processing
Tzu-Quan Lin, Tsung-Huan Yang, Chun-Yao Chang, Kuang-Ming Chen, Tzu-hsun Feng, Hung-yi Lee, Hao Tang
17 Nov 2022

Zero-Shot Dynamic Quantization for Transformer Inference
Yousef El-Kurdi, Jerry Quinn, Avirup Sil
Topics: MQ
17 Nov 2022

Fast and Accurate FSA System Using ELBERT: An Efficient and Lightweight BERT
Siyuan Lu, Chenchen Zhou, Keli Xie, Jun Lin, Zhongfeng Wang
16 Nov 2022

Efficiently Scaling Transformer Inference
Reiner Pope, Sholto Douglas, Aakanksha Chowdhery, Jacob Devlin, James Bradbury, Anselm Levskaya, Jonathan Heek, Kefan Xiao, Shivani Agrawal, J. Dean
09 Nov 2022

QuaLA-MiniLM: a Quantized Length Adaptive MiniLM
Shira Guskin, Moshe Wasserblat, Chang Wang, Haihao Shen
Topics: MQ
31 Oct 2022

Empirical Evaluation of Post-Training Quantization Methods for Language Tasks
Ting Hu, Christoph Meinel, Haojin Yang
Topics: MQ
29 Oct 2022

BEBERT: Efficient and Robust Binary Ensemble BERT
Jiayi Tian, Chao Fang, Hong Wang, Zhongfeng Wang
Topics: MQ
28 Oct 2022
Fast DistilBERT on CPUs
Haihao Shen, Ofir Zafrir, Bo Dong, Hengyu Meng, Xinyu. Ye, Zhe Wang, Yi Ding, Hanwen Chang, Guy Boudoukh, Moshe Wasserblat
Topics: VLM
27 Oct 2022

Too Brittle To Touch: Comparing the Stability of Quantization and Distillation Towards Developing Lightweight Low-Resource MT Models
Harshita Diddee, Sandipan Dandapat, Monojit Choudhury, T. Ganu, Kalika Bali
27 Oct 2022

Compressing And Debiasing Vision-Language Pre-Trained Models for Visual Question Answering
Q. Si, Yuanxin Liu, Zheng Lin, Peng Fu, Weiping Wang
Topics: VLM
26 Oct 2022

Legal-Tech Open Diaries: Lesson learned on how to develop and deploy light-weight models in the era of humongous Language Models
Stelios Maroudas, Sotiris Legkas, Prodromos Malakasiotis, Ilias Chalkidis
Topics: VLM, AILaw, ALM, ELM
24 Oct 2022

Sub-8-bit quantization for on-device speech recognition: a regularization-free approach
Kai Zhen, Martin H. Radfar, Hieu Duy Nguyen, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris
Topics: MQ
17 Oct 2022

EfficientVLM: Fast and Accurate Vision-Language Models via Knowledge Distillation and Modal-adaptive Pruning
Tiannan Wang, Wangchunshu Zhou, Yan Zeng, Xinsong Zhang
Topics: VLM
14 Oct 2022
SQuAT: Sharpness- and Quantization-Aware Training for BERT
Zheng Wang, Juncheng Billy Li, Shuhui Qu, Florian Metze, Emma Strubell
Topics: MQ
13 Oct 2022

Spontaneous Emerging Preference in Two-tower Language Model
Zhengqi He, Taro Toyoizumi
Topics: LRM
13 Oct 2022

Block Format Error Bounds and Optimal Block Size Selection
I. Soloveychik, I. Lyubomirsky, Xin Eric Wang, S. Bhoja
Topics: MQ
11 Oct 2022

A Win-win Deal: Towards Sparse and Robust Pre-trained Language Models
Yuanxin Liu, Fandong Meng, Zheng Lin, JiangNan Li, Peng Fu, Yanan Cao, Weiping Wang, Jie Zhou
11 Oct 2022

AlphaTuning: Quantization-Aware Parameter-Efficient Adaptation of Large-Scale Pre-Trained Language Models
S. Kwon, Jeonghoon Kim, Jeongin Bae, Kang Min Yoo, Jin-Hwa Kim, Baeseong Park, Byeongwook Kim, Jung-Woo Ha, Nako Sung, Dongsoo Lee
Topics: MQ
08 Oct 2022

GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng, Xiao Liu, Zhengxiao Du, Zihan Wang, Hanyu Lai, ..., Jidong Zhai, Wenguang Chen, Peng Zhang, Yuxiao Dong, Jie Tang
Topics: BDL, LRM
05 Oct 2022