Efficient 8-Bit Quantization of Transformer Neural Machine Language Translation Model
arXiv:1906.00532 · 3 June 2019
Aishwarya Bhandare, Vamsi Sripathi, Deepthi Karkada, Vivek V. Menon, Sun Choi, Kushal Datta, V. Saletore
MQ
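The paper above concerns 8-bit integer quantization of Transformer NMT inference. As a rough illustration of the general idea only, and not the authors' calibration or per-layer scheme, a minimal symmetric per-tensor int8 quantize/dequantize round trip in NumPy might look like this (the function names and the uniform per-tensor scale are assumptions made for the sketch):

```python
# Hypothetical sketch only: symmetric per-tensor int8 quantization,
# not the exact scheme described in the paper.
import numpy as np

def quantize_int8(x: np.ndarray):
    """Map a float tensor onto signed 8-bit integers in [-127, 127]."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # assumed per-tensor scale
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor from int8 values and the scale."""
    return q.astype(np.float32) * scale

# Example: quantize a stand-in weight matrix and check the reconstruction error.
w = np.random.randn(64, 64).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize_int8(q, s)
print("max abs error:", np.max(np.abs(w - w_hat)))
```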

Papers citing "Efficient 8-Bit Quantization of Transformer Neural Machine Language Translation Model"

21 / 71 papers shown
VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference
Steve Dai, Rangharajan Venkatesan, Haoxing Ren, B. Zimmer, W. Dally, Brucek Khailany
MQ · 33 · 68 · 0 · 08 Feb 2021
I-BERT: Integer-only BERT Quantization
Sehoon Kim, A. Gholami, Z. Yao, Michael W. Mahoney, Kurt Keutzer
MQ · 107 · 345 · 0 · 05 Jan 2021
A Survey on Visual Transformer
Kai Han, Yunhe Wang, Hanting Chen, Xinghao Chen, Jianyuan Guo, ..., Chunjing Xu, Yixing Xu, Zhaohui Yang, Yiman Zhang, Dacheng Tao
ViT · 23 · 2,135 · 0 · 23 Dec 2020
Layer-Wise Data-Free CNN Compression
Maxwell Horton, Yanzi Jin, Ali Farhadi, Mohammad Rastegari
MQ · 24 · 17 · 0 · 18 Nov 2020
Fast Interleaved Bidirectional Sequence Generation
Biao Zhang, Ivan Titov, Rico Sennrich
16 · 12 · 0 · 27 Oct 2020
FastFormers: Highly Efficient Transformer Models for Natural Language Understanding
Young Jin Kim, Hany Awadalla
AI4CE · 32 · 42 · 0 · 26 Oct 2020
An Investigation on Different Underlying Quantization Schemes for Pre-trained Language Models
Zihan Zhao, Yuncong Liu, Lu Chen, Qi Liu, Rao Ma, Kai Yu
MQ · 24 · 12 · 0 · 14 Oct 2020
Towards Fully 8-bit Integer Inference for the Transformer Model
Ye Lin, Yanyang Li, Tengbo Liu, Tong Xiao, Tongran Liu, Jingbo Zhu
MQ · 11 · 62 · 0 · 17 Sep 2020
Extremely Low Bit Transformer Quantization for On-Device Neural Machine Translation
Insoo Chung, Byeongwook Kim, Yoonjung Choi, S. Kwon, Yongkweon Jeon, Baeseong Park, Sangha Kim, Dongsoo Lee
MQ · 29 · 27 · 0 · 16 Sep 2020
Degree-Quant: Quantization-Aware Training for Graph Neural Networks
Shyam A. Tailor, Javier Fernandez-Marques, Nicholas D. Lane
GNN, MQ · 29 · 140 · 0 · 11 Aug 2020
Deep Partial Updating: Towards Communication Efficient Updating for On-device Inference
Zhongnan Qu, Cong Liu, Lothar Thiele
3DH · 29 · 3 · 0 · 06 Jul 2020
BiQGEMM: Matrix Multiplication with Lookup Table For Binary-Coding-based Quantized DNNs
Yongkweon Jeon, Baeseong Park, S. Kwon, Byeongwook Kim, Jeongin Yun, Dongsoo Lee
MQ · 33 · 30 · 0 · 20 May 2020
Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation
Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev, Paulius Micikevicius
MQ · 32 · 340 · 0 · 20 Apr 2020
DynaBERT: Dynamic BERT with Adaptive Width and Depth
Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu
MQ · 26 · 320 · 0 · 08 Apr 2020
Learning Accurate Integer Transformer Machine-Translation Models
Ephrem Wu
19 · 4 · 0 · 03 Jan 2020
QKD: Quantization-aware Knowledge Distillation
Jangho Kim, Yash Bhalgat, Jinwon Lee, Chirag I. Patel, Nojun Kwak
MQ · 24 · 63 · 0 · 28 Nov 2019
ConveRT: Efficient and Accurate Conversational Representations from Transformers
Matthew Henderson, I. Casanueva, Nikola Mrkšić, Pei-hao Su, Tsung-Hsien, Ivan Vulić
21 · 196 · 0 · 09 Nov 2019
A Simplified Fully Quantized Transformer for End-to-end Speech Recognition
Alex Bie, Bharat Venkitesh, João Monteiro, Md. Akmal Haidar, Mehdi Rezagholizadeh
MQ · 32 · 27 · 0 · 09 Nov 2019
Fully Quantized Transformer for Machine Translation
Gabriele Prato, Ella Charlaix, Mehdi Rezagholizadeh
MQ · 13 · 68 · 0 · 17 Oct 2019
Q8BERT: Quantized 8Bit BERT
Ofir Zafrir, Guy Boudoukh, Peter Izsak, Moshe Wasserblat
MQ · 9 · 500 · 0 · 14 Oct 2019
AdaptivFloat: A Floating-point based Data Type for Resilient Deep Learning Inference
Thierry Tambe, En-Yu Yang, Zishen Wan, Yuntian Deng, Vijay Janapa Reddi, Alexander M. Rush, David Brooks, Gu-Yeon Wei
MQ · 19 · 21 · 0 · 29 Sep 2019