2008.00638
High Throughput Matrix-Matrix Multiplication between Asymmetric Bit-Width Operands
3 August 2020
Dibakar Gope
Jesse G. Beu
Matthew Mattina
Papers citing
"High Throughput Matrix-Matrix Multiplication between Asymmetric Bit-Width Operands"
18 / 18 papers shown
T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge
Jianyu Wei
Shijie Cao
Ting Cao
Lingxiao Ma
Lei Wang
Yanyong Zhang
Mao Yang
MQ
68
12
0
25 Jun 2024
Quantization Networks
Jiwei Yang
Xu Shen
Jun Xing
Xinmei Tian
Houqiang Li
Bing Deng
Jianqiang Huang
Xiansheng Hua
MQ
68
345
0
21 Nov 2019
Ternary MobileNets via Per-Layer Hybrid Filter Banks
Dibakar Gope
Jesse G. Beu
Urmish Thakker
Matthew Mattina
MQ
49
15
0
04 Nov 2019
Pushing the limits of RNN Compression
Urmish Thakker
Igor Fedorov
Jesse G. Beu
Dibakar Gope
Chu Zhou
Ganesh S. Dasika
Matthew Mattina
30
13
0
04 Oct 2019
Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks
Ruihao Gong
Xianglong Liu
Shenghu Jiang
Tian-Hao Li
Peng Hu
Jiazhen Lin
F. Yu
Junjie Yan
MQ
58
457
0
14 Aug 2019
Run-Time Efficient RNN Compression for Inference on Edge Devices
Urmish Thakker
Jesse G. Beu
Dibakar Gope
Ganesh S. Dasika
Matthew Mattina
41
19
0
12 Jun 2019
Compressing RNNs for IoT devices by 15-38x using Kronecker Products
Urmish Thakker
Jesse G. Beu
Dibakar Gope
Chu Zhou
Igor Fedorov
Ganesh S. Dasika
Matthew Mattina
46
36
0
07 Jun 2019
Multi-Precision Quantized Neural Networks via Encoding Decomposition of -1 and +1
Qigong Sun
Fanhua Shang
Kan Yang
Xiufang Li
Yan Ren
L. Jiao
MQ
58
12
0
31 May 2019
Ternary Hybrid Neural-Tree Networks for Highly Constrained IoT Applications
Dibakar Gope
Ganesh S. Dasika
Matthew Mattina
47
23
0
04 Mar 2019
Structured Binary Neural Networks for Accurate Image Classification and Semantic Segmentation
Bohan Zhuang
Chunhua Shen
Mingkui Tan
Lingqiao Liu
Ian Reid
MQ
79
154
0
22 Nov 2018
Learning to Quantize Deep Networks by Optimizing Quantization Intervals with Task Loss
S. Jung
Changyong Son
Seohyung Lee
JinWoo Son
Youngjun Kwak
Jae-Joon Han
Sung Ju Hwang
Changkyu Choi
MQ
48
374
0
17 Aug 2018
LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks
Dongqing Zhang
Jiaolong Yang
Dongqiangzi Ye
G. Hua
MQ
59
703
0
26 Jul 2018
Binary Ensemble Neural Network: More Bits per Network or More Networks per Bit?
Shilin Zhu
Xin Dong
Hao Su
MQ
66
137
0
20 Jun 2018
StrassenNets: Deep Learning with a Multiplication Budget
Michael Tschannen
Aran Khanna
Anima Anandkumar
44
30
0
11 Dec 2017
Network Sketching: Exploiting Binary Structure in Deep CNNs
Yiwen Guo
Anbang Yao
Hao Zhao
Yurong Chen
MQ
66
95
0
07 Jun 2017
In-Datacenter Performance Analysis of a Tensor Processing Unit
N. Jouppi
C. Young
Nishant Patil
David Patterson
Gaurav Agrawal
...
Vijay Vasudevan
Richard Walter
Walter Wang
Eric Wilcox
Doe Hyun Yoon
227
4,626
0
16 Apr 2017
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
Song Han
Huizi Mao
W. Dally
3DGS
247
8,832
0
01 Oct 2015
Speeding up Convolutional Neural Networks with Low Rank Expansions
Max Jaderberg
Andrea Vedaldi
Andrew Zisserman
128
1,462
0
15 May 2014