
© 2025 ResearchTrend.AI, All rights reserved.

High Throughput Matrix-Matrix Multiplication between Asymmetric Bit-Width Operands (arXiv:2008.00638)

3 August 2020
Dibakar Gope, Jesse G. Beu, Matthew Mattina

Papers citing "High Throughput Matrix-Matrix Multiplication between Asymmetric Bit-Width Operands"

18 papers shown

1. T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge
   Jianyu Wei, Shijie Cao, Ting Cao, Lingxiao Ma, Lei Wang, Yanyong Zhang, Mao Yang (25 Jun 2024)
2. Quantization Networks
   Jiwei Yang, Xu Shen, Jun Xing, Xinmei Tian, Houqiang Li, Bing Deng, Jianqiang Huang, Xiansheng Hua (21 Nov 2019)
3. Ternary MobileNets via Per-Layer Hybrid Filter Banks
   Dibakar Gope, Jesse G. Beu, Urmish Thakker, Matthew Mattina (04 Nov 2019)
4. Pushing the limits of RNN Compression
   Urmish Thakker, Igor Fedorov, Jesse G. Beu, Dibakar Gope, Chu Zhou, Ganesh S. Dasika, Matthew Mattina (04 Oct 2019)
5. Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks
   Ruihao Gong, Xianglong Liu, Shenghu Jiang, Tian-Hao Li, Peng Hu, Jiazhen Lin, F. Yu, Junjie Yan (14 Aug 2019)
6. Run-Time Efficient RNN Compression for Inference on Edge Devices
   Urmish Thakker, Jesse G. Beu, Dibakar Gope, Ganesh S. Dasika, Matthew Mattina (12 Jun 2019)
7. Compressing RNNs for IoT devices by 15-38x using Kronecker Products
   Urmish Thakker, Jesse G. Beu, Dibakar Gope, Chu Zhou, Igor Fedorov, Ganesh S. Dasika, Matthew Mattina (07 Jun 2019)
8. Multi-Precision Quantized Neural Networks via Encoding Decomposition of -1 and +1
   Qigong Sun, Fanhua Shang, Kan Yang, Xiufang Li, Yan Ren, L. Jiao (31 May 2019)
9. Ternary Hybrid Neural-Tree Networks for Highly Constrained IoT Applications
   Dibakar Gope, Ganesh S. Dasika, Matthew Mattina (04 Mar 2019)
10. Structured Binary Neural Networks for Accurate Image Classification and Semantic Segmentation
    Bohan Zhuang, Chunhua Shen, Mingkui Tan, Lingqiao Liu, Ian Reid (22 Nov 2018)
11. Learning to Quantize Deep Networks by Optimizing Quantization Intervals with Task Loss
    S. Jung, Changyong Son, Seohyung Lee, JinWoo Son, Youngjun Kwak, Jae-Joon Han, Sung Ju Hwang, Changkyu Choi (17 Aug 2018)
12. LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks
    Dongqing Zhang, Jiaolong Yang, Dongqiangzi Ye, G. Hua (26 Jul 2018)
13. Binary Ensemble Neural Network: More Bits per Network or More Networks per Bit?
    Shilin Zhu, Xin Dong, Hao Su (20 Jun 2018)
14. StrassenNets: Deep Learning with a Multiplication Budget
    Michael Tschannen, Aran Khanna, Anima Anandkumar (11 Dec 2017)
15. In-Datacenter Performance Analysis of a Tensor Processing Unit
    N. Jouppi, C. Young, Nishant Patil, David Patterson, Gaurav Agrawal, ..., Vijay Vasudevan, Richard Walter, Walter Wang, Eric Wilcox, Doe Hyun Yoon (16 Apr 2017)
16. Network Sketching: Exploiting Binary Structure in Deep CNNs
    Yiwen Guo, Anbang Yao, Hao Zhao, Yurong Chen (07 Jun 2017)
17. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
    Song Han, Huizi Mao, W. Dally (01 Oct 2015)
18. Speeding up Convolutional Neural Networks with Low Rank Expansions
    Max Jaderberg, Andrea Vedaldi, Andrew Zisserman (15 May 2014)