2012.04240
Cited By
Mix and Match: A Novel FPGA-Centric Deep Neural Network Quantization Framework
8 December 2020
Sung-En Chang
Yanyu Li
Mengshu Sun
Runbin Shi
Hayden Kwok-Hay So
Xuehai Qian
Yanzhi Wang
Xue Lin
MQ
Papers citing
"Mix and Match: A Novel FPGA-Centric Deep Neural Network Quantization Framework"
27 / 27 papers shown
Title
Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization
Minsu Kim
Seongmin Hong
RyeoWook Ko
S. Choi
Hunjong Lee
Junsoo Kim
Joo-Young Kim
Jongse Park
57
0
0
24 Mar 2025
Ditto: Accelerating Diffusion Model via Temporal Value Similarity
Sungbin Kim
Hyunwuk Lee
Wonho Cho
Mincheol Park
Won Woo Ro
58
1
0
20 Jan 2025
LUTMUL: Exceed Conventional FPGA Roofline Limit by LUT-based Efficient Multiplication for Neural Network Inference
Yanyue Xie
Zhengang Li
Dana Diaconu
Suranga Handagala
M. Leeser
Xue Lin
69
0
0
01 Nov 2024
Accelerating PoT Quantization on Edge Devices
Rappy Saha
Jude Haris
José Cano
MQ
21
0
0
30 Sep 2024
Bespoke Approximation of Multiplication-Accumulation and Activation Targeting Printed Multilayer Perceptrons
Florentia Afentaki
Gurol Saglam
Argyris Kokkinis
K. Siozios
Georgios Zervakis
M. Tahoori
14
7
0
29 Dec 2023
A Precision-Scalable RISC-V DNN Processor with On-Device Learning Capability at the Extreme Edge
Longwei Huang
Chao Fang
Qiong Li
Jun Lin
Zhongfeng Wang
92
10
0
15 Sep 2023
A survey on efficient vision transformers: algorithms, techniques, and performance benchmarking
Lorenzo Papa
Paolo Russo
Irene Amerini
Luping Zhou
33
43
0
05 Sep 2023
BOMP-NAS: Bayesian Optimization Mixed Precision NAS
David van Son
F. D. Putter
Sebastian Vogel
Henk Corporaal
MQ
27
3
0
27 Jan 2023
Tailor: Altering Skip Connections for Resource-Efficient Inference
Olivia Weng
Gabriel Marcano
Vladimir Loncar
Alireza Khodamoradi
Nojan Sheybani
Andres Meza
F. Koushanfar
K. Denolf
Javier Mauricio Duarte
Ryan Kastner
46
11
0
18 Jan 2023
FullPack: Full Vector Utilization for Sub-Byte Quantized Inference on General Purpose CPUs
Hossein Katebi
Navidreza Asadi
M. Goudarzi
MQ
27
0
0
13 Nov 2022
Auto-ViT-Acc: An FPGA-Aware Automatic Acceleration Framework for Vision Transformer with Mixed-Scheme Quantization
Zechao Li
Mengshu Sun
Alec Lu
Haoyu Ma
Geng Yuan
...
Yanyu Li
M. Leeser
Zhangyang Wang
Xue Lin
Zhenman Fang
ViT
MQ
22
50
0
10 Aug 2022
Quantum Neural Network Compression
Zhirui Hu
Peiyan Dong
Zhepeng Wang
Youzuo Lin
Yanzhi Wang
Weiwen Jiang
GNN
27
28
0
04 Jul 2022
Open-source FPGA-ML codesign for the MLPerf Tiny Benchmark
H. Borras
G. D. Guglielmo
Javier Mauricio Duarte
Nicolò Ghielmetti
B. Hawks
...
Nhan Tran
Yaman Umuroglu
Olivia Weng
Aidan Yokuda
Michaela Blott
VLM
MQ
32
14
0
23 Jun 2022
Real-time Hyper-Dimensional Reconfiguration at the Edge using Hardware Accelerators
Indhumathi Kandaswamy
Saurabh Farkya
Z. Daniels
G. V. D. Wal
Aswin Raghavan
...
Jun Hu
M. Lomnitz
M. Isnardi
David C. Zhang
M. Piacentino
BDL
9
3
0
10 Jun 2022
Real-Time Portrait Stylization on the Edge
Yanyu Li
Xuan Shen
Geng Yuan
Jiexiong Guan
Wei Niu
Hao Tang
Bin Ren
Yanzhi Wang
35
0
0
02 Jun 2022
SPViT: Enabling Faster Vision Transformers via Soft Token Pruning
Zhenglun Kong
Peiyan Dong
Xiaolong Ma
Xin Meng
Mengshu Sun
...
Geng Yuan
Bin Ren
Minghai Qin
H. Tang
Yanzhi Wang
ViT
34
144
0
27 Dec 2021
N3H-Core: Neuron-designed Neural Network Accelerator via FPGA-based Heterogeneous Computing Cores
Yu Gong
Zhihang Xu
Zhezhi He
Weifeng Zhang
Xiaobing Tu
Xiaoyao Liang
Li Jiang
25
13
0
15 Dec 2021
Neural Network Quantization for Efficient Inference: A Survey
Olivia Weng
MQ
28
23
0
08 Dec 2021
ILMPQ : An Intra-Layer Multi-Precision Deep Neural Network Quantization framework for FPGA
Sung-En Chang
Yanyu Li
Mengshu Sun
Yanzhi Wang
Xue Lin
MQ
6
1
0
30 Oct 2021
RMSMP: A Novel Deep Neural Network Quantization Framework with Row-wise Mixed Schemes and Multiple Precisions
Sung-En Chang
Yanyu Li
Mengshu Sun
Weiwen Jiang
Sijia Liu
Yanzhi Wang
Xue Lin
MQ
11
10
0
30 Oct 2021
A Deep Learning Inference Scheme Based on Pipelined Matrix Multiplication Acceleration Design and Non-uniform Quantization
Yuyang Zhang
Dik Hin Leung
Min Guo
Yijia Xiao
Haoyue Liu
Yunfei Li
Jiyuan Zhang
Guan Wang
Zhen Chen
11
2
0
10 Oct 2021
Hardware-assisted Trusted Memory Disaggregation for Secure Far Memory
Taekyung Heo
Seung-Hyun Kang
Sanghyeon Lee
Soojin Hwang
Jaehyuk Huh
18
1
0
25 Aug 2021
hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices
F. Fahim
B. Hawks
C. Herwig
J. Hirschauer
S. Jindariani
...
J. Ngadiuba
Miaoyuan Liu
Duc Hoang
E. Kreinar
Zhenbin Wu
30
129
0
09 Mar 2021
Ps and Qs: Quantization-aware pruning for efficient low latency neural network inference
B. Hawks
Javier Mauricio Duarte
Nicholas J. Fraser
Alessandro Pappalardo
N. Tran
Yaman Umuroglu
MQ
8
51
0
22 Feb 2021
Accelerating convolutional neural network by exploiting sparsity on GPUs
Weizhi Xu
Yintai Sun
Shengyu Fan
Hui Yu
Xin Fu
22
7
0
22 Sep 2019
A Survey on Deep Learning in Medical Image Analysis
G. Litjens
Thijs Kooi
B. Bejnordi
A. Setio
F. Ciompi
Mohsen Ghafoorian
Jeroen van der Laak
Bram van Ginneken
C. I. Sánchez
OOD
310
10,621
0
19 Feb 2017
Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights
Aojun Zhou
Anbang Yao
Yiwen Guo
Lin Xu
Yurong Chen
MQ
337
1,049
0
10 Feb 2017