2306.16601
Cited By
An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs
28 June 2023
Haihao Shen
Hengyu Meng
Bo Dong
Zhe Wang
Ofir Zafrir
Yi Ding
Yunqian Luo
Hanwen Chang
Qun Gao
Zi. Wang
Guy Boudoukh
Moshe Wasserblat
MoE
Papers citing
"An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs"
2 / 2 papers shown
Title
Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang
Wei Chen
Yicong Luo
Yongliu Long
Zhengkai Lin
Liye Zhang
Binbin Lin
Deng Cai
Xiaofei He
MQ
41
48
0
15 Feb 2024
I-BERT: Integer-only BERT Quantization
Sehoon Kim
A. Gholami
Z. Yao
Michael W. Mahoney
Kurt Keutzer
MQ
107
344
0
05 Jan 2021