arXiv: 2207.05022
STI: Turbocharge NLP Inference at the Edge via Elastic Pipelining
11 July 2022
Liwei Guo
Wonkyo Choe
F. Lin
Papers citing "STI: Turbocharge NLP Inference at the Edge via Elastic Pipelining"
5 / 5 papers shown
Title
Klotski: Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline
Zhiyuan Fang
Yuegui Huang
Zicong Hong
Yufeng Lyu
Wuhui Chen
Yue Yu
Fan Yu
Zibin Zheng
MoE
48
0
0
09 Feb 2025
Minimum Viable Device Drivers for ARM TrustZone
Liwei Guo
F. Lin
24
18
0
15 Oct 2021
BinaryBERT: Pushing the Limit of BERT Quantization
Haoli Bai
Wei Zhang
Lu Hou
Lifeng Shang
Jing Jin
Xin Jiang
Qun Liu
Michael Lyu
Irwin King
MQ
142
221
0
31 Dec 2020
Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition
Yangyang Shi
Yongqiang Wang
Chunyang Wu
Ching-Feng Yeh
Julian Chan
Frank Zhang
Duc Le
M. Seltzer
56
168
0
21 Oct 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
299
6,984
0
20 Apr 2018