2404.06709
Cited By
CQIL: Inference Latency Optimization with Concurrent Computation of Quasi-Independent Layers
10 April 2024
Longwei Zou, Qingyang Wang, Han Zhao, Jiangang Kong, Yi Yang, Yangdong Deng
Papers citing "CQIL: Inference Latency Optimization with Concurrent Computation of Quasi-Independent Layers"
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
Yixin Song, Zeyu Mi, Haotong Xie, Haibo Chen
BDL
16 Dec 2023