2405.07140
Cited By
Edge Intelligence Optimization for Large Language Model Inference with Batching and Quantization
12 May 2024
Xinyuan Zhang, Jiang Liu, Zehui Xiong, Yudong Huang, Gaochang Xie, Ran Zhang
Papers citing "Edge Intelligence Optimization for Large Language Model Inference with Batching and Quantization"
3 / 3 papers shown
PartialLoading: User Scheduling and Bandwidth Allocation for Parameter-sharing Edge Inference
Guanqiao Qu, Qian Chen, Xianhao Chen, Kaibin Huang, Yuguang Fang
46
1
0
29 Mar 2025
On-Device Language Models: A Comprehensive Review
Jiajun Xu, Zhiyuan Li, Wei Chen, Qun Wang, Xin Gao, Qi Cai, Ziyuan Ling
50
27
0
26 Aug 2024
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, ..., Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang
149
371
0
13 Mar 2023