2405.14636
Cited By
PerLLM: Personalized Inference Scheduling with Edge-Cloud Collaboration for Diverse LLM Services
23 May 2024
Zheming Yang
Yuanhao Yang
Chang Zhao
Qi Guo
Wenkai He
Wen Ji
Papers citing "PerLLM: Personalized Inference Scheduling with Edge-Cloud Collaboration for Diverse LLM Services"
8 / 8 papers shown
Task-Oriented Semantic Communication in Large Multimodal Models-based Vehicle Networks
Baoxia Du
H. Du
Dusit Niyato
Ruidong Li
58
0
0
05 May 2025
Taming the Titans: A Survey of Efficient LLM Inference Serving
Ranran Zhen
J. Li
Yixin Ji
Zhengyuan Yang
Tong Liu
Qingrong Xia
Xinyu Duan
Zehao Wang
Baoxing Huai
Hao Fei
LLMAG
77
0
0
28 Apr 2025
DeServe: Towards Affordable Offline LLM Inference via Decentralization
Linyu Wu
Xiaoyuan Liu
Tianneng Shi
Zhe Ye
D. Song
OffRL
42
0
0
28 Jan 2025
SPA: Towards A Computational Friendly Cloud-Base and On-Devices Collaboration Seq2seq Personalized Generation
Yanming Liu
Xinyue Peng
Jiannan Cao
Le Dai
Xingzu Liu
Mingbang Wang
Weihao Liu
SyDa
44
2
0
11 Mar 2024
A Survey on Effective Invocation Methods of Massive LLM Services
Can Wang
Bolin Zhang
Dianbo Sui
Zhiying Tu
Xiaoyu Liu
Jiabao Kang
34
6
0
05 Feb 2024
Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
Jingfeng Yang
Hongye Jin
Ruixiang Tang
Xiaotian Han
Qizhang Feng
Haoming Jiang
Bing Yin
Xia Hu
LM&MA
137
626
0
26 Apr 2023
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng
Lianmin Zheng
Binhang Yuan
Zhuohan Li
Max Ryabinin
...
Joseph E. Gonzalez
Percy Liang
Christopher Ré
Ion Stoica
Ce Zhang
149
369
0
13 Mar 2023
Transformer in Transformer
Kai Han
An Xiao
Enhua Wu
Jianyuan Guo
Chunjing Xu
Yunhe Wang
ViT
289
1,524
0
27 Feb 2021