Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2404.16283
Cited By
Andes: Defining and Enhancing Quality-of-Experience in LLM-Based Text Streaming Services
25 April 2024
Jiachen Liu
Zhiyu Wu
Jae-Won Chung
Fan Lai
Myungjin Lee
Mosharaf Chowdhury
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Andes: Defining and Enhancing Quality-of-Experience in LLM-Based Text Streaming Services"
9 / 9 papers shown
Title
The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization
Jae-Won Chung
Jiachen Liu
Jeff J. Ma
Ruofan Wu
Oh Jun Kweon
Yuxuan Xia
Zhiyu Wu
Mosharaf Chowdhury
31
0
0
09 May 2025
Streaming, Fast and Slow: Cognitive Load-Aware Streaming for Efficient LLM Serving
Chang Xiao
Brenda Z. Yang
29
0
0
25 Apr 2025
Tempo: Application-aware LLM Serving with Mixed SLO Requirements
Wei Zhang
Zhiyu Wu
Yi Mu
Banruo Liu
Myungjin Lee
Fan Lai
58
0
0
24 Apr 2025
Circinus: Efficient Query Planner for Compound ML Serving
Banruo Liu
Wei-Yu Lin
Minghao Fang
Yihan Jiang
Fan Lai
LRM
39
0
0
23 Apr 2025
GUIDE: A Global Unified Inference Engine for Deploying Large Language Models in Heterogeneous Environments
Yanyu Chen
Ganhong Huang
108
0
0
28 Jan 2025
AdaServe: Accelerating Multi-SLO LLM Serving with SLO-Customized Speculative Decoding
Zikun Li
Zhuofu Chen
Remi Delacourt
Gabriele Oliaro
Zeyu Wang
...
Zhuoming Chen
Sean Lai
Xinhao Cheng
Xupeng Miao
Zhihao Jia
53
6
0
21 Jan 2025
HyGen: Efficient LLM Serving via Elastic Online-Offline Request Co-location
Ting Sun
Penghan Wang
Fan Lai
166
1
0
15 Jan 2025
Efficient LLM Scheduling by Learning to Rank
Yichao Fu
Siqi Zhu
Runlong Su
Aurick Qiao
Ion Stoica
Hao Zhang
58
19
0
28 Aug 2024
The Falcon Series of Open Language Models
Ebtesam Almazrouei
Hamza Alobeidli
Abdulaziz Alshamsi
Alessandro Cappelli
Ruxandra-Aimée Cojocaru
...
Quentin Malartic
Daniele Mazzotta
Badreddine Noune
B. Pannier
Guilherme Penedo
AI4TS
ALM
121
404
0
28 Nov 2023
1