ResearchTrend.AI

Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity

22 April 2024
Tyler Griggs
Xiaoxuan Liu
Jiaxiang Yu
Doyoung Kim
Wei-Lin Chiang
Alvin Cheung
Ion Stoica

Papers citing "Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity"

10 / 10 papers shown
The Larger the Merrier? Efficient Large AI Model Inference in Wireless Edge Networks
Zhonghao Lyu
Ming Xiao
Jie Xu
Mikael Skoglund
Marco Di Renzo
28
0
0
14 May 2025
Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving
Shan Yu
Jiarong Xing
Yifan Qiao
Mingyuan Ma
Y. Li
...
Shiyi Cao
Ke Bao
Ion Stoica
Harry Xu
Ying Sheng
31
0
0
06 May 2025
Taming the Titans: A Survey of Efficient LLM Inference Serving
Ranran Zhen
J. Li
Yixin Ji
Zhengyuan Yang
Tong Liu
Qingrong Xia
Xinyu Duan
Zehao Wang
Baoxing Huai
M. Zhang
LLMAG
77
0
0
28 Apr 2025
Efficient Algorithms for Verifying Kruskal Rank in Sparse Linear Regression and Related Applications
Fengqin Zhou
58
3
0
06 Mar 2025
iServe: An Intent-based Serving System for LLMs
Dimitrios Liakopoulos
Tianrui Hu
Prasoon Sinha
N. Yadwadkar
VLM
179
0
0
08 Jan 2025
Software Performance Engineering for Foundation Model-Powered Software (FMware)
Haoxiang Zhang
Shi Chang
Arthur Leung
Kishanthan Thangarajah
Boyuan Chen
Hanan Lutfiyya
Ahmed E. Hassan
101
1
0
14 Nov 2024
A Strategy to Combine 1stGen Transformers and Open LLMs for Automatic Text Classification
Claudio Andrade
Washington Cunha
Davi Reis
Adriana S. Pagano
Leonardo Rocha
Marcos André Gonçalves
31
3
0
19 Aug 2024
LLM Inference Serving: Survey of Recent Advances and Opportunities
Baolin Li
Yankai Jiang
V. Gadepally
Devesh Tiwari
78
18
0
17 Jul 2024
vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention
Ramya Prabhu
Ajay Nayak
Jayashree Mohan
Ramachandran Ramjee
Ashish Panwar
VLM
57
25
0
07 May 2024
Big Bird: Transformers for Longer Sequences
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
...
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
VLM
285
2,015
0
28 Jul 2020