2302.11750
Hera: A Heterogeneity-Aware Multi-Tenant Inference Server for Personalized Recommendations
23 February 2023
Yujeong Choi, John Kim, Minsoo Rhu
Papers citing
"Hera: A Heterogeneity-Aware Multi-Tenant Inference Server for Personalized Recommendations"
3 / 3 papers shown
Splitwise: Efficient generative LLM inference using phase splitting
Pratyush Patel, Esha Choukse, Chaojie Zhang, Aashaka Shah, Íñigo Goiri, Saeed Maleki, Ricardo Bianchini
49
197
0
30 Nov 2023
Distributed Hierarchical GPU Parameter Server for Massive Scale Deep Learning Ads Systems
Weijie Zhao, Deping Xie, Ronglai Jia, Yulei Qian, Rui Ding, Mingming Sun, P. Li
MoE
59
150
0
12 Mar 2020
RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing
Liu Ke, Udit Gupta, Carole-Jean Wu, B. Cho, Mark Hempstead, ..., Dheevatsa Mudigere, Maxim Naumov, Martin D. Schatz, M. Smelyanskiy, Xiaodong Wang
46
213
0
30 Dec 2019