Medha: Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations
arXiv:2409.17264 (v3, latest) · 25 September 2024
A. Agrawal, Haoran Qiu, Junda Chen, Íñigo Goiri, Chaojie Zhang, Rayyan Shahid, Ramachandran Ramjee, Alexey Tumanov, Esha Choukse
Topics: RALM, LRM
Links: arXiv (abs) · PDF · HTML
Papers citing "Medha: Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations" (7 of 7 papers shown)
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding
Jian Chen, Vashisth Tiwari, Ranajoy Sadhukhan, Zhuoming Chen, Jinyuan Shi, Ian En-Hsu Yen, Avner May, Tianqi Chen, Beidi Chen
LRM
20 Aug 2024
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, ..., Kathy Meier-Hellstern, Douglas Eck, J. Dean, Slav Petrov, Noah Fiedel
PILM, LRM
05 Apr 2022
Sequence Parallelism: Long Sequence Training from System Perspective
Shenggui Li, Fuzhao Xue, Chaitanya Baranwal, Yongbin Li, Yang You
26 May 2021
RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, Yunfeng Liu
20 Apr 2021
TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models
Zhuohan Li, Siyuan Zhuang, Shiyuan Guo, Danyang Zhuo, Hao Zhang, Basel Alomair, Ion Stoica
MoE
16 Feb 2021
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
MoE
17 Sep 2019
Online normalizer calculation for softmax
Maxim Milakov, N. Gimelshein
08 May 2018