Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2406.16758
Cited By
Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters
24 June 2024
Euiin Yi
Taehyeon Kim
Hongseok Jeung
Du-Seong Chang
Se-Young Yun
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters"
13 / 13 papers shown
Title
Mixture of Attentions For Speculative Decoding
Matthieu Zimmer
Milan Gritta
Gerasimos Lampouras
Haitham Bou Ammar
Jun Wang
150
6
0
04 Oct 2024
Better & Faster Large Language Models via Multi-token Prediction
Fabian Gloeckle
Badr Youbi Idrissi
Baptiste Rozière
David Lopez-Paz
Gabriele Synnaeve
101
121
0
30 Apr 2024
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
144
165
0
26 Jan 2024
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Tianle Cai
Yuhong Li
Zhengyang Geng
Hongwu Peng
Jason D. Lee
De-huai Chen
Tri Dao
174
314
0
19 Jan 2024
Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding
Heming Xia
Zhe Yang
Qingxiu Dong
Peiyi Wang
Chak Tou Leong
Tao Ge
Tianyu Liu
Wenjie Li
Zhifang Sui
LRM
147
129
0
15 Jan 2024
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
469
4,444
0
09 Jun 2023
GPT-4 Technical Report
OpenAI OpenAI
OpenAI Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
1.5K
14,761
0
15 Mar 2023
Fast Inference from Transformers via Speculative Decoding
Yaniv Leviathan
Matan Kalman
Yossi Matias
LRM
151
736
0
30 Nov 2022
JParaCrawl v3.0: A Large-scale English-Japanese Parallel Corpus
Makoto Morishita
Katsuki Chousa
Jun Suzuki
Masaaki Nagata
49
28
0
25 Feb 2022
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM
OffRL
LRM
367
4,598
0
27 Oct 2021
Fast Transformer Decoding: One Write-Head is All You Need
Noam M. Shazeer
161
477
0
06 Nov 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
515
20,376
0
23 Oct 2019
Gaussian Error Linear Units (GELUs)
Dan Hendrycks
Kevin Gimpel
180
5,056
0
27 Jun 2016
1