arXiv: 2402.15758
Chimera: A Lossless Decoding Method for Accelerating Large Language Models Inference by Fusing all Tokens
24 February 2024
Huiping Zhuang, Jiahong Yu, Qianshi Pang, Zihao Wang, Cen Chen, Xiaofeng Zou
Papers citing "Chimera: A Lossless Decoding Method for Accelerating Large Language Models Inference by Fusing all Tokens" (9 of 9 papers shown)
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Jason D. Lee, Deming Chen, Tri Dao
108 · 295 · 0 · 19 Jan 2024
Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding
Heming Xia, Zhe Yang, Qingxiu Dong, Peiyi Wang, Chak Tou Leong, Tao Ge, Tianyu Liu, Wenjie Li, Zhifang Sui
LRM · 105 · 121 · 0 · 15 Jan 2024
PaSS: Parallel Speculative Sampling
Giovanni Monea, Armand Joulin, Edouard Grave
MoE · 53 · 34 · 0 · 22 Nov 2023
Qwen Technical Report
Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, …, Zhenru Zhang, Chang Zhou, Jingren Zhou, Xiaohuan Zhou, Tianhang Zhu
OSLM · 219 · 1,792 · 0 · 28 Sep 2023
Baichuan 2: Open Large-scale Language Models
Aiyuan Yang, Bin Xiao, Bingning Wang, Borong Zhang, Ce Bian, …, Youxin Jiang, Yuchen Gao, Yupeng Zhang, Guosheng Dong, Zhiying Wu
ELM · LRM · 153 · 743 · 0 · 19 Sep 2023
Accelerating LLM Inference with Staged Speculative Decoding
Benjamin Spector, Christopher Ré
63 · 107 · 0 · 08 Aug 2023
Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding
Seongjun Yang, Gibbeum Lee, Jaewoong Cho, Dimitris Papailiopoulos, Kangwook Lee
66 · 36 · 0 · 12 Jul 2023
Fast Inference from Transformers via Speculative Decoding
Yaniv Leviathan, Matan Kalman, Yossi Matias
LRM · 100 · 696 · 0 · 30 Nov 2022
On The Computational Complexity of Self-Attention
Feyza Duman Keles, Pruthuvi Maheshakya Wijewardena, Chinmay Hegde
92 · 122 · 0 · 11 Sep 2022
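Several of the papers above (Chimera, Medusa, PaSS, and the Leviathan et al. paper that introduced the technique) build on speculative decoding: a cheap draft model proposes several tokens, the expensive target model verifies them in one pass, and only agreeing tokens are kept, so the output is identical to decoding with the target model alone ("lossless"). A minimal greedy-decoding sketch of that draft-then-verify loop, using toy next-token functions in place of real models (the names `draft_model` and `target_model` are illustrative, not any paper's API):

```python
# Toy sketch of greedy speculative decoding. Real implementations verify all
# k draft positions with a single batched forward pass of the target model;
# here each call stands in for one model query.

def speculative_decode(prompt, draft_model, target_model, k=4, max_new=10):
    """Generate up to max_new tokens after prompt, matching pure
    target-model greedy decoding exactly (losslessness)."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # 1. Draft: propose k tokens autoregressively with the cheap model.
        draft, ctx = [], list(out)
        for _ in range(k):
            tok = draft_model(ctx)
            draft.append(tok)
            ctx.append(tok)
        # 2. Verify: accept the longest prefix the target model agrees with.
        accepted = 0
        for i, tok in enumerate(draft):
            if target_model(out + draft[:i]) == tok:
                accepted += 1
            else:
                break
        out.extend(draft[:accepted])
        # 3. On a rejection, emit the target model's own token instead,
        #    so the sequence never diverges from target-only decoding.
        if accepted < k:
            out.append(target_model(out))
    return out[:len(prompt) + max_new]

if __name__ == "__main__":
    # Toy "models": the target strictly counts up; the draft sometimes stalls.
    target = lambda ctx: ctx[-1] + 1
    draft = lambda ctx: ctx[-1] + 1 if ctx[-1] % 3 else ctx[-1]
    print(speculative_decode([0], draft, target, k=4, max_new=6))
```

Per iteration this costs one target-model verification pass for up to k accepted tokens, which is where the speedup comes from; the papers above differ mainly in how the draft tokens are produced (extra decoding heads in Medusa, parallel sampling in PaSS, fused-token drafting in Chimera).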