Attention Is All You Need But You Don't Need All Of It For Inference of Large Language Models
arXiv: 2407.15516 · 22 July 2024
Georgy Tyukin, G. Dovonon, Jean Kaddour, Pasquale Minervini · LRM
Papers citing "Attention Is All You Need But You Don't Need All Of It For Inference of Large Language Models" (7 of 7 papers shown):
1. Layer-Condensed KV Cache for Efficient Inference of Large Language Models
   Haoyi Wu, Kewei Tu · 17 May 2024 · MQ
2. Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference
   Piotr Nawrot, Adrian Łańcucki, Marcin Chochowski, David Tarjan, Edoardo Ponti · 14 Mar 2024
3. Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding
   Heming Xia, Zhe Yang, Qingxiu Dong, Peiyi Wang, Chak Tou Leong, Tao Ge, Tianyu Liu, Wenjie Li, Zhifang Sui · 15 Jan 2024 · LRM
4. RWKV: Reinventing RNNs for the Transformer Era
   Bo Peng, Eric Alcaide, Quentin G. Anthony, Alon Albalak, Samuel Arcadinho, ..., Qihang Zhao, P. Zhou, Qinghua Zhou, Jian Zhu, Rui-Jie Zhu · 22 May 2023
5. Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth
   Yihe Dong, Jean-Baptiste Cordonnier, Andreas Loukas · 05 Mar 2021
6. Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping
   Minjia Zhang, Yuxiong He · 26 Oct 2020 · AI4CE
7. Reducing Transformer Depth on Demand with Structured Dropout
   Angela Fan, Edouard Grave, Armand Joulin · 25 Sep 2019