Attention Is All You Need But You Don't Need All Of It For Inference of Large Language Models

22 July 2024
Georgy Tyukin, G. Dovonon, Jean Kaddour, Pasquale Minervini
arXiv: 2407.15516

Papers citing "Attention Is All You Need But You Don't Need All Of It For Inference of Large Language Models"

7 papers shown:
  1. Layer-Condensed KV Cache for Efficient Inference of Large Language Models
     Haoyi Wu, Kewei Tu (17 May 2024)

  2. Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference
     Piotr Nawrot, Adrian Łańcucki, Marcin Chochowski, David Tarjan, Edoardo Ponti (14 Mar 2024)

  3. Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding
     Heming Xia, Zhe Yang, Qingxiu Dong, Peiyi Wang, Chak Tou Leong, Tao Ge, Tianyu Liu, Wenjie Li, Zhifang Sui (15 Jan 2024)

  4. RWKV: Reinventing RNNs for the Transformer Era
     Bo Peng, Eric Alcaide, Quentin G. Anthony, Alon Albalak, Samuel Arcadinho, ..., Qihang Zhao, P. Zhou, Qinghua Zhou, Jian Zhu, Rui-Jie Zhu (22 May 2023)

  5. Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth
     Yihe Dong, Jean-Baptiste Cordonnier, Andreas Loukas (05 Mar 2021)

  6. Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping
     Minjia Zhang, Yuxiong He (26 Oct 2020)

  7. Reducing Transformer Depth on Demand with Structured Dropout
     Angela Fan, Edouard Grave, Armand Joulin (25 Sep 2019)