Memory-efficient Transformers via Top-$k$ Attention
Ankit Gupta, Guy Dar, Shaya Goodman, David Ciprut, Jonathan Berant
arXiv:2106.06899 · 13 June 2021

Papers citing "Memory-efficient Transformers via Top-$k$ Attention"

31 papers
Top-Theta Attention: Sparsifying Transformers by Compensated Thresholding
Konstantin Berestizshevsky, Renzo Andri, Lukas Cavigelli
12 Feb 2025
ZETA: Leveraging Z-order Curves for Efficient Top-k Attention
Qiuhao Zeng, Jerry Huang, Peng Lu, Gezheng Xu, Boxing Chen, Charles Ling, Boyu Wang
24 Jan 2025
Attention Entropy is a Key Factor: An Analysis of Parallel Context Encoding with Full-attention-based Pre-trained Language Models
Zhisong Zhang, Yan Wang, Xinting Huang, Tianqing Fang, H. Zhang, Chenlong Deng, Shuaiyi Li, Dong Yu
21 Dec 2024
$k$NN Attention Demystified: A Theoretical Exploration for Scalable Transformers
Themistoklis Haris
06 Nov 2024
LevAttention: Time, Space, and Streaming Efficient Algorithm for Heavy Attentions
R. Kannan, Chiranjib Bhattacharyya, Praneeth Kacham, David P. Woodruff
07 Oct 2024
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding
Jian Chen, Vashisth Tiwari, Ranajoy Sadhukhan, Zhuoming Chen, Jinyuan Shi, Ian En-Hsu Yen, Avner May, Tianqi Chen, Beidi Chen
20 Aug 2024
Pick of the Bunch: Detecting Infrared Small Targets Beyond Hit-Miss Trade-Offs via Selective Rank-Aware Attention
Yimian Dai, Peiwen Pan, Yulei Qian, Yuxuan Li, Xiang Li, Jian Yang, Huan Wan
07 Aug 2024
Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers
Chao Lou, Zixia Jia, Zilong Zheng, Kewei Tu
24 Jun 2024
Loki: Low-Rank Keys for Efficient Sparse Attention
Prajwal Singhania, Siddharth Singh, Shwai He, S. Feizi, A. Bhatele
04 Jun 2024
Extended Mind Transformers
Phoebe Klett, Thomas Ahle
04 Jun 2024
MultiMax: Sparse and Multi-Modal Attention Learning
Yuxuan Zhou, Mario Fritz, M. Keuper
03 Jun 2024
IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs
Yuzhen Mao, Martin Ester, Ke Li
05 May 2024
What makes Models Compositional? A Theoretical View: With Supplement
Parikshit Ram, Tim Klinger, Alexander G. Gray
02 May 2024
LoMA: Lossless Compressed Memory Attention
Yumeng Wang, Zhenyang Xiao
16 Jan 2024
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, Zhihao Jia
23 Dec 2023
Masked Hard-Attention Transformers Recognize Exactly the Star-Free Languages
Andy Yang, David Chiang, Dana Angluin
21 Oct 2023
Consciousness-Inspired Spatio-Temporal Abstractions for Better Generalization in Reinforcement Learning
Mingde Zhao, Safa Alver, H. V. Seijen, Romain Laroche, Doina Precup, Yoshua Bengio
30 Sep 2023
Approximating ReLU on a Reduced Ring for Efficient MPC-based Private Inference
Kiwan Maeng, G. E. Suh
09 Sep 2023
BiFormer: Vision Transformer with Bi-Level Routing Attention
Lei Zhu, Xinjiang Wang, Zhanghan Ke, Wayne Zhang, Rynson W. H. Lau
15 Mar 2023
The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers
Zong-xiao Li, Chong You, Srinadh Bhojanapalli, Daliang Li, A. S. Rawat, ..., Kenneth Q Ye, Felix Chern, Felix X. Yu, Ruiqi Guo, Surinder Kumar
12 Oct 2022
Pretraining the Vision Transformer using self-supervised methods for vision based Deep Reinforcement Learning
Manuel Goulão, Arlindo L. Oliveira
22 Sep 2022
Treeformer: Dense Gradient Trees for Efficient Attention Computation
Lovish Madaan, Srinadh Bhojanapalli, Himanshu Jain, Prateek Jain
18 Aug 2022
Diagonal State Spaces are as Effective as Structured State Spaces
Ankit Gupta, Albert Gu, Jonathan Berant
27 Mar 2022
Memorizing Transformers
Yuhuai Wu, M. Rabe, DeLesley S. Hutchins, Christian Szegedy
16 Mar 2022
Universal Hopfield Networks: A General Framework for Single-Shot Associative Memory Models
Beren Millidge, Tommaso Salvatori, Yuhang Song, Thomas Lukasiewicz, Rafal Bogacz
09 Feb 2022
On Learning the Transformer Kernel
Sankalan Pal Chowdhury, Adamos Solomou, Kumar Avinava Dubey, Mrinmaya Sachan
15 Oct 2021
LambdaNetworks: Modeling Long-Range Interactions Without Attention
Irwan Bello
17 Feb 2021
Decision Machines: An Extension of Decision Trees
Jinxiong Zhang
27 Jan 2021
Big Bird: Transformers for Longer Sequences
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed
28 Jul 2020
Efficient Content-Based Sparse Attention with Routing Transformers
Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier
12 Mar 2020
Language Models as Knowledge Bases?
Fabio Petroni, Tim Rocktaschel, Patrick Lewis, A. Bakhtin, Yuxiang Wu, Alexander H. Miller, Sebastian Riedel
03 Sep 2019