Top-Theta Attention: Sparsifying Transformers by Compensated Thresholding

12 February 2025
Konstantin Berestizshevsky, Renzo Andri, Lukas Cavigelli

Papers citing "Top-Theta Attention: Sparsifying Transformers by Compensated Thresholding"

11 / 11 papers shown
Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models
Xindi Wang, Mahsa Salmani, Parsa Omidi, Xiangyu Ren, Mehdi Rezagholizadeh, A. Eshaghi
LRM
65 · 42 · 0
03 Feb 2024

Linear Log-Normal Attention with Unbiased Concentration
Yury Nahshan, Dor-Joseph Kampeas, E. Haleva
42 · 8 · 0
22 Nov 2023

On The Computational Complexity of Self-Attention
Feyza Duman Keles, Pruthuvi Maheshakya Wijewardena, Chinmay Hegde
92 · 122 · 0
11 Sep 2022

What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?
Thomas Wang, Adam Roberts, Daniel Hesslow, Teven Le Scao, Hyung Won Chung, Iz Beltagy, Julien Launay, Colin Raffel
81 · 172 · 0
12 Apr 2022

Learned Token Pruning for Transformers
Sehoon Kim, Sheng Shen, D. Thorsley, A. Gholami, Woosuk Kwon, Joseph Hassoun, Kurt Keutzer
48 · 154 · 0
02 Jul 2021

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, B. Guo
ViT
402 · 21,347 · 0
25 Mar 2021

SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
Hanrui Wang, Zhekai Zhang, Song Han
99 · 388 · 0
17 Dec 2020

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, ..., Matthias Minderer, G. Heigold, Sylvain Gelly, Jakob Uszkoreit, N. Houlsby
ViT
530 · 40,739 · 0
22 Oct 2020

Longformer: The Long-Document Transformer
Iz Beltagy, Matthew E. Peters, Arman Cohan
RALM, VLM
128 · 4,048 · 0
10 Apr 2020

A^3: Accelerating Attention Mechanisms in Neural Networks with Approximation
Tae Jun Ham, Sungjun Jung, Seonghak Kim, Young H. Oh, Yeonhong Park, ..., Jung-Hun Park, Sanghee Lee, Kyoung Park, Jae W. Lee, D. Jeong
50 · 218 · 0
22 Feb 2020

Attention Is All You Need
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin
3DV
628 · 130,942 · 0
12 Jun 2017