ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2009.14794
  4. Cited By
Rethinking Attention with Performers

Rethinking Attention with Performers

30 September 2020
K. Choromanski
Valerii Likhosherstov
David Dohan
Xingyou Song
Andreea Gane
Tamás Sarlós
Peter Hawkins
Jared Davis
Afroz Mohiuddin
Lukasz Kaiser
David Belanger
Lucy J. Colwell
Adrian Weller
ArXivPDFHTML

Papers citing "Rethinking Attention with Performers"

50 / 1,014 papers shown
Title
Learning Advanced Self-Attention for Linear Transformers in the Singular Value Domain
Learning Advanced Self-Attention for Linear Transformers in the Singular Value Domain
Hyowon Wi
Jeongwhan Choi
Noseong Park
33
0
0
13 May 2025
Block-Biased Mamba for Long-Range Sequence Processing
Block-Biased Mamba for Long-Range Sequence Processing
Annan Yu
N. Benjamin Erichson
Mamba
40
0
0
13 May 2025
Tensor Sketch: Fast and Scalable Polynomial Kernel Approximation
Tensor Sketch: Fast and Scalable Polynomial Kernel Approximation
Ninh Pham
Rasmus Pagh
27
0
0
13 May 2025
Graph Laplacian Wavelet Transformer via Learnable Spectral Decomposition
Graph Laplacian Wavelet Transformer via Learnable Spectral Decomposition
Andrew Kiruluta
Eric Lundy
Priscilla Burity
29
0
0
09 May 2025
Image Recognition with Online Lightweight Vision Transformer: A Survey
Image Recognition with Online Lightweight Vision Transformer: A Survey
Zherui Zhang
Rongtao Xu
Jie Zhou
Changwei Wang
Xingtian Pei
...
Jiguang Zhang
Li Guo
Longxiang Gao
Wenyuan Xu
Shibiao Xu
ViT
148
0
0
06 May 2025
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
Piotr Piekos
Róbert Csordás
Jürgen Schmidhuber
MoE
VLM
99
1
0
01 May 2025
SFi-Former: Sparse Flow Induced Attention for Graph Transformer
SFi-Former: Sparse Flow Induced Attention for Graph Transformer
ZeLin Li
J. Q. Shi
Xinming Zhang
Miao Zhang
B. Li
44
0
0
29 Apr 2025
From Attention to Atoms: Spectral Dictionary Learning for Fast, Interpretable Language Models
From Attention to Atoms: Spectral Dictionary Learning for Fast, Interpretable Language Models
Andrew Kiruluta
24
0
0
29 Apr 2025
Embedding Empirical Distributions for Computing Optimal Transport Maps
Embedding Empirical Distributions for Computing Optimal Transport Maps
Mingchen Jiang
Peng Xu
Xichen Ye
Xiaohui Chen
Yun Yang
Yifan Chen
OT
56
0
0
24 Apr 2025
Bidirectional Mamba for Single-Cell Data: Efficient Context Learning with Biological Fidelity
Bidirectional Mamba for Single-Cell Data: Efficient Context Learning with Biological Fidelity
Cong Qi
Hanzhang Fang
Tianxing Hu
Siqi Jiang
Wei Zhi
Mamba
58
0
0
22 Apr 2025
ECViT: Efficient Convolutional Vision Transformer with Local-Attention and Multi-scale Stages
ECViT: Efficient Convolutional Vision Transformer with Local-Attention and Multi-scale Stages
Zhoujie Qian
ViT
29
0
0
21 Apr 2025
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation
Lvmin Zhang
Maneesh Agrawala
DiffM
VGen
75
0
0
17 Apr 2025
A Review of YOLOv12: Attention-Based Enhancements vs. Previous Versions
A Review of YOLOv12: Attention-Based Enhancements vs. Previous Versions
Rahima Khanam
Muhammad Hussain
36
0
0
16 Apr 2025
Extended Short- and Long-Range Mesh Learning for Fast and Generalized Garment Simulation
Extended Short- and Long-Range Mesh Learning for Fast and Generalized Garment Simulation
Aoran Liu
Kun Hu
Clinton Mo
C. Li
Zhiyong Wang
3DH
AI4CE
36
0
0
16 Apr 2025
Ordinary Least Squares as an Attention Mechanism
Ordinary Least Squares as an Attention Mechanism
Philippe Goulet Coulombe
26
0
0
13 Apr 2025
CAT: Circular-Convolutional Attention for Sub-Quadratic Transformers
CAT: Circular-Convolutional Attention for Sub-Quadratic Transformers
Yoshihiro Yamada
ViT
30
0
0
09 Apr 2025
Learnable Multi-Scale Wavelet Transformer: A Novel Alternative to Self-Attention
Learnable Multi-Scale Wavelet Transformer: A Novel Alternative to Self-Attention
Andrew Kiruluta
Priscilla Burity
Samantha Williams
27
3
0
08 Apr 2025
HRMedSeg: Unlocking High-resolution Medical Image segmentation via Memory-efficient Attention Modeling
HRMedSeg: Unlocking High-resolution Medical Image segmentation via Memory-efficient Attention Modeling
Qing Xu
Zhenye Lou
Chenxin Li
Xiangjian He
Rong Qu
Tesema Fiseha Berhanu
Yi Wang
Wenting Duan
Zhen Chen
MedIm
33
0
0
08 Apr 2025
Of All StrIPEs: Investigating Structure-informed Positional Encoding for Efficient Music Generation
Of All StrIPEs: Investigating Structure-informed Positional Encoding for Efficient Music Generation
Manvi Agarwal
Changhong Wang
Gaël Richard
29
0
0
07 Apr 2025
Decoding Recommendation Behaviors of In-Context Learning LLMs Through Gradient Descent
Decoding Recommendation Behaviors of In-Context Learning LLMs Through Gradient Descent
Yi Xu
Weicong Qin
Weijie Yu
Ming He
Jianping Fan
Jun Xu
26
0
0
06 Apr 2025
SQuat: Subspace-orthogonal KV Cache Quantization
SQuat: Subspace-orthogonal KV Cache Quantization
Hao Wang
Ligong Han
Kai Xu
Akash Srivastava
MQ
51
0
0
31 Mar 2025
ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models
ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models
Guoyizhe Wei
Rama Chellappa
40
0
0
30 Mar 2025
Optimal Scaling Laws for Efficiency Gains in a Theoretical Transformer-Augmented Sectional MoE Framework
Optimal Scaling Laws for Efficiency Gains in a Theoretical Transformer-Augmented Sectional MoE Framework
Soham Sane
MoE
64
0
0
26 Mar 2025
iFlame: Interleaving Full and Linear Attention for Efficient Mesh Generation
iFlame: Interleaving Full and Linear Attention for Efficient Mesh Generation
Hanxiao Wang
Biao Zhang
Weize Quan
Dong-ming Yan
Peter Wonka
51
0
0
20 Mar 2025
ACE: A Cardinality Estimator for Set-Valued Queries
ACE: A Cardinality Estimator for Set-Valued Queries
Yufan Sheng
Xin Cao
Kaiqi Zhao
Yixiang Fang
Jianzhong Qi
Wenjie Zhang
Christian S. Jensen
62
0
0
19 Mar 2025
CAKE: Cascading and Adaptive KV Cache Eviction with Layer Preferences
CAKE: Cascading and Adaptive KV Cache Eviction with Layer Preferences
Ziran Qin
Yuchen Cao
Mingbao Lin
Wen Hu
Shixuan Fan
Ke Cheng
Weiyao Lin
Jianguo Li
71
3
0
16 Mar 2025
Changing Base Without Losing Pace: A GPU-Efficient Alternative to MatMul in DNNs
Changing Base Without Losing Pace: A GPU-Efficient Alternative to MatMul in DNNs
Nir Ailon
Akhiad Bercovich
Omri Weinstein
54
0
0
15 Mar 2025
Fixed-Point RNNs: From Diagonal to Dense in a Few Iterations
Sajad Movahedi
Felix Sarnthein
Nicola Muca Cirone
Antonio Orvieto
48
2
0
13 Mar 2025
Talk2PC: Enhancing 3D Visual Grounding through LiDAR and Radar Point Clouds Fusion for Autonomous Driving
Runwei Guan
Jianan Liu
Ningwei Ouyang
Daizong Liu
Xiaolou Sun
Lianqing Zheng
Ming Xu
Yutao Yue
Hui Xiong
63
1
0
11 Mar 2025
STEAD: Spatio-Temporal Efficient Anomaly Detection for Time and Compute Sensitive Applications
Andrew Gao
Jun Liu
AI4TS
58
0
0
11 Mar 2025
MIRAM: Masked Image Reconstruction Across Multiple Scales for Breast Lesion Risk Prediction
MIRAM: Masked Image Reconstruction Across Multiple Scales for Breast Lesion Risk Prediction
H. Q. Vo
Pengyu Yuan
Zheng Yin
Kelvin K. Wong
Chika F. Ezeana
S. Ly
Stephen T. C. Wong
H. Nguyen
46
0
0
10 Mar 2025
TokenButler: Token Importance is Predictable
Yash Akhauri
Ahmed F. AbouElhamayed
Yifei Gao
Chi-chih Chang
Nilesh Jain
Mohamed S. Abdelfattah
50
0
0
10 Mar 2025
Conformal Transformations for Symmetric Power Transformers
Conformal Transformations for Symmetric Power Transformers
Saurabh Kumar
Jacob Buckman
Carles Gelada
Sean Zhang
70
0
0
05 Mar 2025
Predicting Team Performance from Communications in Simulated Search-and-Rescue
Ali Jalal-Kamali
Nikolos Gurney
David Pynadath
AI4TS
116
0
0
05 Mar 2025
FourierNAT: A Fourier-Mixing-Based Non-Autoregressive Transformer for Parallel Sequence Generation
FourierNAT: A Fourier-Mixing-Based Non-Autoregressive Transformer for Parallel Sequence Generation
Andrew Kiruluta
Eric Lundy
Andreas Lemos
AI4TS
47
0
0
04 Mar 2025
Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer
Yujiao Yang
Jing Lian
Linhui Li
MoE
82
0
0
04 Mar 2025
DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
Saeed Ranjbar Alvar
Gursimran Singh
Mohammad Akbari
Yong Zhang
VLM
77
0
0
04 Mar 2025
Two-stream Beats One-stream: Asymmetric Siamese Network for Efficient Visual Tracking
Jiawen Zhu
Huayi Tang
Xin Chen
Xinying Wang
Dong Wang
Huchuan Lu
50
2
0
01 Mar 2025
A HEART for the environment: Transformer-Based Spatiotemporal Modeling for Air Quality Prediction
A HEART for the environment: Transformer-Based Spatiotemporal Modeling for Air Quality Prediction
Norbert Bodendorfer
65
1
0
26 Feb 2025
Sliding Window Attention Training for Efficient Large Language Models
Sliding Window Attention Training for Efficient Large Language Models
Zichuan Fu
Wentao Song
Yixuan Wang
X. Wu
Yefeng Zheng
Yingying Zhang
Derong Xu
Xuetao Wei
Tong Xu
Xiangyu Zhao
81
1
0
26 Feb 2025
Neural Network Graph Similarity Computation Based on Graph Fusion
Neural Network Graph Similarity Computation Based on Graph Fusion
Zenghui Chang
Yiqiao Zhang
Hong Cai Chen
GNN
73
0
0
25 Feb 2025
Self-Adjust Softmax
Self-Adjust Softmax
Chuanyang Zheng
Yihang Gao
Guoxuan Chen
Han Shi
Jing Xiong
Xiaozhe Ren
Chao Huang
Xin Jiang
Z. Li
Yu-Hu Li
50
0
0
25 Feb 2025
The FFT Strikes Again: An Efficient Alternative to Self-Attention
The FFT Strikes Again: An Efficient Alternative to Self-Attention
Jacob Fein-Ashley
R. Kannan
Viktor Prasanna
68
2
0
25 Feb 2025
A Survey of Graph Transformers: Architectures, Theories and Applications
A Survey of Graph Transformers: Architectures, Theories and Applications
Chaohao Yuan
Kangfei Zhao
Ercan Engin Kuruoglu
Liang Wang
Tingyang Xu
Wenbing Huang
Deli Zhao
Hong Cheng
Yu Rong
57
4
0
23 Feb 2025
Compression Barriers for Autoregressive Transformers
Compression Barriers for Autoregressive Transformers
Themistoklis Haris
Krzysztof Onak
39
1
0
21 Feb 2025
MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation
MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation
Sihyun Yu
Meera Hahn
Dan Kondratyuk
Jinwoo Shin
Agrim Gupta
José Lezama
Irfan Essa
David A. Ross
Jonathan Huang
DiffM
VGen
77
0
0
18 Feb 2025
Low-Rank Thinning
Low-Rank Thinning
Annabelle Michael Carrell
Albert Gong
Abhishek Shetty
Raaz Dwivedi
Lester W. Mackey
61
0
0
17 Feb 2025
scMamba: A Pre-Trained Model for Single-Nucleus RNA Sequencing Analysis in Neurodegenerative Disorders
scMamba: A Pre-Trained Model for Single-Nucleus RNA Sequencing Analysis in Neurodegenerative Disorders
Gyutaek Oh
B. Choi
Seyoung Jin
Inkyung Jung
J. C. Ye
Mamba
37
0
0
12 Feb 2025
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis
Amir Hosein Fadaei
M. Dehaqani
45
0
0
11 Feb 2025
Learning the RoPEs: Better 2D and 3D Position Encodings with STRING
Learning the RoPEs: Better 2D and 3D Position Encodings with STRING
Connor Schenck
Isaac Reid
M. Jacob
Alex Bewley
Joshua Ainslie
...
Matthias Minderer
Dmitry Kalashnikov
Jonathan Tompson
Vikas Sindhwani
Krzysztof Choromanski
66
1
0
04 Feb 2025
1234...192021
Next