ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

arXiv:2009.14794 · Cited By
Rethinking Attention with Performers


30 September 2020
Krzysztof Choromanski
Valerii Likhosherstov
David Dohan
Xingyou Song
Andreea Gane
Tamás Sarlós
Peter Hawkins
Jared Davis
Afroz Mohiuddin
Lukasz Kaiser
David Belanger
Lucy J. Colwell
Adrian Weller

Papers citing "Rethinking Attention with Performers"

50 / 1,014 papers shown
Accelerating Error Correction Code Transformers
Matan Levy
Yoni Choukroun
Lior Wolf
MQ
21
0
0
08 Oct 2024
LevAttention: Time, Space, and Streaming Efficient Algorithm for Heavy Attentions
R. Kannan
Chiranjib Bhattacharyya
Praneeth Kacham
David P. Woodruff
25
1
0
07 Oct 2024
TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
Lijie Yang
Zhihao Zhang
Zhuofu Chen
Zikun Li
Zhihao Jia
45
4
0
07 Oct 2024
Accelerating Inference of Networks in the Frequency Domain
Chenqiu Zhao
Guanfang Dong
Anup Basu
35
0
0
06 Oct 2024
Fundamental Limitations on Subquadratic Alternatives to Transformers
Josh Alman
Hantao Yu
23
1
0
05 Oct 2024
System 2 Reasoning Capabilities Are Nigh
Scott C. Lowe
VLM
LRM
46
0
0
04 Oct 2024
Linear Transformer Topological Masking with Graph Random Features
Isaac Reid
Kumar Avinava Dubey
Deepali Jain
Will Whitney
Amr Ahmed
...
Connor Schenck
Richard E. Turner
René Wagner
Adrian Weller
Krzysztof Choromanski
24
1
0
04 Oct 2024
Can Mamba Always Enjoy the "Free Lunch"?
Ruifeng Ren
Zhicong Li
Yong Liu
44
1
0
04 Oct 2024
How to Train Long-Context Language Models (Effectively)
Tianyu Gao
Alexander Wettig
Howard Yen
Danqi Chen
RALM
72
38
0
03 Oct 2024
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
Jintao Zhang
Jia Wei
Pengle Zhang
Jun-Jie Zhu
Jun Zhu
Jianfei Chen
VLM
MQ
82
19
0
03 Oct 2024
Tuning Frequency Bias of State Space Models
Annan Yu
Dongwei Lyu
S. H. Lim
Michael W. Mahoney
N. Benjamin Erichson
44
3
0
02 Oct 2024
On The Adaptation of Unlimiformer for Decoder-Only Transformers
Kian Ahrabian
Alon Benhaim
Barun Patra
Jay Pujara
Saksham Singhal
Xia Song
38
0
0
02 Oct 2024
GLMHA A Guided Low-rank Multi-Head Self-Attention for Efficient Image Restoration and Spectral Reconstruction
Zaid Ilyas
Naveed Akhtar
David Suter
Syed Zulqarnain Gilani
17
0
0
01 Oct 2024
Intelligent Fish Detection System with Similarity-Aware Transformer
Shengchen Li
Haobo Zuo
Changhong Fu
Zhiyong Wang
Zhiqiang Xu
ViT
23
0
0
28 Sep 2024
Cottention: Linear Transformers With Cosine Attention
Gabriel Mongaras
Trevor Dohm
Eric C. Larson
26
0
0
27 Sep 2024
dnaGrinder: a lightweight and high-capacity genomic foundation model
Qihang Zhao
Chi Zhang
Weixiong Zhang
26
0
0
24 Sep 2024
CSPS: A Communication-Efficient Sequence-Parallelism based Serving System for Transformer based Models with Long Prompts
Zeyu Zhang
Haiying Shen
VLM
29
0
0
23 Sep 2024
DiffFluid: Plain Diffusion Models are Effective Predictors of Flow Dynamics
Dongyu Luo
Jianyu Wu
Jing Wang
Hairun Xie
Xiangyu Yue
Shixiang Tang
DiffM
AI4CE
38
0
0
20 Sep 2024
Mamba-ST: State Space Model for Efficient Style Transfer
Filippo Botti
Alex Ergasti
Leonardo Rossi
Tomaso Fontanini
Claudio Ferrari
Massimo Bertozzi
Andrea Prati
Mamba
53
2
0
16 Sep 2024
Mamba for Scalable and Efficient Personalized Recommendations
Andrew Starnes
Clayton Webster
Mamba
35
0
0
11 Sep 2024
Inf-MLLM: Efficient Streaming Inference of Multimodal Large Language Models on a Single GPU
Zhenyu Ning
Jieru Zhao
Qihao Jin
Wenchao Ding
Minyi Guo
29
5
0
11 Sep 2024
DA-MoE: Towards Dynamic Expert Allocation for Mixture-of-Experts Models
Maryam Akhavan Aghdam
Hongpeng Jin
Yanzhao Wu
MoE
23
3
0
10 Sep 2024
Expanding Expressivity in Transformer Models with MöbiusAttention
Anna-Maria Halacheva
M. Nayyeri
Steffen Staab
27
1
0
08 Sep 2024
MVTN: A Multiscale Video Transformer Network for Hand Gesture Recognition
Mallika Garg
Debashis Ghosh
P. M. Pradhan
ViT
38
1
0
05 Sep 2024
Wavelet GPT: Wavelet Inspired Large Language Models
Prateek Verma
AI4TS
20
0
0
04 Sep 2024
In Defense of RAG in the Era of Long-Context Language Models
Tan Yu
Anbang Xu
Rama Akkiraju
RALM
3DV
26
24
0
03 Sep 2024
Autoregressive model path dependence near Ising criticality
Yi Hong Teoh
R. Melko
AI4CE
33
1
0
28 Aug 2024
Squid: Long Context as a New Modality for Energy-Efficient On-Device Language Models
Wei Chen
Zhiyuan Li
Shuo Xin
Yihao Wang
36
4
0
28 Aug 2024
SpikingSSMs: Learning Long Sequences with Sparse and Parallel Spiking State Space Models
Shuaijie Shen
Chao Wang
Renzhuo Huang
Yan Zhong
Qinghai Guo
Zhichao Lu
Jianguo Zhang
Luziwei Leng
42
8
0
27 Aug 2024
Macformer: Transformer with Random Maclaurin Feature Attention
Yuhan Guo
Lizhong Ding
Ye Yuan
Guoren Wang
51
0
0
21 Aug 2024
VrdONE: One-stage Video Visual Relation Detection
Xinjie Jiang
Chenxi Zheng
Xuemiao Xu
Bangzhen Liu
Weiying Zheng
Huaidong Zhang
Shengfeng He
VGen
VOS
47
3
0
18 Aug 2024
Linear Attention is Enough in Spatial-Temporal Forecasting
Xinyu Ning
AI4TS
40
0
0
17 Aug 2024
Ex3: Automatic Novel Writing by Extracting, Excelsior and Expanding
Lei Huang
Jiaming Guo
Guanhua He
Xishan Zhang
Rui Zhang
Shaohui Peng
Shaoli Liu
Tianshi Chen
26
2
0
16 Aug 2024
System States Forecasting of Microservices with Dynamic Spatio-Temporal Data
Yifei Xu
Jingguo Ge
Haina Tang
Shuai Ding
Tong Li
Hui Li
AI4TS
37
0
0
15 Aug 2024
Kraken: Inherently Parallel Transformers For Efficient Multi-Device Inference
R. Prabhakar
Hengrui Zhang
D. Wentzlaff
28
0
0
14 Aug 2024
Sampling Foundational Transformer: A Theoretical Perspective
Viet Anh Nguyen
Minh Lenhat
Khoa Nguyen
Duong Duc Hieu
Dao Huu Hung
Truong Son-Hy
44
0
0
11 Aug 2024
SAMSA: Efficient Transformer for Many Data Modalities
Minh Lenhat
Viet Anh Nguyen
Khoa Nguyen
Duong Duc Hieu
Dao Huu Hung
Truong Son-Hy
49
0
0
10 Aug 2024
GlitchProber: Advancing Effective Detection and Mitigation of Glitch Tokens in Large Language Models
Zhibo Zhang
Wuxia Bai
Yuxi Li
Max Q.-H. Meng
Kaidi Wang
Ling Shi
Li Li
Jun Wang
Haoyu Wang
24
4
0
09 Aug 2024
Towards Resilient and Efficient LLMs: A Comparative Study of Efficiency, Performance, and Adversarial Robustness
Xiaojing Fan
Chunliang Tao
AAML
39
28
0
08 Aug 2024
Cross-layer Attention Sharing for Large Language Models
Yongyu Mu
Yuzhang Wu
Yuchun Fan
Chenglong Wang
Hengyu Li
Qiaozhi He
Murun Yang
Tong Xiao
Jingbo Zhu
42
5
0
04 Aug 2024
Emotion-driven Piano Music Generation via Two-stage Disentanglement and Functional Representation
Jingyue Huang
Ke Chen
Yi-Hsuan Yang
CoGe
36
3
0
30 Jul 2024
Emotion-Driven Melody Harmonization via Melodic Variation and Functional Representation
Jingyue Huang
Yi-Hsuan Yang
42
2
0
29 Jul 2024
Mixture of Nested Experts: Adaptive Processing of Visual Tokens
Gagan Jain
Nidhi Hegde
Aditya Kusupati
Arsha Nagrani
Shyamal Buch
Prateek Jain
Anurag Arnab
Sujoy Paul
MoE
48
7
0
29 Jul 2024
VSSD: Vision Mamba with Non-Causal State Space Duality
Yuheng Shi
Minjing Dong
Mingjia Li
Chang Xu
Mamba
33
5
0
26 Jul 2024
FMamba: Mamba based on Fast-attention for Multivariate Time-series Forecasting
Shusen Ma
Yu Kang
Peng Bai
Yunan Zhao
Mamba
AI4TS
22
3
0
20 Jul 2024
PolyFormer: Scalable Node-wise Filters via Polynomial Graph Transformer
Jiahong Ma
Mingguo He
Zhewei Wei
52
2
0
19 Jul 2024
Longhorn: State Space Models are Amortized Online Learners
Bo Liu
Rui Wang
Lemeng Wu
Yihao Feng
Peter Stone
Qian Liu
51
10
0
19 Jul 2024
TorchGT: A Holistic System for Large-scale Graph Transformer Training
Mengdie Zhang
Jie Sun
Qi Hu
Peng Sun
Zeke Wang
Yonggang Wen
Tianwei Zhang
GNN
39
2
0
19 Jul 2024
Attention in SRAM on Tenstorrent Grayskull
Moritz Thüning
30
3
0
18 Jul 2024
Tiled Bit Networks: Sub-Bit Neural Network Compression Through Reuse of Learnable Binary Vectors
Matt Gorbett
Hossein Shirazi
Indrakshi Ray
MQ
43
0
0
16 Jul 2024