ResearchTrend.AI

Rethinking Attention with Performers
arXiv:2009.14794

30 September 2020
K. Choromanski
Valerii Likhosherstov
David Dohan
Xingyou Song
Andreea Gane
Tamás Sarlós
Peter Hawkins
Jared Davis
Afroz Mohiuddin
Lukasz Kaiser
David Belanger
Lucy J. Colwell
Adrian Weller

Papers citing "Rethinking Attention with Performers"

50 / 1,014 papers shown
MaskVD: Region Masking for Efficient Video Object Detection
Sreetama Sarkar
Gourav Datta
Souvik Kundu
Kai Zheng
Chirayata Bhattacharyya
P. Beerel
27
3
0
16 Jul 2024
Omni-Dimensional Frequency Learner for General Time Series Analysis
Xianing Chen
Hanting Chen
Hailin Hu
AI4TS
37
0
0
15 Jul 2024
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
Jay Shah
Ganesh Bikshandi
Ying Zhang
Vijay Thakkar
Pradeep Ramani
Tri Dao
53
113
0
11 Jul 2024
When big data actually are low-rank, or entrywise approximation of certain function-generated matrices
Stanislav Budzinskiy
64
2
0
03 Jul 2024
Let the Code LLM Edit Itself When You Edit the Code
Zhenyu He
Jun Zhang
Shengjie Luo
Jingjing Xu
Z. Zhang
Di He
KELM
36
0
0
03 Jul 2024
On the Anatomy of Attention
Nikhil Khatri
Tuomas Laakkonen
Jonathon Liu
Vincent Wang-Maścianica
3DV
48
1
0
02 Jul 2024
PoliFormer: Scaling On-Policy RL with Transformers Results in Masterful Navigators
Kuo-Hao Zeng
Zichen Zhang
Kiana Ehsani
Rose Hendrix
Jordi Salvador
Alvaro Herrasti
Ross Girshick
Aniruddha Kembhavi
Luca Weihs
LM&Ro
OffRL
38
17
0
28 Jun 2024
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management
Wonbeom Lee
Jungi Lee
Junghwan Seo
Jaewoong Sim
RALM
26
74
0
28 Jun 2024
All Random Features Representations are Equivalent
Luke Sernau
Silvano Bonacina
Rif A. Saurous
24
0
0
27 Jun 2024
UIO-LLMs: Unbiased Incremental Optimization for Long-Context LLMs
Wenhao Li
Mingbao Lin
Yunshan Zhong
Shuicheng Yan
Rongrong Ji
38
0
0
26 Jun 2024
Learning Neural Networks with Sparse Activations
Pranjal Awasthi
Nishanth Dikkala
Pritish Kamath
Raghu Meka
36
2
0
26 Jun 2024
Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers
Chao Lou
Zixia Jia
Zilong Zheng
Kewei Tu
ODL
35
18
0
24 Jun 2024
Scaling Laws for Linear Complexity Language Models
Xuyang Shen
Dong Li
Ruitao Leng
Zhen Qin
Weigao Sun
Yiran Zhong
LRM
33
6
0
24 Jun 2024
Fast Tree-Field Integrators: From Low Displacement Rank to Topological Transformers
Krzysztof Choromanski
Arijit Sehanobish
Somnath Basu Roy Chowdhury
Han Lin
Avinava Dubey
Tamás Sarlós
Snigdha Chaturvedi
AI4CE
22
0
0
22 Jun 2024
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression
Tianyu Fu
Haofeng Huang
Xuefei Ning
Genghan Zhang
Boju Chen
...
Shiyao Li
Shengen Yan
Guohao Dai
Huazhong Yang
Yu Wang
MQ
49
17
0
21 Jun 2024
DeciMamba: Exploring the Length Extrapolation Potential of Mamba
Assaf Ben-Kish
Itamar Zimerman
Shady Abu Hussein
Nadav Cohen
Amir Globerson
Lior Wolf
Raja Giryes
Mamba
77
13
0
20 Jun 2024
A Primal-Dual Framework for Transformers and Neural Networks
Tan M. Nguyen
Tam Nguyen
Nhat Ho
Andrea L. Bertozzi
Richard G. Baraniuk
Stanley J. Osher
ViT
29
13
0
19 Jun 2024
Elliptical Attention
Stefan K. Nielsen
Laziz U. Abdullaev
R. Teo
Tan M. Nguyen
23
3
0
19 Jun 2024
Unveiling the Hidden Structure of Self-Attention via Kernel Principal Component Analysis
R. Teo
Tan M. Nguyen
45
4
0
19 Jun 2024
In-Context Former: Lightning-fast Compressing Context for Large Language Model
Xiangfeng Wang
Zaiyi Chen
Zheyong Xie
Tong Xu
Yongyi He
Enhong Chen
35
1
0
19 Jun 2024
SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention
Qianchao Zhu
Jiangfei Duan
Chang Chen
Siran Liu
Xiuhong Li
...
Huanqi Cao
Xiao Chuanfu
Xingcheng Zhang
Dahua Lin
Chao Yang
30
15
0
17 Jun 2024
Hierarchical Compression of Text-Rich Graphs via Large Language Models
Shichang Zhang
Da Zheng
Jiani Zhang
Qi Zhu
Xiang Song
Soji Adeshina
Christos Faloutsos
George Karypis
Yizhou Sun
VLM
28
1
0
13 Jun 2024
Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences
Zicheng Liu
Siyuan Li
Li Wang
Zedong Wang
Yunfan Liu
Stan Z. Li
35
7
0
12 Jun 2024
Autoregressive Pretraining with Mamba in Vision
Sucheng Ren
Xianhang Li
Haoqin Tu
Feng Wang
Fangxun Shu
...
L. Yang
Peng Wang
Heng Wang
Alan Yuille
Cihang Xie
Mamba
70
9
0
11 Jun 2024
ReduceFormer: Attention with Tensor Reduction by Summation
John Yang
Le An
Su Inn Park
31
0
0
11 Jun 2024
When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
Haoran You
Yichao Fu
Zheng Wang
Amir Yazdanbakhsh
Yingyan Celine Lin
33
2
0
11 Jun 2024
AIM: Let Any Multi-modal Large Language Models Embrace Efficient In-Context Learning
Jun Gao
Qian Qiao
Ziqiang Cao
Zili Wang
Wenjie Li
31
3
0
11 Jun 2024
What Can We Learn from State Space Models for Machine Learning on Graphs?
Yinan Huang
Siqi Miao
Pan Li
44
7
0
09 Jun 2024
LoCoCo: Dropping In Convolutions for Long Context Compression
Ruisi Cai
Yuandong Tian
Zhangyang Wang
Beidi Chen
41
9
0
08 Jun 2024
Enhancing In-Context Learning Performance with just SVD-Based Weight Pruning: A Theoretical Perspective
Xinhao Yao
Xiaolin Hu
Shenzhi Yang
Yong Liu
47
2
0
06 Jun 2024
Combining Graph Neural Network and Mamba to Capture Local and Global Tissue Spatial Relationships in Whole Slide Images
Ruiwen Ding
Kha-Dinh Luong
Erika Rodriguez
Ana Cristina Araujo Lemos da Silva
William Hsu
Mamba
48
2
0
05 Jun 2024
Exact Conversion of In-Context Learning to Model Weights in Linearized-Attention Transformers
Brian K Chen
Tianyang Hu
Hui Jin
Hwee Kuan Lee
Kenji Kawaguchi
50
0
0
05 Jun 2024
Compute-Efficient Medical Image Classification with Softmax-Free Transformers and Sequence Normalization
Firas Khader
Omar S. M. El Nahhas
T. Han
Gustav Muller-Franzes
S. Nebelung
Jakob Nikolas Kather
Daniel Truhn
MedIm
30
0
0
03 Jun 2024
You Only Scan Once: Efficient Multi-dimension Sequential Modeling with LightNet
Zhen Qin
Yuxin Mao
Xuyang Shen
Dong Li
Jing Zhang
Yuchao Dai
Yiran Zhong
58
1
0
31 May 2024
Rough Transformers: Lightweight and Continuous Time Series Modelling through Signature Patching
Fernando Moreno-Pino
Alvaro Arroyo
H. Waldon
Xiaowen Dong
Álvaro Cartea
AI4TS
34
1
0
31 May 2024
P-MSDiff: Parallel Multi-Scale Diffusion for Remote Sensing Image Segmentation
Qi Zhang
Guohua Geng
Long-He Yan
Pengbo Zhou
Zhaodi Li
Kang Li
Qinglin Liu
DiffM
34
1
0
30 May 2024
SFANet: Spatial-Frequency Attention Network for Weather Forecasting
Jiaze Wang
Hao Chen
Hongcan Xu
Jinpeng Li
Bo-Lan Wang
Kun Shao
Furui Liu
Huaxi Chen
Guangyong Chen
Pheng-Ann Heng
64
0
0
29 May 2024
Learning to Continually Learn with the Bayesian Principle
Soochan Lee
Hyeonseong Jeon
Jaehyeon Son
Gunhee Kim
BDL
CLL
37
3
0
29 May 2024
DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention
Lianghui Zhu
Zilong Huang
Bencheng Liao
Jun Hao Liew
Hanshu Yan
Jiashi Feng
Xinggang Wang
70
13
0
28 May 2024
ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention
Bencheng Liao
Xinggang Wang
Lianghui Zhu
Qian Zhang
Chang Huang
57
4
0
28 May 2024
SMR: State Memory Replay for Long Sequence Modeling
Biqing Qi
Junqi Gao
Kaiyan Zhang
Dong Li
Jianxing Liu
Ligang Wu
Bowen Zhou
33
5
0
27 May 2024
Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective
Zhen Qin
Xuyang Shen
Weigao Sun
Dong Li
Stanley T. Birchfield
Richard I. Hartley
Yiran Zhong
50
6
0
27 May 2024
Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention
Zhen Qin
Weigao Sun
Dong Li
Xuyang Shen
Weixuan Sun
Yiran Zhong
46
9
0
27 May 2024
SelfCP: Compressing Over-Limit Prompt via the Frozen Large Language Model Itself
Jun Gao
Ziqiang Cao
Wenjie Li
25
4
0
27 May 2024
Demystify Mamba in Vision: A Linear Attention Perspective
Dongchen Han
Ziyi Wang
Zhuofan Xia
Yizeng Han
Yifan Pu
Chunjiang Ge
Jun Song
Shiji Song
Bo Zheng
Gao Huang
Mamba
34
49
0
26 May 2024
Variance-Reducing Couplings for Random Features: Perspectives from Optimal Transport
Isaac Reid
Stratis Markou
Krzysztof Choromanski
Richard E. Turner
Adrian Weller
27
1
0
26 May 2024
CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion
Jiayi Yao
Hanchen Li
Yuhan Liu
Siddhant Ray
Yihua Cheng
Qizheng Zhang
Kuntai Du
Shan Lu
Junchen Jiang
44
16
0
26 May 2024
Mixture of In-Context Prompters for Tabular PFNs
Derek Xu
Olcay Cirit
Reza Asadi
Yizhou Sun
Wei Wang
31
9
0
25 May 2024
Activator: GLU Activation Function as the Core Component of a Vision Transformer
Abdullah Nazhat Abdullah
Tarkan Aydin
ViT
43
0
0
24 May 2024
Spectraformer: A Unified Random Feature Framework for Transformer
Duke Nguyen
Aditya Joshi
Flora D. Salim
37
0
0
24 May 2024