ResearchTrend.AI
Rethinking Attention with Performers (arXiv:2009.14794)

30 September 2020
K. Choromanski
Valerii Likhosherstov
David Dohan
Xingyou Song
Andreea Gane
Tamás Sarlós
Peter Hawkins
Jared Davis
Afroz Mohiuddin
Lukasz Kaiser
David Belanger
Lucy J. Colwell
Adrian Weller
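For context on what the citing papers build on: Performers replace exact softmax attention with the FAVOR+ estimator, which approximates the softmax kernel with positive random features so attention can be computed in time linear in the sequence length. The following is a minimal NumPy sketch of that idea, not the authors' implementation; function names are illustrative, and it omits refinements from the paper such as the 1/√d scaling and orthogonal random features.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Exact softmax attention; cost is O(L^2) in sequence length L."""
    A = np.exp(Q @ K.T)
    return (A / A.sum(axis=1, keepdims=True)) @ V

def positive_random_features(X, W):
    """FAVOR+-style positive features: phi(x) = exp(Wx - ||x||^2/2)/sqrt(m),
    so that E[phi(q) . phi(k)] = exp(q . k) when rows of W ~ N(0, I)."""
    m = W.shape[0]
    return np.exp(X @ W.T - 0.5 * (X ** 2).sum(-1, keepdims=True)) / np.sqrt(m)

def performer_attention(Q, K, V, m=256, seed=0):
    """Linear-time approximation: O(L*m*d); no L x L matrix is ever formed."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((m, Q.shape[-1]))
    Qf = positive_random_features(Q, W)
    Kf = positive_random_features(K, W)
    numerator = Qf @ (Kf.T @ V)        # associativity gives the linear cost
    denominator = Qf @ Kf.sum(axis=0)  # per-query softmax normalizer
    return numerator / denominator[:, None]
```

With enough random features (large `m`), the approximation tracks exact attention closely; many of the citing papers below vary the feature map or the kernel while keeping this linear decomposition.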

Papers citing "Rethinking Attention with Performers"

50 / 1,014 papers shown
Transformers are Multi-State RNNs
Matanel Oren
Michael Hassid
Nir Yarden
Yossi Adi
Roy Schwartz
OffRL
32
35
0
11 Jan 2024
Efficient Vision-and-Language Pre-training with Text-Relevant Image Patch Selection
Wei Ye
Chaoya Jiang
Haiyang Xu
Chenhao Ye
Chenliang Li
Mingshi Yan
Shikun Zhang
Songhang Huang
Fei Huang
VLM
37
0
0
11 Jan 2024
Efficient Image Deblurring Networks based on Diffusion Models
Kang Chen
Yuanjie Liu
DiffM
16
2
0
11 Jan 2024
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
Zhen Qin
Weigao Sun
Dong Li
Xuyang Shen
Weixuan Sun
Yiran Zhong
72
22
0
09 Jan 2024
SeTformer is What You Need for Vision and Language
Pourya Shamsolmoali
Masoumeh Zareapoor
Eric Granger
Michael Felsberg
43
4
0
07 Jan 2024
Spikformer V2: Join the High Accuracy Club on ImageNet with an SNN Ticket
Zhaokun Zhou
Kaiwei Che
Wei Fang
Keyu Tian
Yuesheng Zhu
Shuicheng Yan
Yonghong Tian
Liuliang Yuan
ViT
41
28
0
04 Jan 2024
PanGu-π: Enhancing Language Model Architectures via Nonlinearity Compensation
Yunhe Wang
Hanting Chen
Yehui Tang
Tianyu Guo
Kai Han
...
Qinghua Xu
Qun Liu
Jun Yao
Chao Xu
Dacheng Tao
70
16
0
27 Dec 2023
Cached Transformers: Improving Transformers with Differentiable Memory Cache
Zhaoyang Zhang
Wenqi Shao
Yixiao Ge
Xiaogang Wang
Liang Feng
Ping Luo
16
2
0
20 Dec 2023
Efficiency-oriented approaches for self-supervised speech representation learning
Luis Lugo
Valentin Vielzeuf
SSL
29
1
0
18 Dec 2023
Linear Attention via Orthogonal Memory
Jun Zhang
Shuyang Jiang
Jiangtao Feng
Lin Zheng
Lingpeng Kong
40
3
0
18 Dec 2023
Inducing Point Operator Transformer: A Flexible and Scalable Architecture for Solving PDEs
Seungjun Lee
Taeil Oh
25
6
0
18 Dec 2023
scBiGNN: Bilevel Graph Representation Learning for Cell Type Classification from Single-cell RNA Sequencing Data
Rui Yang
Wenrui Dai
Chenglin Li
Junni Zou
Dapeng Wu
Hongkai Xiong
35
1
0
16 Dec 2023
Agent Attention: On the Integration of Softmax and Linear Attention
Dongchen Han
Tianzhu Ye
Yizeng Han
Zhuofan Xia
Siyuan Pan
Pengfei Wan
Shiji Song
Gao Huang
37
74
0
14 Dec 2023
Learning Long Sequences in Spiking Neural Networks
Matei Ioan Stan
Oliver Rhodes
37
11
0
14 Dec 2023
Gated Linear Attention Transformers with Hardware-Efficient Training
Aaron Courville
Bailin Wang
Songlin Yang
Yikang Shen
Yoon Kim
48
142
0
11 Dec 2023
Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context
Xiang Cheng
Yuxin Chen
S. Sra
18
35
0
11 Dec 2023
MIMONets: Multiple-Input-Multiple-Output Neural Networks Exploiting Computation in Superposition
Nicolas Menet
Michael Hersche
G. Karunaratne
Luca Benini
Abu Sebastian
Abbas Rahimi
36
13
0
05 Dec 2023
SARA-RT: Scaling up Robotics Transformers with Self-Adaptive Robust Attention
Isabel Leal
Krzysztof Choromanski
Deepali Jain
Kumar Avinava Dubey
Jake Varley
...
Q. Vuong
Tamás Sarlós
Kenneth Oslund
Karol Hausman
Kanishka Rao
44
8
0
04 Dec 2023
Token Fusion: Bridging the Gap between Token Pruning and Token Merging
Minchul Kim
Shangqian Gao
Yen-Chang Hsu
Yilin Shen
Hongxia Jin
28
30
0
02 Dec 2023
The Efficiency Spectrum of Large Language Models: An Algorithmic Survey
Tianyu Ding
Tianyi Chen
Haidong Zhu
Jiachen Jiang
Yiqi Zhong
Jinxin Zhou
Guangzhi Wang
Zhihui Zhu
Ilya Zharkov
Luming Liang
29
22
0
01 Dec 2023
Dimension Mixer: A Generalized Method for Structured Sparsity in Deep Neural Networks
Suman Sapkota
Binod Bhattarai
37
0
0
30 Nov 2023
On the Long Range Abilities of Transformers
Itamar Zimerman
Lior Wolf
32
7
0
28 Nov 2023
Learning Section Weights for Multi-Label Document Classification
Maziar Moradi Fard
Paula Sorolla Bayod
Kiomars Motarjem
Mohammad Alian Nejadi
S. Akhondi
Camilo Thorne
19
0
0
26 Nov 2023
Linear Log-Normal Attention with Unbiased Concentration
Yury Nahshan
Dor-Joseph Kampeas
E. Haleva
22
7
0
22 Nov 2023
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey
Yunpeng Huang
Jingwei Xu
Junyu Lai
Zixu Jiang
Taolue Chen
...
Xiaoxing Ma
Lijuan Yang
Zhou Xin
Shupeng Li
Penghao Zhao
LLMAG
KELM
38
55
0
21 Nov 2023
LATIS: Lambda Abstraction-based Thermal Image Super-resolution
Gargi Panda
Soumitra Kundu
Saumik Bhattacharya
Aurobinda Routray
35
0
0
18 Nov 2023
Accelerating Toeplitz Neural Network with Constant-time Inference Complexity
Zhen Qin
Yiran Zhong
26
6
0
15 Nov 2023
Attention for Causal Relationship Discovery from Biological Neural Dynamics
Ziyu Lu
Anika Tabassum
Shruti R. Kulkarni
Lu Mi
J. Nathan Kutz
Eric Shea-Brown
Seung-Hwan Lim
CML
21
2
0
12 Nov 2023
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
Daniel Y. Fu
Hermann Kumbong
Eric N. D. Nguyen
Christopher Ré
VLM
41
29
0
10 Nov 2023
Window Attention is Bugged: How not to Interpolate Position Embeddings
Daniel Bolya
Chaitanya K. Ryali
Judy Hoffman
Christoph Feichtenhofer
43
10
0
09 Nov 2023
When Meta-Learning Meets Online and Continual Learning: A Survey
Jaehyeon Son
Soochan Lee
Gunhee Kim
OOD
CLL
34
11
0
09 Nov 2023
Hierarchically Gated Recurrent Neural Network for Sequence Modeling
Zhen Qin
Aaron Courville
Yiran Zhong
36
74
0
08 Nov 2023
Ultra-Long Sequence Distributed Transformer
Xiao Wang
Isaac Lyngaas
A. Tsaris
Peng Chen
Sajal Dash
Mayanka Chandra Shekar
Tao Luo
Hong-Jun Yoon
M. Wahib
John P. Gounley
29
4
0
04 Nov 2023
Hardness of Low Rank Approximation of Entrywise Transformed Matrix Products
Tamás Sarlós
Xingyou Song
David P. Woodruff
Qiuyi Zhang
42
3
0
03 Nov 2023
Neural Atoms: Propagating Long-range Interaction in Molecular Graphs through Efficient Communication Channel
Xuan Li
Zhanke Zhou
Jiangchao Yao
Yu Rong
Lu Zhang
Bo Han
42
3
0
02 Nov 2023
General-Purpose Retrieval-Enhanced Medical Prediction Model Using Near-Infinite History
Junu Kim
Chaeeun Shim
Bosco Seong Kyu Yang
Chami Im
Sung Yoon Lim
Han-Gil Jeong
Edward Choi
33
8
0
31 Oct 2023
The Expressibility of Polynomial based Attention Scheme
Zhao Song
Guangyi Xu
Junze Yin
32
5
0
30 Oct 2023
Transformers as Graph-to-Graph Models
James Henderson
Alireza Mohammadshahi
Andrei Catalin Coman
Lesly Miculicich
GNN
27
6
0
27 Oct 2023
Sliceformer: Make Multi-head Attention as Simple as Sorting in Discriminative Tasks
Shen Yuan
Hongteng Xu
24
0
0
26 Oct 2023
miditok: A Python package for MIDI file tokenization
Nathan Fradet
Jean-Pierre Briot
F. Chhel
A. E. Seghrouchni
Nicolas Gutowski
32
39
0
26 Oct 2023
Practical Computational Power of Linear Transformers and Their Recurrent and Self-Referential Extensions
Kazuki Irie
Róbert Csordás
Jürgen Schmidhuber
36
11
0
24 Oct 2023
Nominality Score Conditioned Time Series Anomaly Detection by Point/Sequential Reconstruction
Chih-Yu Lai
Fan-Keng Sun
Zhengqi Gao
Jeffrey H. Lang
Duane S. Boning
AI4TS
31
15
0
24 Oct 2023
Manifold-Preserving Transformers are Effective for Short-Long Range Encoding
Ayan Sengupta
Md. Shad Akhtar
Tanmoy Chakraborty
20
0
0
22 Oct 2023
Fast Multipole Attention: A Divide-and-Conquer Attention Mechanism for Long Sequences
Yanming Kang
Giang Tran
H. Sterck
21
3
0
18 Oct 2023
Recasting Continual Learning as Sequence Modeling
Soochan Lee
Jaehyeon Son
Gunhee Kim
CLL
25
9
0
18 Oct 2023
Robust Wake-Up Word Detection by Two-stage Multi-resolution Ensembles
F. López
Jordi Luque
Carlos Segura
Pablo Gómez
30
0
0
17 Oct 2023
SignGT: Signed Attention-based Graph Transformer for Graph Representation Learning
Jinsong Chen
Gaichao Li
J. Hopcroft
Kun He
SLR
13
5
0
17 Oct 2023
Heterogenous Memory Augmented Neural Networks
Zihan Qiu
Zhen Liu
Shuicheng Yan
Shanghang Zhang
Jie Fu
20
0
0
17 Oct 2023
Approximating Two-Layer Feedforward Networks for Efficient Transformers
Róbert Csordás
Kazuki Irie
Jürgen Schmidhuber
MoE
24
18
0
16 Oct 2023
Attentive Multi-Layer Perceptron for Non-autoregressive Generation
Shuyang Jiang
Jinchao Zhang
Jiangtao Feng
Lin Zheng
Lingpeng Kong
54
0
0
14 Oct 2023