Rethinking Attention with Performers

30 September 2020 · K. Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamás Sarlós, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy J. Colwell, Adrian Weller
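For orientation: the cited paper replaces exact softmax attention with an unbiased kernel approximation built from positive random features (FAVOR+), which reduces attention cost from quadratic to linear in sequence length. The sketch below is a minimal NumPy illustration of that idea written for this page; the function name, the default feature count, and the use of plain (non-orthogonal) Gaussian features are simplifying assumptions made here, and the paper's orthogonal-feature and causal variants are omitted.

```python
import numpy as np

def favor_plus_attention(Q, K, V, num_features=256, seed=0):
    """Minimal sketch of Performer-style linear attention (FAVOR+).

    Q, K: (n, d) queries/keys; V: (n, d_v) values. `num_features` and the
    plain Gaussian projection are illustrative choices, not the reference
    implementation.
    """
    n, d = Q.shape
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((num_features, d))   # random projection directions
    scale = d ** -0.25                           # splits the usual 1/sqrt(d) between Q and K

    def phi(X):
        # Positive random features whose inner products approximate exp(q.k / sqrt(d)).
        Xs = X * scale
        return np.exp(Xs @ W.T - 0.5 * np.sum(Xs ** 2, axis=-1, keepdims=True)) / np.sqrt(num_features)

    Qp, Kp = phi(Q), phi(K)                      # (n, m) feature maps
    KV = Kp.T @ V                                # (m, d_v): key/value summary, formed once
    normalizer = Qp @ Kp.sum(axis=0)[:, None]    # (n, 1): row-wise softmax denominator
    return (Qp @ KV) / normalizer                # approximates softmax(QK^T / sqrt(d)) V
```

Because Kp.T @ V is a fixed-size (m, d_v) summary, time and memory scale linearly in sequence length rather than quadratically.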

Papers citing "Rethinking Attention with Performers"

50 of 1,014 citing papers shown.
WaterVG: Waterway Visual Grounding based on Text-Guided Vision and mmWave Radar
Runwei Guan, Liye Jia, Fengyufan Yang, Shanliang Yao, Erick Purwanto, ..., Eng Gee Lim, Jeremy S. Smith, Ka Lok Man, Xuming Hu, Yutao Yue (19 Mar 2024)

HCPM: Hierarchical Candidates Pruning for Efficient Detector-Free Matching
Ying Chen, Yong-Jin Liu, Kai Wu, Qiang Nie, Shang Xu, Huifang Ma, Bing Wang, Chengjie Wang (19 Mar 2024, VLM)

Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference
Piotr Nawrot, Adrian Łańcucki, Marcin Chochowski, David Tarjan, E. Ponti (14 Mar 2024)

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
Guo Chen, Yifei Huang, Jilan Xu, Baoqi Pei, Zhe Chen, Zhiqi Li, Jiahao Wang, Kunchang Li, Tong Lu, Limin Wang (14 Mar 2024, Mamba)

Introducing Routing Functions to Vision-Language Parameter-Efficient Fine-Tuning with Low-Rank Bottlenecks
Tingyu Qu, Tinne Tuytelaars, Marie-Francine Moens (14 Mar 2024, MoE)

BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences
Sun Ao, Weilin Zhao, Xu Han, Cheng Yang, Zhiyuan Liu, Chuan Shi, Maosong Sun (14 Mar 2024, GNN)

Keyformer: KV Cache Reduction through Key Tokens Selection for Efficient Generative Inference
Muhammad Adnan, Akhil Arunkumar, Gaurav Jain, Prashant J. Nair, Ilya Soloveychik, Purushotham Kamath (14 Mar 2024)
Implicit Regularization of Gradient Flow on One-Layer Softmax Attention
Heejune Sheen, Siyu Chen, Tianhao Wang, Harrison H. Zhou (13 Mar 2024, MLT)

LookupFFN: Making Transformers Compute-lite for CPU inference
Zhanpeng Zeng, Michael Davies, Pranav Pulijala, Karthikeyan Sankaralingam, Vikas Singh (12 Mar 2024)

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Junsong Chen, Chongjian Ge, Enze Xie, Yue Wu, Lewei Yao, Xiaozhe Ren, Zhongdao Wang, Ping Luo, Huchuan Lu, Zhenguo Li (07 Mar 2024)

TaylorShift: Shifting the Complexity of Self-Attention from Squared to Linear (and Back) using Taylor-Softmax
Tobias Christian Nauen, Sebastián M. Palacio, Andreas Dengel (05 Mar 2024)

NiNformer: A Network in Network Transformer with Token Mixing Generated Gating Function
Abdullah Nazhat Abdullah, Tarkan Aydin (04 Mar 2024)

NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention
Tianyi Zhang, Jonah Yi, Bowen Yao, Zhaozhuo Xu, Anshumali Shrivastava (02 Mar 2024, MQ)

ATP: Enabling Fast LLM Serving via Attention on Top Principal Keys
Yue Niu, Saurav Prakash, Salman Avestimehr (01 Mar 2024)

GLFNET: Global-Local (frequency) Filter Networks for efficient medical image segmentation
Athanasios Tragakis, Qianying Liu, Chaitanya Kaul, S. K. Roy, Hang Dai, F. Deligianni, Roderick Murray-Smith, Daniele Faccio (01 Mar 2024, MedIm)

Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling
Mahdi Karami, Ali Ghodsi (28 Feb 2024, VLM)
Interactive Multi-Head Self-Attention with Linear Complexity
Hankyul Kang, Ming-Hsuan Yang, Jongbin Ryu (27 Feb 2024)

Adaptation of Biomedical and Clinical Pretrained Models to French Long Documents: A Comparative Study
Adrien Bazoge, Emmanuel Morin, B. Daille, P. Gourraud (26 Feb 2024)

Long-Context Language Modeling with Parallel Context Encoding
Howard Yen, Tianyu Gao, Danqi Chen (26 Feb 2024)

Multimodal Transformer With a Low-Computational-Cost Guarantee
Sungjin Park, Edward Choi (23 Feb 2024)

Linear Transformers are Versatile In-Context Learners
Max Vladymyrov, J. Oswald, Mark Sandler, Rong Ge (21 Feb 2024)

Do Efficient Transformers Really Save Computation?
Kai-Bo Yang, Jan Ackermann, Zhenyu He, Guhao Feng, Bohang Zhang, Yunzhen Feng, Qiwei Ye, Di He, Liwei Wang (21 Feb 2024)

Locality-Sensitive Hashing-Based Efficient Point Transformer with Applications in High-Energy Physics
Siqi Miao, Zhiyuan Lu, Mia Liu, Javier Duarte, Pan Li (19 Feb 2024)

Perceiving Longer Sequences With Bi-Directional Cross-Attention Transformers
Markus Hiller, Krista A. Ehinger, Tom Drummond (19 Feb 2024)

Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He (15 Feb 2024, MQ)
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference
Harry Dong, Xinyu Yang, Zhenyu (Allen) Zhang, Zhangyang Wang, Yuejie Chi, Beidi Chen (14 Feb 2024)

Graph Mamba: Towards Learning on Graphs with State Space Models
Ali Behrouz, Farnoosh Hashemi (13 Feb 2024, AI4CE)

FAST: Factorizable Attention for Speeding up Transformers
Armin Gerami, Monte Hoover, P. S. Dulepet, R. Duraiswami (12 Feb 2024)

On the Efficacy of Eviction Policy for Key-Value Constrained Generative Language Model Inference
Siyu Ren, Kenny Q. Zhu (09 Feb 2024)

Attention as Robust Representation for Time Series Forecasting
Peisong Niu, Tian Zhou, Xue Wang, Liang Sun, Rong Jin (08 Feb 2024, AI4TS)

Examining Modality Incongruity in Multimodal Federated Learning for Medical Vision and Language-based Disease Detection
Pramit Saha, Divyanshu Mishra, Felix Wagner, Konstantinos Kamnitsas, J. A. Noble (07 Feb 2024)

The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry
Michael Zhang, Kush S. Bhatia, Hermann Kumbong, Christopher Ré (06 Feb 2024)

CAST: Clustering Self-Attention using Surrogate Tokens for Efficient Transformers
Adjorn van Engelenhoven, Nicola Strisciuglio, Estefanía Talavera (06 Feb 2024)

Progress and Opportunities of Foundation Models in Bioinformatics
Qing Li, Zhihang Hu, Yixuan Wang, Lei Li, Yimin Fan, Irwin King, Le Song, Yu-Hu Li (06 Feb 2024, AI4CE)

Is Mamba Capable of In-Context Learning?
Riccardo Grazzi, Julien N. Siems, Simon Schrodi, Thomas Brox, Frank Hutter (05 Feb 2024)
A Survey on Transformer Compression
Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, Dacheng Tao (05 Feb 2024)

Transolver: A Fast Transformer Solver for PDEs on General Geometries
Haixu Wu, Huakun Luo, Haowen Wang, Jianmin Wang, Mingsheng Long (04 Feb 2024, AI4CE)

Unification of Symmetries Inside Neural Networks: Transformer, Feedforward and Neural ODE
Koji Hashimoto, Yuji Hirono, Akiyoshi Sannai (04 Feb 2024, AI4CE)

Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models
Xindi Wang, Mahsa Salmani, Parsa Omidi, Xiangyu Ren, Mehdi Rezagholizadeh, A. Eshaghi (03 Feb 2024, LRM)

Self-Attention through Kernel-Eigen Pair Sparse Variational Gaussian Processes
Yingyi Chen, Qinghua Tao, F. Tonin, Johan A. K. Suykens (02 Feb 2024)

Streaming Sequence Transduction through Dynamic Compression
Weiting Tan, Yunmo Chen, Tongfei Chen, Guanghui Qin, Haoran Xu, Heidi C. Zhang, Benjamin Van Durme, Philipp Koehn (02 Feb 2024)

Repeat After Me: Transformers are Better than State Space Models at Copying
Samy Jelassi, David Brandfonbrener, Sham Kakade, Eran Malach (01 Feb 2024)

Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces
Chloe X. Wang, Oleksii Tsepa, Jun Ma, Bo Wang (01 Feb 2024, Mamba)
Hybrid Quantum Vision Transformers for Event Classification in High Energy Physics
Eyup B. Unlu, Marçal Comajoan Cara, Gopal Ramesh Dahale, Zhongtian Dong, Roy T. Forestano, ..., Daniel Justice, Kyoungchul Kong, Tom Magorsch, Konstantin T. Matchev, Katia Matcheva (01 Feb 2024)

Computation and Parameter Efficient Multi-Modal Fusion Transformer for Cued Speech Recognition
Lei Liu, Li Liu, Haizhou Li (31 Jan 2024)

SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design
Seokju Yun, Youngmin Ro (29 Jan 2024, ViT)

FedGT: Federated Node Classification with Scalable Graph Transformer
Zaixin Zhang, Qingyong Hu, Yang Yu, Weibo Gao, Qi Liu (26 Jan 2024, FedML)

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Lianghui Zhu, Bencheng Liao, Qian Zhang, Xinlong Wang, Wenyu Liu, Xinggang Wang (17 Jan 2024, Mamba)

The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey
Saurav Pawar, S.M. Towhidul Islam Tonmoy, S. M. M. Zaman, Vinija Jain, Aman Chadha, Amitava Das (15 Jan 2024)

Extending LLMs' Context Window with 100 Samples
Yikai Zhang, Junlong Li, Pengfei Liu (13 Jan 2024)