Rethinking Attention with Performers

30 September 2020
K. Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamás Sarlós, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy J. Colwell, Adrian Weller
ArXiv · PDF · HTML
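
The Performer replaces the softmax attention matrix with an unbiased kernel estimate built from positive random features (FAVOR+), cutting the cost of attention from quadratic to linear in sequence length. The following is a minimal NumPy sketch of that idea, not the authors' released implementation: the function names and the feature count m are illustrative, and it uses plain i.i.d. Gaussian projections where the paper additionally orthogonalizes them.

import numpy as np

def favor_plus_features(x, w):
    # Positive random features for the softmax kernel:
    # phi(x) = exp(w @ x - ||x||^2 / 2) / sqrt(m), so that
    # E_w[phi(x) . phi(y)] = exp(x . y) for w ~ N(0, I).
    m = w.shape[0]
    norm = np.sum(x ** 2, axis=-1, keepdims=True) / 2.0
    return np.exp(x @ w.T - norm) / np.sqrt(m)

def performer_attention(Q, K, V, m=256, seed=0):
    # Approximates softmax(Q K^T / sqrt(d)) V in O(n*m*d) time and
    # memory: phi(K)^T V is accumulated once, then multiplied by
    # phi(Q), so the n x n attention matrix is never materialized.
    n, d = Q.shape
    w = np.random.default_rng(seed).standard_normal((m, d))
    scale = d ** -0.25                      # folds 1/sqrt(d) into Q and K
    q = favor_plus_features(Q * scale, w)   # (n, m)
    k = favor_plus_features(K * scale, w)   # (n, m)
    kv = k.T @ V                            # (m, d_v)
    denom = q @ k.sum(axis=0)               # per-row softmax normalizer
    return (q @ kv) / denom[:, None]

On small inputs this should track the exact quadratic computation closely (e.g. compare against softmax attention with np.allclose and a loose tolerance), with the approximation tightening as m grows.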

Papers citing "Rethinking Attention with Performers"

Showing 50 of 1,019 citing papers.

Sparse Cross-scale Attention Network for Efficient LiDAR Panoptic Segmentation
Shuangjie Xu, Rui Wan, Maosheng Ye, Xiaoyi Zou, Tongyi Cao
3DPC · 32 citations · 16 Jan 2022

Datasheet for the Pile
Stella Biderman, Kieran Bicheno, Leo Gao
35 citations · 13 Jan 2022

GateFormer: Speeding Up News Feed Recommendation with Input Gated Transformers
Peitian Zhang, Zheng Liu
AI4TS · 1 citation · 12 Jan 2022

Efficient Non-Local Contrastive Attention for Image Super-Resolution
Bin Xia, Yucheng Hang, Yapeng Tian, Wenming Yang, Q. Liao, Jie Zhou
SupR · 78 citations · 11 Jan 2022

λ-Scaled-Attention: A Novel Fast Attention Mechanism for Efficient Modeling of Protein Sequences
Ashish Ranjan, M. S. Fahad, A. Deepak
3 citations · 09 Jan 2022

Attention-based Random Forest and Contamination Model
Lev V. Utkin, A. Konstantinov
29 citations · 08 Jan 2022

QuadTree Attention for Vision Transformers
Shitao Tang, Jiahui Zhang, Siyu Zhu, Ping Tan
ViT · 156 citations · 08 Jan 2022

Low-Rank Constraints for Fast Inference in Structured Models
Justin T. Chiu, Yuntian Deng, Alexander M. Rush
BDL · 13 citations · 08 Jan 2022

Classification of Long Sequential Data using Circular Dilated Convolutional Neural Networks
Lei Cheng, Ruslan Khalitov, Tong Yu, Zhirong Yang
32 citations · 06 Jan 2022

Learning Operators with Coupled Attention
Georgios Kissas, Jacob H. Seidman, Leonardo Ferreira Guilhoto, V. Preciado, George J. Pappas, P. Perdikaris
110 citations · 04 Jan 2022

ViR: the Vision Reservoir
Xian Wei, Bin Wang, Mingsong Chen, Ji Yuan, Hai Lan, Jiehuang Shi, Xuan Tang, Bo Jin, Guozhang Chen, Dongping Yang
ViT · 2 citations · 27 Dec 2021

Distilling the Knowledge of Romanian BERTs Using Multiple Teachers
Andrei-Marius Avram, Darius Catrina, Dumitru-Clementin Cercel, Mihai Dascălu, Traian Rebedea, Vasile Păiș, Dan Tufiș
12 citations · 23 Dec 2021

Self-Supervised Graph Representation Learning for Neuronal Morphologies
Marissa A. Weis, Laura Hansel, Timo Lüddecke, Alexander S. Ecker
MedIm · 7 citations · 23 Dec 2021

Domain Adaptation with Pre-trained Transformers for Query Focused Abstractive Text Summarization
Md Tahmid Rahman Laskar, Enamul Hoque, J. Huang
45 citations · 22 Dec 2021

Full Transformer Framework for Robust Point Cloud Registration with Deep Information Interaction
Guang-Sheng Chen, Meiling Wang, Yufeng Yue, Qingxiang Zhang, Li-xin Yuan
ViT · 17 citations · 17 Dec 2021

Neural Architectures for Biological Inter-Sentence Relation Extraction
Enrique Noriega-Atala, Peter Lovett, Clayton T. Morrison, Mihai Surdeanu
NAI · 3 citations · 17 Dec 2021

LongT5: Efficient Text-To-Text Transformer for Long Sequences
Mandy Guo, Joshua Ainslie, David C. Uthus, Santiago Ontanon, Jianmo Ni, Yun-hsuan Sung, Yinfei Yang
VLM · 306 citations · 15 Dec 2021

AdaViT: Adaptive Tokens for Efficient Vision Transformer
Hongxu Yin, Arash Vahdat, J. Álvarez, Arun Mallya, Jan Kautz, Pavlo Molchanov
ViT · 316 citations · 14 Dec 2021

Simple Local Attentions Remain Competitive for Long-Context Tasks
Wenhan Xiong, Barlas Oğuz, Anchit Gupta, Xilun Chen, Diana Liskovich, Omer Levy, Wen-tau Yih, Yashar Mehdad
29 citations · 14 Dec 2021

Self-attention Does Not Need $O(n^2)$ Memory
M. Rabe, Charles Staats
LRM · 141 citations · 10 Dec 2021

Couplformer: Rethinking Vision Transformer with Coupling Attention Map
Hai Lan, Xihao Wang, Xian Wei
ViT · 3 citations · 10 Dec 2021

Sketching as a Tool for Understanding and Accelerating Self-attention for Long Sequences
Yifan Chen, Qi Zeng, Dilek Z. Hakkani-Tür, Di Jin, Heng Ji, Yun Yang
4 citations · 10 Dec 2021

3D Medical Point Transformer: Introducing Convolution to Attention Networks for Medical Point Cloud Analysis
Jianhui Yu, Chaoyi Zhang, Heng Wang, Dingxin Zhang, Yang Song, Tiange Xiang, Dongnan Liu, Weidong (Tom) Cai
ViT, MedIm · 32 citations · 09 Dec 2021

CDGNet: A Cross-Time Dynamic Graph-based Deep Learning Model for Traffic Forecasting
Yuchen Fang, Yanjun Qin, Haiyong Luo, Fang Zhao, Liang Zeng, Bo Hui, Chenxing Wang
GNN, BDL, AI4TS · 9 citations · 06 Dec 2021

Learning Query Expansion over the Nearest Neighbor Graph
Benjamin Klein, Lior Wolf
1 citation · 05 Dec 2021

STJLA: A Multi-Context Aware Spatio-Temporal Joint Linear Attention Network for Traffic Forecasting
Yuchen Fang, Yanjun Qin, Haiyong Luo, Fang Zhao, Chenxing Wang
GNN, AI4TS · 1 citation · 04 Dec 2021

Graph Conditioned Sparse-Attention for Improved Source Code Understanding
Junyan Cheng, Iordanis Fostiropoulos, Barry W. Boehm
1 citation · 01 Dec 2021

Systematic Generalization with Edge Transformers
Leon Bergen, Timothy J. O'Donnell, Dzmitry Bahdanau
46 citations · 01 Dec 2021

A Unified Pruning Framework for Vision Transformers
Hao Yu, Jianxin Wu
ViT · 62 citations · 30 Nov 2021

Sparse DETR: Efficient End-to-End Object Detection with Learnable Sparsity
Byungseok Roh, Jaewoong Shin, Wuhyun Shin, Saehoon Kim
ViT · 142 citations · 29 Nov 2021

SWAT: Spatial Structure Within and Among Tokens
Kumara Kahatapitiya, Michael S. Ryoo
6 citations · 26 Nov 2021

Sparse is Enough in Scaling Transformers
Sebastian Jaszczur, Aakanksha Chowdhery, Afroz Mohiuddin, Lukasz Kaiser, Wojciech Gajewski, Henryk Michalewski, Jonni Kanerva
MoE · 101 citations · 24 Nov 2021

Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers
John Guibas, Morteza Mardani, Zong-Yi Li, Andrew Tao, Anima Anandkumar, Bryan Catanzaro
231 citations · 24 Nov 2021

SimpleTRON: Simple Transformer with O(N) Complexity
Uladzislau Yorsh, Alexander Kovalenko, Vojtěch Vančura, Daniel Vašata, Pavel Kordík, Tomáš Mikolov
1 citation · 23 Nov 2021

You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling
Zhanpeng Zeng, Yunyang Xiong, Sathya Ravi, Shailesh Acharya, G. Fung, Vikas Singh
19 citations · 18 Nov 2021

Attention Mechanisms in Computer Vision: A Survey
Meng-Hao Guo, Tianhan Xu, Jiangjiang Liu, Zheng-Ning Liu, Peng-Tao Jiang, Tai-Jiang Mu, Song-Hai Zhang, Ralph Robert Martin, Ming-Ming Cheng, Shimin Hu
1,638 citations · 15 Nov 2021

Attention Approximates Sparse Distributed Memory
Trenton Bricken, Cengiz Pehlevan
34 citations · 10 Nov 2021

A Survey on Green Deep Learning
Jingjing Xu, Wangchunshu Zhou, Zhiyi Fu, Hao Zhou, Lei Li
VLM · 83 citations · 08 Nov 2021

Theme Transformer: Symbolic Music Generation with Theme-Conditioned Transformer
Yi-Jen Shih, Shih-Lun Wu, Frank Zalkow, Meinard Müller, Yi-Hsuan Yang
76 citations · 07 Nov 2021

PhyloTransformer: A Discriminative Model for Mutation Prediction Based on a Multi-head Self-attention Mechanism
Yingying Wu, Shusheng Xu, S. Yau, Yi Wu
MedIm · 1 citation · 03 Nov 2021

Kernel Deformed Exponential Families for Sparse Continuous Attention
Alexander Moreno, Supriya Nagesh, Zhenke Wu, Walter Dempsey, James M. Rehg
1 citation · 01 Nov 2021

Efficiently Modeling Long Sequences with Structured State Spaces
Albert Gu, Karan Goel, Christopher Ré
1,680 citations · 31 Oct 2021

PatchFormer: An Efficient Point Transformer with Patch Attention
Zhang Cheng, Haocheng Wan, Xinyi Shen, Zizhao Wu
3DPC · 66 citations · 30 Oct 2021

Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method
Yifan Chen, Qi Zeng, Heng Ji, Yun Yang
49 citations · 29 Oct 2021

Scatterbrain: Unifying Sparse and Low-rank Attention Approximation
Beidi Chen, Tri Dao, Eric Winsor, Zhao Song, Atri Rudra, Christopher Ré
125 citations · 28 Oct 2021

Blending Anti-Aliasing into Vision Transformer
Shengju Qian, Hao Shao, Yi Zhu, Mu Li, Jiaya Jia
20 citations · 28 Oct 2021

Transformers Generalize DeepSets and Can be Extended to Graphs and Hypergraphs
Jinwoo Kim, Saeyoon Oh, Seunghoon Hong
AI4CE · 41 citations · 27 Oct 2021

Hierarchical Transformers Are More Efficient Language Models
Piotr Nawrot, Szymon Tworkowski, Michał Tyrolski, Lukasz Kaiser, Yuhuai Wu, Christian Szegedy, Henryk Michalewski
61 citations · 26 Oct 2021

TUNet: A Block-online Bandwidth Extension Model based on Transformers and Self-supervised Pretraining
Viet-Anh Nguyen, Anh H. T. Nguyen, Andy W. H. Khong
22 citations · 26 Oct 2021

SOFT: Softmax-free Transformer with Linear Complexity
Jiachen Lu, Jinghan Yao, Junge Zhang, Martin Danelljan, Hang Xu, Weiguo Gao, Chunjing Xu, Thomas B. Schön, Li Zhang
161 citations · 22 Oct 2021