Rethinking Attention with Performers (arXiv:2009.14794)

30 September 2020
K. Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamás Sarlós, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy J. Colwell, Adrian Weller
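
For context on why the papers below cite this work: Performers replace exact softmax attention with FAVOR+, a random-feature approximation of the softmax kernel that brings attention cost down from quadratic to linear in sequence length. The following is a minimal NumPy sketch of that idea, not the authors' implementation: it uses plain positive Gaussian random features, whereas the paper additionally proposes orthogonal random features and a two-term hyperbolic variant, and all names here are ours.

```python
import numpy as np

def softmax_kernel_features(x, projection, eps=1e-6):
    """Positive random features phi(x) with
    E[phi(q) . phi(k)] ~= exp(q . k)   (FAVOR+-style estimator)."""
    m = projection.shape[0]
    # The exp(-|x|^2 / 2) prefactor makes the estimator unbiased for exp(q.k)
    sq_norm = np.sum(x ** 2, axis=-1, keepdims=True) / 2.0
    proj = x @ projection.T                          # (n, m) random projections
    return np.exp(proj - sq_norm) / np.sqrt(m) + eps

def performer_attention(q, k, v, num_features=256, seed=0):
    """Linear-time approximation of softmax attention.
    q, k, v: (n, d) arrays. Cost is O(n * m * d) instead of O(n^2 * d)."""
    d = q.shape[-1]
    rng = np.random.default_rng(seed)
    omega = rng.standard_normal((num_features, d))   # Gaussian projection matrix
    scale = d ** -0.25                               # split the usual 1/sqrt(d) over q and k
    q_prime = softmax_kernel_features(q * scale, omega)   # (n, m)
    k_prime = softmax_kernel_features(k * scale, omega)   # (n, m)
    kv = k_prime.T @ v                               # (m, d): aggregate keys/values once
    normalizer = q_prime @ k_prime.sum(axis=0)       # (n,): row-sum of implicit attention
    return (q_prime @ kv) / normalizer[:, None]      # never forms the n x n matrix
```

Because keys and values are aggregated into the small (m, d) matrix `kv` before queries touch them, the n x n attention matrix is never materialized; this is the property most of the "efficient attention" papers in the list below build on or compare against.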

Papers citing "Rethinking Attention with Performers"

Showing 50 of 1,014 citing papers.
HTLM: Hyper-Text Pre-Training and Prompting of Language Models (14 Jul 2021)
Armen Aghajanyan, Dmytro Okhonko, M. Lewis, Mandar Joshi, Hu Xu, Gargi Ghosh, Luke Zettlemoyer
Tags: VLM, VPVLM, AI4TS, AI4CE

FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks (13 Jul 2021)
Sheng-Chun Kao, Suvinay Subramanian, Gaurav Agrawal, Amir Yazdanbakhsh, T. Krishna

Combiner: Full Attention Transformer with Sparse Computation Cost (12 Jul 2021)
Hongyu Ren, H. Dai, Zihang Dai, Mengjiao Yang, J. Leskovec, Dale Schuurmans, Bo Dai

BERT-like Pre-training for Symbolic Piano Music Classification Tasks (12 Jul 2021)
Yi-Hui Chou, I-Chun Chen, Chin-Jui Chang, Joann Ching, Yi-Hsuan Yang

Grid Partitioned Attention: Efficient Transformer Approximation with Inductive Bias for High Resolution Detail Generation (08 Jul 2021)
Nikolay Jetchev, Gökhan Yildirim, Christian Bracher, Roland Vollgraf

Poly-NL: Linear Complexity Non-local Layers with Polynomials (06 Jul 2021)
F. Babiloni, Ioannis Marras, Filippos Kokkinos, Jiankang Deng, Grigorios G. Chrysos, S. Zafeiriou

Vision Xformers: Efficient Attention for Image Classification (05 Jul 2021)
Pranav Jeevan, Amit Sethi
Tags: ViT

Long-Short Transformer: Efficient Transformers for Language and Vision (05 Jul 2021)
Chen Zhu, Ming-Yu Liu, Chaowei Xiao, M. Shoeybi, Tom Goldstein, Anima Anandkumar, Bryan Catanzaro
Tags: ViT, VLM

SHORING: Design Provable Conditional High-Order Interaction Network via Symbolic Testing (03 Jul 2021)
Hui Li, Xingbo Fu, Ruofan Wu, Jinyu Xu, Kai Y. Xiao, ..., Weiqiang Wang, Shuai Chen, Leilei Shi, Tao Xiong, Yuan Qi
Tags: AI4TS

CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows (01 Jul 2021)
Xiaoyi Dong, Jianmin Bao, Dongdong Chen, Weiming Zhang, Nenghai Yu, Lu Yuan, Dong Chen, B. Guo
Tags: ViT

ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information (30 Jun 2021)
Zijun Sun, Xiaoya Li, Xiaofei Sun, Yuxian Meng, Xiang Ao, Qing He, Fei Wu, Jiwei Li
Tags: SSeg

DF-Conformer: Integrated architecture of Conv-TasNet and Conformer using linear complexity self-attention for speech enhancement (30 Jun 2021)
Yuma Koizumi, Shigeki Karita, Scott Wisdom, Hakan Erdogan, J. Hershey, Llion Jones, M. Bacchiani

Knowledge Transfer by Discriminative Pre-training for Academic Performance Prediction (28 Jun 2021)
Byungsoo Kim, Hangyeol Yu, Dongmin Shin, Youngduck Choi

VEGN: Variant Effect Prediction with Graph Neural Networks (25 Jun 2021)
Jun Cheng, Carolin (Haas) Lawrence, Mathias Niepert

Charformer: Fast Character Transformers via Gradient-based Subword Tokenization (23 Jun 2021)
Yi Tay, Vinh Q. Tran, Sebastian Ruder, Jai Gupta, Hyung Won Chung, Dara Bahri, Zhen Qin, Simon Baumgartner, Cong Yu, Donald Metzler

Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding (23 Jun 2021)
Shengjie Luo, Shanda Li, Tianle Cai, Di He, Dinglan Peng, Shuxin Zheng, Guolin Ke, Liwei Wang, Tie-Yan Liu

Hi-BEHRT: Hierarchical Transformer-based model for accurate prediction of clinical events using multimodal longitudinal electronic health records (21 Jun 2021)
Yikuan Li, M. Mamouei, G. Salimi-Khorshidi, Shishir Rao, A. Hassaine, D. Canoy, Thomas Lukasiewicz, K. Rahimi

XCiT: Cross-Covariance Image Transformers (17 Jun 2021)
Alaaeldin El-Nouby, Hugo Touvron, Mathilde Caron, Piotr Bojanowski, Matthijs Douze, ..., Ivan Laptev, Natalia Neverova, Gabriel Synnaeve, Jakob Verbeek, Hervé Jégou
Tags: ViT

Large-Scale Chemical Language Representations Capture Molecular Structure and Properties (17 Jun 2021)
Jerret Ross, Brian M. Belgodere, Vijil Chenthamarakshan, Inkit Padhi, Youssef Mroueh, Payel Das
Tags: AI4CE

Eigen Analysis of Self-Attention and its Reconstruction from Partial Computation (16 Jun 2021)
Srinadh Bhojanapalli, Ayan Chakrabarti, Himanshu Jain, Sanjiv Kumar, Michal Lukasik, Andreas Veit

PairConnect: A Compute-Efficient MLP Alternative to Attention (15 Jun 2021)
Zhaozhuo Xu, Minghao Yan, Junyan Zhang, Anshumali Shrivastava

Pre-Trained Models: Past, Present and Future (14 Jun 2021)
Xu Han, Zhengyan Zhang, Ning Ding, Yuxian Gu, Xiao Liu, ..., Jie Tang, Ji-Rong Wen, Jinhui Yuan, Wayne Xin Zhao, Jun Zhu
Tags: AIFin, MQ, AI4MH

Memory-efficient Transformers via Top-$k$ Attention (13 Jun 2021)
Ankit Gupta, Guy Dar, Shaya Goodman, David Ciprut, Jonathan Berant
Tags: MQ

Going Beyond Linear Transformers with Recurrent Fast Weight Programmers (11 Jun 2021)
Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber

Transformed CNNs: recasting pre-trained convolutional layers with self-attention (10 Jun 2021)
Stéphane d'Ascoli, Levent Sagun, Giulio Biroli, Ari S. Morcos
Tags: ViT

Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers (09 Jun 2021)
Mandela Patrick, Dylan Campbell, Yuki M. Asano, Ishan Misra, Florian Metze, Christoph Feichtenhofer, Andrea Vedaldi, João F. Henriques

CoAtNet: Marrying Convolution and Attention for All Data Sizes (09 Jun 2021)
Zihang Dai, Hanxiao Liu, Quoc V. Le, Mingxing Tan
Tags: ViT

TED-net: Convolution-free T2T Vision Transformer-based Encoder-decoder Dilation network for Low-dose CT Denoising (08 Jun 2021)
Dayang Wang, Zhan Wu, Hengyong Yu
Tags: ViT, MedIm

A Survey of Transformers (08 Jun 2021)
Tianyang Lin, Yuxin Wang, Xiangyang Liu, Xipeng Qiu
Tags: ViT

Chasing Sparsity in Vision Transformers: An End-to-End Exploration (08 Jun 2021)
Tianlong Chen, Yu Cheng, Zhe Gan, Lu Yuan, Lei Zhang, Zhangyang Wang
Tags: ViT

On the Expressive Power of Self-Attention Matrices (07 Jun 2021)
Valerii Likhosherstov, K. Choromanski, Adrian Weller

Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning (04 Jun 2021)
Jannik Kossen, Neil Band, Clare Lyle, Aidan Gomez, Tom Rainforth, Y. Gal
Tags: OOD, 3DPC

Detect the Interactions that Matter in Matter: Geometric Attention for Many-Body Systems (04 Jun 2021)
Thorben Frank, Stefan Chmiela

Luna: Linear Unified Nested Attention (03 Jun 2021)
Xuezhe Ma, Xiang Kong, Sinong Wang, Chunting Zhou, Jonathan May, Hao Ma, Luke Zettlemoyer

Container: Context Aggregation Network (02 Jun 2021)
Peng Gao, Jiasen Lu, Hongsheng Li, Roozbeh Mottaghi, Aniruddha Kembhavi
Tags: ViT

Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model (31 May 2021)
Jiangning Zhang, Chao Xu, Jian Li, Wenzhou Chen, Yabiao Wang, Ying Tai, Shuo Chen, Chengjie Wang, Feiyue Huang, Yong Liu

Choose a Transformer: Fourier or Galerkin (31 May 2021)
Shuhao Cao

An Attention Free Transformer (28 May 2021)
Shuangfei Zhai, Walter A. Talbott, Nitish Srivastava, Chen Huang, Hanlin Goh, Ruixiang Zhang, J. Susskind
Tags: ViT

PTNet: A High-Resolution Infant MRI Synthesizer Based on Transformer (28 May 2021)
Xuzhe Zhang, Xinzi He, Jia Guo, Nabil Ettehadi, Natalie Aw, David P. Semanek, J. Posner, Andrew F. Laine, Yun Wang
Tags: ViT, MedIm

Relative Positional Encoding for Transformers with Linear Complexity (18 May 2021)
Antoine Liutkus, Ondřej Cífka, Shih-Lun Wu, Umut Simsekli, Yi-Hsuan Yang, Gaël Richard

Protein sequence-to-structure learning: Is this the end(-to-end revolution)? (16 May 2021)
É. Laine, Stephan Eismann, A. Elofsson, Sergei Grudinin
Tags: OOD, 3DV

Poolingformer: Long Document Modeling with Pooling Attention (10 May 2021)
Hang Zhang, Yeyun Gong, Yelong Shen, Weisheng Li, Jiancheng Lv, Nan Duan, Weizhu Chen

T-EMDE: Sketching-based global similarity for cross-modal retrieval (10 May 2021)
Barbara Rychalska, Mikolaj Wieczorek, Jacek Dąbrowski

MuseMorphose: Full-Song and Fine-Grained Piano Music Style Transfer with One Transformer VAE (10 May 2021)
Shih-Lun Wu, Yi-Hsuan Yang
Tags: ViT

FNet: Mixing Tokens with Fourier Transforms (09 May 2021)
James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon

Long-Span Summarization via Local Attention and Content Selection (08 May 2021)
Potsawee Manakul, Mark J. F. Gales

Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks (05 May 2021)
Meng-Hao Guo, Zheng-Ning Liu, Tai-Jiang Mu, Shimin Hu

SpookyNet: Learning Force Fields with Electronic Degrees of Freedom and Nonlocal Effects (01 May 2021)
Oliver T. Unke, Stefan Chmiela, M. Gastegger, Kristof T. Schütt, H. E. Sauceda, K. Müller

Visual Saliency Transformer (25 Apr 2021)
Nian Liu, Ni Zhang, Kaiyuan Wan, Ling Shao, Junwei Han
Tags: ViT

Transfer training from smaller language model (23 Apr 2021)
Han Zhang