ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Rethinking Attention with Performers
arXiv:2009.14794
30 September 2020
K. Choromanski
Valerii Likhosherstov
David Dohan
Xingyou Song
Andreea Gane
Tamás Sarlós
Peter Hawkins
Jared Davis
Afroz Mohiuddin
Lukasz Kaiser
David Belanger
Lucy J. Colwell
Adrian Weller

Papers citing "Rethinking Attention with Performers"

50 / 1,019 papers shown
Transformer with Fourier Integral Attentions
T. Nguyen
Minh Pham
Tam Nguyen
Khai Nguyen
Stanley J. Osher
Nhat Ho
29
4
0
01 Jun 2022
Chefs' Random Tables: Non-Trigonometric Random Features
Valerii Likhosherstov
K. Choromanski
Kumar Avinava Dubey
Frederick Liu
Tamás Sarlós
Adrian Weller
38
17
0
30 May 2022
Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning
Aniket Didolkar
Kshitij Gupta
Anirudh Goyal
Nitesh B. Gundavarapu
Alex Lamb
Nan Rosemary Ke
Yoshua Bengio
AI4CE
123
17
0
30 May 2022
EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction
Han Cai
Junyan Li
Muyan Hu
Chuang Gan
Song Han
37
49
0
29 May 2022
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
VLM
116
2,061
0
27 May 2022
Future Transformer for Long-term Action Anticipation
Dayoung Gong
Joonseok Lee
Manjin Kim
S. Ha
Minsu Cho
AI4TS
16
62
0
27 May 2022
X-ViT: High Performance Linear Vision Transformer without Softmax
Jeonggeun Song
Heung-Chang Lee
ViT
33
2
0
27 May 2022
Transformer for Partial Differential Equations' Operator Learning
Zijie Li
Kazem Meidani
A. Farimani
47
145
0
26 May 2022
Towards Learning Universal Hyperparameter Optimizers with Transformers
Yutian Chen
Xingyou Song
Chansoo Lee
Zehao Wang
Qiuyi Zhang
...
Greg Kochanski
Arnaud Doucet
Marc'Aurelio Ranzato
Sagi Perel
Nando de Freitas
32
63
0
26 May 2022
Fast Vision Transformers with HiLo Attention
Zizheng Pan
Jianfei Cai
Bohan Zhuang
28
152
0
26 May 2022
Training Language Models with Memory Augmentation
Zexuan Zhong
Tao Lei
Danqi Chen
RALM
249
128
0
25 May 2022
Leveraging Locality in Abstractive Text Summarization
Yixin Liu
Ansong Ni
Linyong Nan
Budhaditya Deb
Chenguang Zhu
Ahmed Hassan Awadallah
Dragomir R. Radev
35
18
0
25 May 2022
Recipe for a General, Powerful, Scalable Graph Transformer
Ladislav Rampášek
Mikhail Galkin
Vijay Prakash Dwivedi
Anh Tuan Luu
Guy Wolf
Dominique Beaini
78
527
0
25 May 2022
Semi-Parametric Inducing Point Networks and Neural Processes
R. Rastogi
Yair Schiff
Alon Hacohen
Zhaozhi Li
I-Hsiang Lee
Yuntian Deng
M. Sabuncu
Volodymyr Kuleshov
3DPC
29
6
0
24 May 2022
TransforMatcher: Match-to-Match Attention for Semantic Correspondence
Seungwook Kim
Juhong Min
Minsu Cho
ViT
51
32
0
23 May 2022
Revisiting Pre-trained Language Models and their Evaluation for Arabic Natural Language Understanding
Abbas Ghaddar
Yimeng Wu
Sunyam Bagga
Ahmad Rashid
Khalil Bibi
...
Zhefeng Wang
Baoxing Huai
Xin Jiang
Qun Liu
Philippe Langlais
32
6
0
21 May 2022
Transformer-based out-of-distribution detection for clinically safe segmentation
M. Graham
Petru-Daniel Tudosiu
P. Wright
W. H. Pinaya
J. U-King-im
...
H. Jäger
D. Werring
P. Nachev
Sebastien Ourselin
M. Jorge Cardoso
MedIm
31
21
0
21 May 2022
KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation
Ta-Chung Chi
Ting-Han Fan
Peter J. Ramadge
Alexander I. Rudnicky
49
65
0
20 May 2022
Towards Unified Keyframe Propagation Models
Patrick Esser
Peter Michael
Soumyadip Sengupta
VGen
35
0
0
19 May 2022
FiLM: Frequency improved Legendre Memory Model for Long-term Time Series Forecasting
Tian Zhou
Ziqing Ma
Xue Wang
Qingsong Wen
Liang Sun
Tao Yao
Wotao Yin
Rong Jin
AI4TS
121
171
0
18 May 2022
Text Detection & Recognition in the Wild for Robot Localization
Z. Raisi
John S. Zelek
31
0
0
17 May 2022
Multiformer: A Head-Configurable Transformer-Based Model for Direct Speech Translation
Gerard Sant
Gerard I. Gállego
Belen Alastruey
Marta R. Costa-jussà
22
3
0
14 May 2022
Supplementary Material: Implementation and Experiments for GAU-based Model
Zhenjie Liu
19
0
0
12 May 2022
Empowering parameter-efficient transfer learning by recognizing the kernel structure in self-attention
Yifan Chen
Devamanyu Hazarika
Mahdi Namazifar
Yang Liu
Di Jin
Dilek Z. Hakkani-Tür
24
9
0
07 May 2022
CenterCLIP: Token Clustering for Efficient Text-Video Retrieval
Shuai Zhao
Linchao Zhu
Xiaohan Wang
Yi Yang
VLM
CLIP
20
112
0
02 May 2022
LayoutBERT: Masked Language Layout Model for Object Insertion
Kerem Turgutlu
Sanatan Sharma
J. Kumar
VLM
DiffM
38
2
0
30 Apr 2022
Depth Estimation with Simplified Transformer
John Yang
Le An
Anurag Dixit
Jinkyu Koo
Su Inn Park
MDE
41
21
0
28 Apr 2022
Transformers in Time-series Analysis: A Tutorial
Sabeen Ahmed
Ian E. Nielsen
Aakash Tripathi
Shamoon Siddiqui
Ghulam Rasool
R. Ramachandran
AI4TS
44
143
0
28 Apr 2022
Attention Mechanism in Neural Networks: Where it Comes and Where it Goes
Derya Soydaner
3DV
49
150
0
27 Apr 2022
ClusterGNN: Cluster-based Coarse-to-Fine Graph Neural Network for Efficient Feature Matching
Yanxing Shi
Junxiong Cai
Yoli Shavit
Tai-Jiang Mu
Wensen Feng
Kai Zhang
GNN
27
77
0
25 Apr 2022
Investigating Neural Architectures by Synthetic Dataset Design
Adrien Courtois
Jean-Michel Morel
Pablo Arias
30
4
0
23 Apr 2022
Visual Attention Emerges from Recurrent Sparse Reconstruction
Baifeng Shi
Ya-heng Song
Neel Joshi
Trevor Darrell
Xin Eric Wang
3DH
27
6
0
23 Apr 2022
Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better than Dot-Product Self-Attention
Tong Yu
Ruslan Khalitov
Lei Cheng
Zhirong Yang
MoE
27
10
0
22 Apr 2022
OpenGlue: Open Source Graph Neural Net Based Pipeline for Image Matching
Ostap Viniavskyi
Mariia Dobko
Dmytro Mishkin
Oles Dobosevych
VLM
17
7
0
19 Apr 2022
Towards Lightweight Transformer via Group-wise Transformation for Vision-and-Language Tasks
Gen Luo
Yiyi Zhou
Xiaoshuai Sun
Yan Wang
Liujuan Cao
Yongjian Wu
Feiyue Huang
Rongrong Ji
ViT
22
43
0
16 Apr 2022
Efficient Linear Attention for Fast and Accurate Keypoint Matching
Suwichaya Suwanwimolkul
S. Komorita
3DPC
3DV
22
11
0
16 Apr 2022
Revisiting Transformer-based Models for Long Document Classification
Xiang Dai
Ilias Chalkidis
S. Darkner
Desmond Elliott
VLM
25
68
0
14 Apr 2022
A Call for Clarity in Beam Search: How It Works and When It Stops
Jungo Kasai
Keisuke Sakaguchi
Ronan Le Bras
Dragomir R. Radev
Yejin Choi
Noah A. Smith
28
6
0
11 Apr 2022
Linear Complexity Randomized Self-attention Mechanism
Lin Zheng
Chong-Jun Wang
Lingpeng Kong
22
31
0
10 Apr 2022
Accelerating Attention through Gradient-Based Learned Runtime Pruning
Zheng Li
Soroush Ghodrati
Amir Yazdanbakhsh
H. Esmaeilzadeh
Mingu Kang
27
17
0
07 Apr 2022
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound
Yan-Bo Lin
Jie Lei
Joey Tianyi Zhou
Gedas Bertasius
54
39
0
06 Apr 2022
Towards An End-to-End Framework for Flow-Guided Video Inpainting
Zerui Li
Cheng Lu
Jia Qin
Chunle Guo
Ming-Ming Cheng
60
149
0
06 Apr 2022
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
...
Kathy Meier-Hellstern
Douglas Eck
J. Dean
Slav Petrov
Noah Fiedel
PILM
LRM
168
6,035
0
05 Apr 2022
Abstractive summarization of hospitalisation histories with transformer networks
Alexander Yalunin
D. Umerenkov
V. Kokh
MedIm
30
8
0
05 Apr 2022
SemanticCAP: Chromatin Accessibility Prediction Enhanced by Features Learning from a Language Model
Yikang Zhang
Xiaomin Chu
Yelu Jiang
Hongjie Wu
Lijun Quan
11
4
0
05 Apr 2022
Long Movie Clip Classification with State-Space Video Models
Md. Mohaiminul Islam
Gedas Bertasius
VLM
51
102
0
04 Apr 2022
TALLFormer: Temporal Action Localization with a Long-memory Transformer
Feng Cheng
Gedas Bertasius
ViT
37
91
0
04 Apr 2022
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
Andy Zeng
Maria Attarian
Brian Ichter
K. Choromanski
Adrian S. Wong
...
Michael S. Ryoo
Vikas Sindhwani
Johnny Lee
Vincent Vanhoucke
Peter R. Florence
ReLM
LRM
66
574
0
01 Apr 2022
Deep Learning for Spectral Filling in Radio Frequency Applications
Matthew Setzler
Elizabeth Coda
J. Rounds
M. Vann
Michael Girard
34
1
0
31 Mar 2022
Scaling Language Model Size in Cross-Device Federated Learning
Jae Hun Ro
Theresa Breiner
Lara McConnaughey
Mingqing Chen
A. Suresh
Shankar Kumar
Rajiv Mathews
FedML
34
24
0
31 Mar 2022