ResearchTrend.AI
Rethinking Attention with Performers

30 September 2020
K. Choromanski
Valerii Likhosherstov
David Dohan
Xingyou Song
Andreea Gane
Tamás Sarlós
Peter Hawkins
Jared Davis
Afroz Mohiuddin
Lukasz Kaiser
David Belanger
Lucy J. Colwell
Adrian Weller
arXiv: 2009.14794

Papers citing "Rethinking Attention with Performers"

50 / 1,016 papers shown
Approximating Two-Layer Feedforward Networks for Efficient Transformers
Róbert Csordás
Kazuki Irie
Jürgen Schmidhuber
MoE
27
18
0
16 Oct 2023
Attentive Multi-Layer Perceptron for Non-autoregressive Generation
Shuyang Jiang
Jinchao Zhang
Jiangtao Feng
Lin Zheng
Lingpeng Kong
54
0
0
14 Oct 2023
End-to-end Story Plot Generator
Hanlin Zhu
Andrew Cohen
Danqing Wang
Kevin Kaichuang Yang
Xiaomeng Yang
Jiantao Jiao
Yuandong Tian
27
5
0
13 Oct 2023
Pit One Against Many: Leveraging Attention-head Embeddings for Parameter-efficient Multi-head Attention
Huiyin Xue
Nikolaos Aletras
42
0
0
11 Oct 2023
Supercharging Graph Transformers with Advective Diffusion
Qitian Wu
Chenxiao Yang
Kaipeng Zeng
Fan Nie
AI4CE
53
6
0
10 Oct 2023
HyperAttention: Long-context Attention in Near-Linear Time
Insu Han
Rajesh Jayaram
Amin Karbasi
Vahab Mirrokni
David P. Woodruff
A. Zandieh
36
62
0
09 Oct 2023
Tailoring Self-Attention for Graph via Rooted Subtrees
Siyuan Huang
Yunchong Song
Jiayue Zhou
Zhouhan Lin
32
8
0
08 Oct 2023
General Graph Random Features
Isaac Reid
Krzysztof Choromanski
Eli Berger
Adrian Weller
23
5
0
07 Oct 2023
Repelling Random Walks
Isaac Reid
Eli Berger
Krzysztof Choromanski
Adrian Weller
16
4
0
07 Oct 2023
Diffusion Random Feature Model
Esha Saha
Giang Tran
DiffM
45
0
0
06 Oct 2023
How to Capture Higher-order Correlations? Generalizing Matrix Softmax Attention to Kronecker Computation
Josh Alman
Zhao Song
38
32
0
06 Oct 2023
scHyena: Foundation Model for Full-Length Single-Cell RNA-Seq Analysis in Brain
Gyutaek Oh
B. Choi
Inkyung Jung
Jong Chul Ye
22
5
0
04 Oct 2023
FiGURe: Simple and Efficient Unsupervised Node Representations with Filter Augmentations
C. Ekbote
Ajinkya Deshpande
Arun Shankar Iyer
Ramakrishna Bairi
Sundararajan Sellamanickam
SSL
44
3
0
03 Oct 2023
SEA: Sparse Linear Attention with Estimated Attention Mask
Heejun Lee
Jina Kim
Jeffrey Willette
Sung Ju Hwang
38
6
0
03 Oct 2023
Transformers are efficient hierarchical chemical graph learners
Zihan Pengmei
Zimu Li
Chih-chan Tien
Risi Kondor
Aaron R Dinner
GNN
23
1
0
02 Oct 2023
Robustifying State-space Models for Long Sequences via Approximate Diagonalization
Annan Yu
Arnur Nigmetov
Dmitriy Morozov
Michael W. Mahoney
N. Benjamin Erichson
27
12
0
02 Oct 2023
PolySketchFormer: Fast Transformers via Sketching Polynomial Kernels
Praneeth Kacham
Vahab Mirrokni
Peilin Zhong
44
7
0
02 Oct 2023
Win-Win: Training High-Resolution Vision Transformers from Two Windows
Vincent Leroy
Jérôme Revaud
Thomas Lucas
Philippe Weinzaepfel
ViT
42
2
0
01 Oct 2023
A Survey on Deep Learning Techniques for Action Anticipation
Zeyun Zhong
Manuel Martin
Michael Voit
Juergen Gall
Jürgen Beyerer
26
7
0
29 Sep 2023
Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors
Chengming Zhang
Baixi Sun
Xiaodong Yu
Zhen Xie
Weijian Zheng
K. Iskra
Pete Beckman
Dingwen Tao
25
4
0
29 Sep 2023
Training a Large Video Model on a Single Machine in a Day
Yue Zhao
Philipp Krahenbuhl
VLM
34
15
0
28 Sep 2023
Transformer-VQ: Linear-Time Transformers via Vector Quantization
Lucas D. Lingle
34
15
0
28 Sep 2023
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
S. A. Jacobs
Masahiro Tanaka
Chengming Zhang
Minjia Zhang
L. Song
Samyam Rajbhandari
Yuxiong He
25
103
0
25 Sep 2023
Towards Answering Health-related Questions from Medical Videos: Datasets and Approaches
Deepak Gupta
Kush Attal
Dina Demner-Fushman
LM&MA
27
1
0
21 Sep 2023
Boolformer: Symbolic Regression of Logic Functions with Transformers
Stéphane d'Ascoli
Samy Bengio
Josh Susskind
Emmanuel Abbe
19
5
0
21 Sep 2023
Efficient Long-Short Temporal Attention Network for Unsupervised Video Object Segmentation
P. Li
Yu Zhang
L. Yuan
Huaxin Xiao
Binbin Lin
Xianghua Xu
VOS
26
17
0
21 Sep 2023
Multi-spectral Entropy Constrained Neural Compression of Solar Imagery
Ali Zafari
Atefeh Khoshkhahtinat
P. Mehta
Nasser M. Nasrabadi
Barbara J. Thompson
M. Kirk
D. D. Silva
24
0
0
19 Sep 2023
Deep Prompt Tuning for Graph Transformers
Reza Shirkavand
Heng-Chiao Huang
23
7
0
18 Sep 2023
FedGKD: Unleashing the Power of Collaboration in Federated Graph Neural Networks
Qiying Pan
Ruofan Wu
Tengfei Liu
Tianyi Zhang
Yifei Zhu
Weiqiang Wang
FedML
48
9
0
18 Sep 2023
A Data Source for Reasoning Embodied Agents
Jack Lanchantin
Sainbayar Sukhbaatar
Gabriel Synnaeve
Yuxuan Sun
Kavya Srinet
Arthur Szlam
LM&Ro
LRM
25
5
0
14 Sep 2023
Improved particle-flow event reconstruction with scalable neural networks for current and future particle detectors
J. Pata
Eric Wulff
Farouk Mokhtar
D. Southwick
Mengke Zhang
M. Girone
Javier Duarte
30
1
0
13 Sep 2023
Uncovering mesa-optimization algorithms in Transformers
J. Oswald
Eyvind Niklasson
Maximilian Schlegel
Seijin Kobayashi
Nicolas Zucchet
...
Mark Sandler
Blaise Agüera y Arcas
Max Vladymyrov
Razvan Pascanu
João Sacramento
32
54
0
11 Sep 2023
Curve Your Attention: Mixed-Curvature Transformers for Graph Representation Learning
Sungjun Cho
Seunghyuk Cho
Sungwoo Park
Hankook Lee
Ho Hin Lee
Moontae Lee
34
6
0
08 Sep 2023
Gated recurrent neural networks discover attention
Nicolas Zucchet
Seijin Kobayashi
Yassir Akram
J. Oswald
Maxime Larcher
Angelika Steger
João Sacramento
36
8
0
04 Sep 2023
Solving Attention Kernel Regression Problem via Pre-conditioner
Zhao Song
Junze Yin
Licheng Zhang
28
10
0
28 Aug 2023
MB-TaylorFormer: Multi-branch Efficient Transformer Expanded by Taylor Formula for Image Dehazing
Yuwei Qiu
Kaihao Zhang
Chenxi Wang
Wenhan Luo
Hongdong Li
Zhi Jin
ViT
39
84
0
27 Aug 2023
Text Matching Improves Sequential Recommendation by Reducing Popularity Biases
Zhenghao Liu
Senkun Mei
Chenyan Xiong
Xiaohua Li
Shi Yu
Zhiyuan Liu
Yu Gu
Ge Yu
25
21
0
27 Aug 2023
Computation-efficient Deep Learning for Computer Vision: A Survey
Yulin Wang
Yizeng Han
Chaofei Wang
Shiji Song
Qi Tian
Gao Huang
VLM
36
20
0
27 Aug 2023
Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers
Matthew Dutson
Yin Li
M. Gupta
ViT
45
8
0
25 Aug 2023
QKSAN: A Quantum Kernel Self-Attention Network
Ren-Xin Zhao
Jinjing Shi
Xuelong Li
28
20
0
25 Aug 2023
Chunk, Align, Select: A Simple Long-sequence Processing Method for Transformers
Jiawen Xie
Pengyu Cheng
Xiao Liang
Yong Dai
Nan Du
40
7
0
25 Aug 2023
Coarse-to-Fine Multi-Scene Pose Regression with Transformers
Yoli Shavit
Ron Ferens
Y. Keller
ViT
39
13
0
22 Aug 2023
Exemplar-Free Continual Transformer with Convolutions
Anurag Roy
Vinay Kumar Verma
Sravan Voonna
Kripabandhu Ghosh
Saptarshi Ghosh
Abir Das
CLL
BDL
15
10
0
22 Aug 2023
UGSL: A Unified Framework for Benchmarking Graph Structure Learning
Bahare Fatemi
Sami Abu-El-Haija
Anton Tsitsulin
Seyed Mehran Kazemi
Dustin Zelle
Neslihan Bulut
Jonathan J. Halcrow
Bryan Perozzi
54
10
0
21 Aug 2023
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers
Tobias Christian Nauen
Sebastián M. Palacio
Federico Raue
Andreas Dengel
42
3
0
18 Aug 2023
Sparse Binary Transformers for Multivariate Time Series Modeling
Matt Gorbett
Hossein Shirazi
I. Ray
AI4TS
32
13
0
09 Aug 2023
RecycleGPT: An Autoregressive Language Model with Recyclable Module
Yu Jiang
Qiaozhi He
Xiaomin Zhuang
Zhihua Wu
Kunpeng Wang
Wenlai Zhao
Guangwen Yang
KELM
28
3
0
07 Aug 2023
Target-aware Variational Auto-encoders for Ligand Generation with Multimodal Protein Representation Learning
Haoxiang Luo
Gang Sun
40
2
0
02 Aug 2023
FLatten Transformer: Vision Transformer using Focused Linear Attention
Dongchen Han
Xuran Pan
Yizeng Han
Shiji Song
Gao Huang
23
156
0
01 Aug 2023
Capturing Co-existing Distortions in User-Generated Content for No-reference Video Quality Assessment
Kun Yuan
Zishang Kong
Chuanchuan Zheng
Ming-Ting Sun
Xingsen Wen
ViT
32
14
0
31 Jul 2023