ResearchTrend.AI
Rethinking Attention with Performers


30 September 2020
K. Choromanski
Valerii Likhosherstov
David Dohan
Xingyou Song
Andreea Gane
Tamás Sarlós
Peter Hawkins
Jared Davis
Afroz Mohiuddin
Lukasz Kaiser
David Belanger
Lucy J. Colwell
Adrian Weller

Papers citing "Rethinking Attention with Performers"

50 / 1,014 papers shown
Title
Learning the RoPEs: Better 2D and 3D Position Encodings with STRING
Connor Schenck
Isaac Reid
M. Jacob
Alex Bewley
Joshua Ainslie
...
Matthias Minderer
Dmitry Kalashnikov
Jonathan Tompson
Vikas Sindhwani
Krzysztof Choromanski
66
1
0
04 Feb 2025
Mamba-Shedder: Post-Transformer Compression for Efficient Selective Structured State Space Models
J. P. Muñoz
Jinjie Yuan
Nilesh Jain
Mamba
72
1
0
28 Jan 2025
ZETA: Leveraging Z-order Curves for Efficient Top-k Attention
Qiuhao Zeng
Jerry Huang
Peng Lu
Gezheng Xu
Boxing Chen
Charles Ling
Boyu Wang
49
1
0
24 Jan 2025
Episodic Memories Generation and Evaluation Benchmark for Large Language Models
Alexis Huet
Zied Ben-Houidi
Dario Rossi
LLMAG
56
0
0
21 Jan 2025
ELITR-Bench: A Meeting Assistant Benchmark for Long-Context Language Models
Thibaut Thonet
Jos Rozen
Laurent Besacier
RALM
137
2
0
20 Jan 2025
A Comprehensive Survey of Foundation Models in Medicine
Wasif Khan
Seowung Leem
Kyle B. See
Joshua K. Wong
Shaoting Zhang
R. Fang
AI4CE
LM&MA
VLM
105
18
0
17 Jan 2025
Training Hybrid Neural Networks with Multimode Optical Nonlinearities Using Digital Twins
Ilker Oguz
Louis J. E. Suter
J. Hsieh
Mustafa Yildirim
Niyazi Ulaş Dinç
Christophe Moser
D. Psaltis
58
2
0
14 Jan 2025
MedCoDi-M: A Multi-Prompt Foundation Model for Multimodal Medical Data Generation
Daniele Molino
Francesco Di Feola
E. Faiella
Deborah Fazzini
D. Santucci
Linlin Shen
V. Guarrasi
Paolo Soda
SyDa
MedIm
44
0
0
10 Jan 2025
Key-value memory in the brain
Samuel J. Gershman
Ila Fiete
Kazuki Irie
34
7
0
06 Jan 2025
Attention Entropy is a Key Factor: An Analysis of Parallel Context Encoding with Full-attention-based Pre-trained Language Models
Zhisong Zhang
Yan Wang
Xinting Huang
Tianqing Fang
H. Zhang
Chenlong Deng
Shuaiyi Li
Dong Yu
85
2
0
21 Dec 2024
ImagePiece: Content-aware Re-tokenization for Efficient Image Recognition
Seungdong Yoa
Seungjun Lee
Hyeseung Cho
Bumsoo Kim
Woohyung Lim
ViT
70
0
0
21 Dec 2024
Generating Long-form Story Using Dynamic Hierarchical Outlining with Memory-Enhancement
Qianyue Wang
Jinwu Hu
Zhengping Li
Yufeng Wang
Daiyuan Li
Yu Hu
Mingkui Tan
85
4
0
18 Dec 2024
A Decade of Deep Learning: A Survey on The Magnificent Seven
Dilshod Azizov
Muhammad Arslan Manzoor
Velibor Bojkovic
Yingxu Wang
Zhilin Wang
...
Liang Li
Siwei Liu
Yu Zhong
Wei Liu
Shangsong Liang
OOD
AI4TS
MedIm
120
0
0
13 Dec 2024
Bridging the Divide: Reconsidering Softmax and Linear Attention
Dongchen Han
Yifan Pu
Zhuofan Xia
Yizeng Han
Xuran Pan
Xiu Li
Jiwen Lu
Shiji Song
Gao Huang
73
8
0
09 Dec 2024
Does Self-Attention Need Separate Weights in Transformers?
Md. Kowsher
Nusrat Jahan Prottasha
Chun-Nam Yu
O. Garibay
Niloofar Yousefi
194
0
0
30 Nov 2024
Random Feature Models with Learnable Activation Functions
Zailin Ma
Jiansheng Yang
Yaodong Yang
75
0
0
29 Nov 2024
Even Sparser Graph Transformers
Hamed Shirzad
Honghao Lin
B. Venkatachalam
A. Velingker
David P. Woodruff
Danica J. Sutherland
GNN
96
3
0
25 Nov 2024
CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction
Yuan Zhou
Qingshan Xu
Jiequan Cui
Junbao Zhou
Jing Zhang
Richang Hong
H. Zhang
ViT
78
0
0
25 Nov 2024
Best of Both Worlds: Advantages of Hybrid Graph Sequence Models
Ali Behrouz
Ali Parviz
Mahdi Karami
Clayton Sanford
Bryan Perozzi
Vahab Mirrokni
84
2
0
23 Nov 2024
Nd-BiMamba2: A Unified Bidirectional Architecture for Multi-Dimensional Data Processing
Hao Liu
Mamba
AI4CE
77
1
0
22 Nov 2024
EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality
Sanghyeok Lee
Joonmyung Choi
Hyunwoo J. Kim
112
3
0
22 Nov 2024
MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers
Ning Ding
Yehui Tang
Haochen Qin
Zhenli Zhou
Chao Xu
Lin Li
Kai Han
Heng Liao
Yunhe Wang
62
0
0
20 Nov 2024
SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization
Jintao Zhang
Haofeng Huang
Pengle Zhang
Jia Wei
Jun-Jie Zhu
Jianfei Chen
VLM
MQ
63
15
0
17 Nov 2024
MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map
Yuhong Chou
Man Yao
Kexin Wang
Yuqi Pan
Ruijie Zhu
Yiran Zhong
Yu Qiao
Jian Wu
Bo Xu
Guoqi Li
54
4
0
16 Nov 2024
Bio-xLSTM: Generative modeling, representation and in-context learning of biological and chemical sequences
Niklas Schmidinger
Lisa Schneckenreiter
Philipp Seidl
Johannes Schimunek
Pieter-Jan Hoedt
Johannes Brandstetter
Andreas Mayr
Sohvi Luukkonen
Sepp Hochreiter
G. Klambauer
MedIm
61
4
0
06 Nov 2024
$k$NN Attention Demystified: A Theoretical Exploration for Scalable Transformers
Themistoklis Haris
36
0
0
06 Nov 2024
LASER: Attention with Exponential Transformation
Sai Surya Duvvuri
Inderjit Dhillon
35
1
0
05 Nov 2024
Kernel Approximation using Analog In-Memory Computing
Julian Büchel
Giacomo Camposampiero
A. Vasilopoulos
Corey Lammie
Manuel Le Gallo
Abbas Rahimi
Abu Sebastian
55
0
0
05 Nov 2024
The Evolution of RWKV: Advancements in Efficient Language Modeling
Akul Datta
VLM
47
1
0
05 Nov 2024
From Twitter to Reasoner: Understand Mobility Travel Modes and Sentiment Using Large Language Models
Kangrui Ruan
Xinyang Wang
Xuan Di
44
5
0
04 Nov 2024
Training Compute-Optimal Protein Language Models
Xingyi Cheng
Bo Chen
Pan Li
Jing Gong
Jie Tang
Le Song
84
13
0
04 Nov 2024
NIMBA: Towards Robust and Principled Processing of Point Clouds With SSMs
Nursena Köprücü
Destiny Okpekpe
Antonio Orvieto
Mamba
41
1
0
31 Oct 2024
Dense Associative Memory Through the Lens of Random Features
Benjamin Hoover
Duen Horng Chau
Hendrik Strobelt
Parikshit Ram
Dmitry Krotov
BDL
43
5
0
31 Oct 2024
Video Token Merging for Long-form Video Understanding
Seon-Ho Lee
Jue Wang
Zhikang Zhang
D. Fan
Xinyu Li
45
5
0
31 Oct 2024
FilterViT and DropoutViT
Bohang Sun
29
0
0
30 Oct 2024
PESFormer: Boosting Macro- and Micro-expression Spotting with Direct Timestamp Encoding
Wang-Wang Yu
Kai-Fu Yang
Xiangrui Hu
Jingwen Jiang
Hong-Mei Yan
Yong-Jie Li
24
0
0
24 Oct 2024
Do Robot Snakes Dream like Electric Sheep? Investigating the Effects of Architectural Inductive Biases on Hallucination
Jerry Huang
Prasanna Parthasarathi
Mehdi Rezagholizadeh
Boxing Chen
Sarath Chandar
53
0
0
22 Oct 2024
PODTILE: Facilitating Podcast Episode Browsing with Auto-generated Chapters
Azin Ghazimatin
Ekaterina Garmash
Gustavo Penha
Kristen Sheets
Martin Achenbach
...
Ben Carterette
Ann Clifton
Paul N. Bennett
C. Hauff
M. Lalmas
31
2
0
21 Oct 2024
Taming Mambas for Voxel Level 3D Medical Image Segmentation
Luca Lumetti
Vittorio Pipoli
Kevin Marchesini
Elisa Ficarra
C. Grana
Federico Bolelli
MedIm
Mamba
29
0
0
20 Oct 2024
CaTs and DAGs: Integrating Directed Acyclic Graphs with Transformers and Fully-Connected Neural Networks for Causally Constrained Predictions
M. Vowels
Mathieu Rochat
S. Akbari
CML
GNN
OOD
27
0
0
18 Oct 2024
GDeR: Safeguarding Efficiency, Balancing, and Robustness via Prototypical Graph Pruning
Guibin Zhang
Haonan Dong
Yuchen Zhang
Zhixun Li
Dingshuo Chen
Kai Wang
Tianlong Chen
Yuxuan Liang
Dawei Cheng
Kun Wang
32
3
0
17 Oct 2024
Quadratic Gating Functions in Mixture of Experts: A Statistical Insight
Pedram Akbarian
Huy Le Nguyen
Xing Han
Nhat Ho
MoE
42
0
0
15 Oct 2024
SLaNC: Static LayerNorm Calibration
Mahsa Salmani
Nikita Trukhanov
I. Soloveychik
MQ
26
0
0
14 Oct 2024
Towards Better Multi-head Attention via Channel-wise Sample Permutation
Shen Yuan
Hongteng Xu
17
1
0
14 Oct 2024
Arrhythmia Classification Using Graph Neural Networks Based on Correlation Matrix
Seungwoo Han
18
0
0
14 Oct 2024
Magnituder Layers for Implicit Neural Representations in 3D
Sang Min Kim
Byeongchan Kim
Arijit Sehanobish
Krzysztof Choromanski
Dongseok Shim
Avinava Dubey
Min Hwan Oh
AI4CE
39
0
0
13 Oct 2024
TurboRAG: Accelerating Retrieval-Augmented Generation with Precomputed KV Caches for Chunked Text
Songshuo Lu
Hua Wang
Yutian Rong
Zhi Chen
Yaohua Tang
VLM
31
14
0
10 Oct 2024
Cluster-wise Graph Transformer with Dual-granularity Kernelized Attention
Siyuan Huang
Yunchong Song
Jiayue Zhou
Zhouhan Lin
33
1
0
09 Oct 2024
A Benchmark on Directed Graph Representation Learning in Hardware Designs
Haoyu Wang
Yinan Huang
Nan Wu
Pan Li
OOD
45
1
0
09 Oct 2024
A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models
Cong Guo
Feng Cheng
Zhixu Du
James Kiessling
Jonathan Ku
...
Qilin Zheng
Guanglei Zhou
Hai
Li-Wei Li
Yiran Chen
31
7
0
08 Oct 2024