ResearchTrend.AI
Rethinking Attention with Performers

30 September 2020
K. Choromanski
Valerii Likhosherstov
David Dohan
Xingyou Song
Andreea Gane
Tamás Sarlós
Peter Hawkins
Jared Davis
Afroz Mohiuddin
Lukasz Kaiser
David Belanger
Lucy J. Colwell
Adrian Weller

Papers citing "Rethinking Attention with Performers"

50 / 1,014 papers shown
Title
MiniCache: KV Cache Compression in Depth Dimension for Large Language Models
Akide Liu
Jing Liu
Zizheng Pan
Yefei He
Gholamreza Haffari
Bohan Zhuang
MQ
35
30
0
23 May 2024
Attending to Topological Spaces: The Cellular Transformer
Rubén Ballester
Pablo Hernández-García
Mathilde Papillon
Claudio Battiloro
Nina Miolane
Tolga Birdal
Carles Casacuberta
Sergio Escalera
Mustafa Hajij
43
3
0
23 May 2024
Dynamic Context Adaptation and Information Flow Control in Transformers: Introducing the Evaluator Adjuster Unit and Gated Residual Connections
Sahil Rajesh Dhayalkar
16
1
0
22 May 2024
Equipping Transformer with Random-Access Reading for Long-Context Understanding
Chenghao Yang
Zi Yang
Nan Hua
32
1
0
21 May 2024
NERULA: A Dual-Pathway Self-Supervised Learning Framework for Electrocardiogram Signal Analysis
G. Manimaran
S. Puthusserypady
Helena Domínguez
A. Atienza
J. Bardram
30
1
0
21 May 2024
Retrievable Domain-Sensitive Feature Memory for Multi-Domain Recommendation
Yuang Zhao
Zhaocheng Du
Qinglin Jia
Linxuan Zhang
Zhenhua Dong
Ruiming Tang
38
2
0
21 May 2024
Hierarchical Neural Operator Transformer with Learnable Frequency-aware Loss Prior for Arbitrary-scale Super-resolution
Xihaier Luo
Xiaoning Qian
Byung-Jun Yoon
42
3
0
20 May 2024
SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization
Jialong Guo
Xinghao Chen
Yehui Tang
Yunhe Wang
ViT
49
9
0
19 May 2024
LeaPformer: Enabling Linear Transformers for Autoregressive and Simultaneous Tasks via Learned Proportions
Victor Agostinelli
Sanghyun Hong
Lizhong Chen
KELM
43
1
0
18 May 2024
The CAP Principle for LLM Serving: A Survey of Long-Context Large Language Model Serving
Pai Zeng
Zhenyu Ning
Jieru Zhao
Weihao Cui
Mengwei Xu
Liwei Guo
Xusheng Chen
Yizhou Shan
LLMAG
48
4
0
18 May 2024
Positional Knowledge is All You Need: Position-induced Transformer (PiT) for Operator Learning
Junfeng Chen
Kailiang Wu
37
3
0
15 May 2024
Improving Transformers with Dynamically Composable Multi-Head Attention
Da Xiao
Qingye Meng
Shengping Li
Xingyuan Yuan
26
3
0
14 May 2024
CaFA: Global Weather Forecasting with Factorized Attention on Sphere
Zijie Li
Anthony Y. Zhou
Saurabh Patil
A. Farimani
45
6
0
12 May 2024
Length-Aware Multi-Kernel Transformer for Long Document Classification
Guangzeng Han
Jack Tsao
Xiaolei Huang
VLM
RALM
33
4
0
11 May 2024
Linearizing Large Language Models
Jean-Pierre Mercat
Igor Vasiljevic
Sedrick Scott Keh
Kushal Arora
Achal Dave
Adrien Gaidon
Thomas Kollar
40
19
0
10 May 2024
State-Free Inference of State-Space Models: The Transfer Function Approach
Rom N. Parnichkun
Stefano Massaroli
Alessandro Moro
Jimmy T.H. Smith
Ramin Hasani
...
Hajime Asama
Stefano Ermon
Taiji Suzuki
Atsushi Yamashita
Michael Poli
41
5
0
10 May 2024
IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs
Yuzhen Mao
Martin Ester
Ke Li
30
6
0
05 May 2024
TRAMBA: A Hybrid Transformer and Mamba Architecture for Practical Audio and Bone Conduction Speech Super Resolution and Enhancement on Mobile and Wearable Platforms
Yueyuan Sui
Minghui Zhao
Junxi Xia
Xiaofan Jiang
S. Xia
Mamba
45
11
0
02 May 2024
CLIP-Mamba: CLIP Pretrained Mamba Models with OOD and Hessian Evaluation
Weiquan Huang
Yifei Shen
Yifan Yang
Mamba
41
4
0
30 Apr 2024
Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges
Badri N. Patro
Vijay Srinivas Agneeswaran
Mamba
46
38
0
24 Apr 2024
A Survey on Efficient Inference for Large Language Models
Zixuan Zhou
Xuefei Ning
Ke Hong
Tianyu Fu
Jiaming Xu
...
Shengen Yan
Guohao Dai
Xiao-Ping Zhang
Yuhan Dong
Yu-Xiang Wang
46
83
0
22 Apr 2024
LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory
Zicheng Liu
Li Wang
Siyuan Li
Zedong Wang
Haitao Lin
Stan Z. Li
VLM
27
4
0
17 Apr 2024
Comprehensive Survey of Model Compression and Speed up for Vision Transformers
Feiyang Chen
Ziqian Luo
Lisang Zhou
Xueting Pan
Ying Jiang
18
22
0
16 Apr 2024
Referring Flexible Image Restoration
Runwei Guan
Rongsheng Hu
Zhuhao Zhou
Tianlang Xue
Ka Lok Man
Jeremy S. Smith
Eng Gee Lim
Weiping Ding
Yutao Yue
37
0
0
16 Apr 2024
Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs
Woomin Song
Seunghyuk Oh
Sangwoo Mo
Jaehyung Kim
Sukmin Yun
Jung-Woo Ha
Jinwoo Shin
30
14
0
16 Apr 2024
Adaptive Patching for High-resolution Image Segmentation with Transformers
Enzhi Zhang
Isaac Lyngaas
Peng Chen
Xiao Wang
Jun Igarashi
Yuankai Huo
M. Wahib
M. Munetomo
MedIm
32
1
0
15 Apr 2024
TransformerFAM: Feedback attention is working memory
Dongseong Hwang
Weiran Wang
Zhuoyuan Huo
K. Sim
P. M. Mengibar
34
12
0
14 Apr 2024
HGRN2: Gated Linear RNNs with State Expansion
Zhen Qin
Songlin Yang
Weixuan Sun
Xuyang Shen
Dong Li
Weigao Sun
Yiran Zhong
LRM
47
47
0
11 Apr 2024
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
Bo Peng
Daniel Goldstein
Quentin G. Anthony
Alon Albalak
Eric Alcaide
...
Bingchen Zhao
Qihang Zhao
Peng Zhou
Jian Zhu
Ruijie Zhu
51
73
0
08 Apr 2024
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
Bo He
Hengduo Li
Young Kyun Jang
Menglin Jia
Xuefei Cao
Ashish Shah
Abhinav Shrivastava
Ser-Nam Lim
MLLM
83
88
0
08 Apr 2024
Lightweight Deep Learning for Resource-Constrained Environments: A Survey
Hou-I Liu
Marco Galindo
Hongxia Xie
Lai-Kuan Wong
Hong-Han Shuai
Yung-Hui Li
Wen-Huang Cheng
58
48
0
08 Apr 2024
Bidirectional Long-Range Parser for Sequential Data Understanding
George Leotescu
Daniel Voinea
A. Popa
47
1
0
08 Apr 2024
Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts
Weilin Cai
Juyong Jiang
Le Qin
Junwei Cui
Sunghun Kim
Jiayi Huang
53
7
0
07 Apr 2024
On the Theoretical Expressive Power and the Design Space of Higher-Order Graph Transformers
Cai Zhou
Rose Yu
Yusu Wang
41
7
0
04 Apr 2024
GP-MoLFormer: A Foundation Model For Molecular Generation
Jerret Ross
Brian M. Belgodere
Samuel C. Hoffman
Vijil Chenthamarakshan
Youssef Mroueh
Payel Das
38
5
0
04 Apr 2024
Optimizing the Deployment of Tiny Transformers on Low-Power MCUs
Victor J. B. Jung
Alessio Burrello
Moritz Scherer
Francesco Conti
Luca Benini
30
4
0
03 Apr 2024
Linear Attention Sequence Parallelism
Weigao Sun
Zhen Qin
Dong Li
Xuyang Shen
Yu Qiao
Yiran Zhong
73
2
0
03 Apr 2024
Scene Adaptive Sparse Transformer for Event-based Object Detection
Yansong Peng
Hebei Li
Yueyi Zhang
Xiaoyan Sun
Feng Wu
ViT
43
12
0
02 Apr 2024
iMD4GC: Incomplete Multimodal Data Integration to Advance Precise Treatment Response Prediction and Survival Analysis for Gastric Cancer
Fengtao Zhou
Ying Xu
Yanfen Cui
Shenyang Zhang
Yun Zhu
...
Louis Ho Shing Lau
Chu Han
Dafu Zhang
Zhenhui Li
Hao Chen
30
1
0
01 Apr 2024
DE-HNN: An effective neural model for Circuit Netlist representation
Zhishang Luo
Truong Son-Hy
Puoya Tabaghi
Donghyeon Koh
Michael Defferrard
Elahe Rezaei
Ryan Carey
William Rhett Davis
Rajeev Jain
Yusu Wang
16
5
0
30 Mar 2024
DiJiang: Efficient Large Language Models through Compact Kernelization
Hanting Chen
Zhicheng Liu
Xutao Wang
Yuchuan Tian
Yunhe Wang
VLM
31
5
0
29 Mar 2024
MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection
Ali Behrouz
Michele Santacatterina
Ramin Zabih
44
31
0
29 Mar 2024
NeuSDFusion: A Spatial-Aware Generative Model for 3D Shape Completion, Reconstruction, and Generation
Ruikai Cui
Weizhe Liu
Weixuan Sun
Senbo Wang
Taizhang Shang
...
Han Yan
Zhennan Wu
Shenzhou Chen
Hongdong Li
Pan Ji
56
8
0
27 Mar 2024
Incorporating Exponential Smoothing into MLP: A Simple but Effective Sequence Model
Jiqun Chu
Zuoquan Lin
AI4TS
33
2
0
26 Mar 2024
State Space Models as Foundation Models: A Control Theoretic Overview
Carmen Amo Alonso
Jerome Sieber
M. Zeilinger
AI4CE
Mamba
36
13
0
25 Mar 2024
Graph Bayesian Optimization for Multiplex Influence Maximization
Zirui Yuan
Minglai Shao
Zhiqian Chen
36
6
0
25 Mar 2024
Block Selective Reprogramming for On-device Training of Vision Transformers
Sreetama Sarkar
Souvik Kundu
Kai Zheng
P. Beerel
37
2
0
25 Mar 2024
ZigMa: A DiT-style Zigzag Mamba Diffusion Model
Vincent Tao Hu
S. A. Baumann
Ming Gui
Olga Grebenkova
Pingchuan Ma
Johannes S. Fischer
Björn Ommer
42
42
0
20 Mar 2024
TiBiX: Leveraging Temporal Information for Bidirectional X-ray and Report Generation
Santosh Sanjeev
F. Maani
Arsen Abzhanov
Vijay Ram Papineni
Ibrahim Almakky
Bartłomiej W. Papież
Mohammad Yaqub
MedIm
58
0
0
20 Mar 2024
Improved EATFormer: A Vision Transformer for Medical Image Classification
Yulong Shisu
Susano Mingwin
Yongshuai Wanwag
Zengqiang Chenso
Sunshin Huing
ViT
MedIm
32
0
0
19 Mar 2024