HyperAttention: Long-context Attention in Near-Linear Time
Insu Han, Rajesh Jayaram, Amin Karbasi, Vahab Mirrokni, David P. Woodruff, A. Zandieh
arXiv 2310.05869, 9 October 2023
Papers citing "HyperAttention: Long-context Attention in Near-Linear Time" (16 of 16 shown):
1. Fast RoPE Attention: Combining the Polynomial Method and Fast Fourier Transform. Josh Alman, Zhao Song. 17 May 2025.
2. Attention Condensation via Sparsity Induced Regularized Training. Eli Sason, Darya Frolova, Boris Nazarov, Felix Goldberd. 3 Mar 2025.
3. Surface Vision Mamba: Leveraging Bidirectional State Space Model for Efficient Spherical Manifold Representation. Rongzhao He, Weihao Zheng, Leilei Zhao, Ying Wang, Dalin Zhu, Dan Wu, Bin Hu. 21 Feb 2025. [Mamba]
4. Low-Rank Thinning. Annabelle Michael Carrell, Albert Gong, Abhishek Shetty, Raaz Dwivedi, Lester W. Mackey. 17 Feb 2025.
5. Video Latent Flow Matching: Optimal Polynomial Projections for Video Interpolation and Extrapolation. Yang Cao, Zhao Song, Chiwun Yang. 1 Feb 2025. [VGen]
6. Tensor Product Attention Is All You Need. Yifan Zhang, Yifeng Liu, Huizhuo Yuan, Zhen Qin, Yang Yuan, Q. Gu, Andrew Chi-Chih Yao. 11 Jan 2025.
7. Fast Gradient Computation for RoPE Attention in Almost Linear Time. Yifang Chen, Jiayan Huo, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song. 3 Jan 2025.
8. SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs. Yizhao Gao, Zhichen Zeng, Dayou Du, Shijie Cao, Hayden Kwok-Hay So, ..., Junjie Lai, Mao Yang, Ting Cao, Fan Yang, M. Yang. 17 Oct 2024.
9. HSR-Enhanced Sparse Attention Acceleration. Bo Chen, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song. 14 Oct 2024.
10. When big data actually are low-rank, or entrywise approximation of certain function-generated matrices. Stanislav Budzinskiy. 3 Jul 2024.
11. Fast Heavy Inner Product Identification Between Weights and Inputs in Neural Network Training. Lianke Qin, Saayan Mitra, Zhao Song, Yuanyuan Yang, Dinesh Manocha. 19 Nov 2023.
12. Scaling TabPFN: Sketching and Feature Selection for Tabular Prior-Data Fitted Networks. Ben Feuer, Chinmay Hegde, Niv Cohen. 17 Nov 2023.
13. The Expressibility of Polynomial based Attention Scheme. Zhao Song, Guangyi Xu, Junze Yin. 30 Oct 2023.
14. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. Haoyi Zhou, Shanghang Zhang, J. Peng, Shuai Zhang, Jianxin Li, Hui Xiong, Wan Zhang. 14 Dec 2020. [AI4TS]
15. Big Bird: Transformers for Longer Sequences. Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed. 28 Jul 2020. [VLM]
16. Efficient Content-Based Sparse Attention with Routing Transformers. Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier. 12 Mar 2020. [MoE]