Primal-Attention: Self-attention through Asymmetric Kernel SVD in Primal Representation

31 May 2023

Johan A. K. Suykens

Papers citing "Primal-Attention: Self-attention through Asymmetric Kernel SVD in Primal Representation"

Title | Authors | Tags | Counts | Date
Learning Advanced Self-Attention for Linear Transformers in the Singular Value Domain | Hyowon Wi, Jeongwhan Choi, Noseong Park | - | 33 0 0 | 13 May 2025
A Reproduction Study: The Kernel PCA Interpretation of Self-Attention Fails Under Scrutiny | Karahan Sarıtaş, Çağatay Yıldız | - | 34 0 0 | 12 May 2025
Revisiting Kernel Attention with Correlated Gaussian Process Representation | Long Minh Bui, Tho Tran Huu, Duy-Tung Dinh, T. Nguyen, Trong Nghia Hoang | - | 52 2 0 | 27 Feb 2025
Mirror Descent on Reproducing Kernel Banach Spaces | Akash Kumar, Mikhail Belkin, Parthe Pandit | - | 43 1 0 | 18 Nov 2024
Elliptical Attention | Stefan K. Nielsen, Laziz U. Abdullaev, R. Teo, Tan M. Nguyen | - | 23 3 0 | 19 Jun 2024
Unveiling the Hidden Structure of Self-Attention via Kernel Principal Component Analysis | R. Teo, Tan M. Nguyen | - | 45 4 0 | 19 Jun 2024
WellDunn: On the Robustness and Explainability of Language Models and Large Language Models in Identifying Wellness Dimensions | Seyedali Mohammadi, Edward Raff, Jinendra Malekar, Vedant Palit, Francis Ferraro, Manas Gaur | AI4MH | 47 1 0 | 17 Jun 2024
Learning in Feature Spaces via Coupled Covariances: Asymmetric Kernel SVD and Nyström method | Qinghua Tao, F. Tonin, Alex Lambert, Yingyi Chen, Panagiotis Patrinos, Johan A. K. Suykens | - | 35 1 0 | 13 Jun 2024
Learning Analysis of Kernel Ridgeless Regression with Asymmetric Kernel Learning | Fan He, Mingzhe He, Lei Shi, Xiaolin Huang, Johan A. K. Suykens | - | 33 1 0 | 03 Jun 2024
HeNCler: Node Clustering in Heterophilous Graphs through Learned Asymmetric Similarity | Sonny Achten, F. Tonin, V. Cevher, Johan A. K. Suykens | - | 51 0 0 | 27 May 2024
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference | Harry Dong, Xinyu Yang, Zhenyu (Allen) Zhang, Zhangyang Wang, Yuejie Chi, Beidi Chen | - | 29 49 0 | 14 Feb 2024
Self-Attention through Kernel-Eigen Pair Sparse Variational Gaussian Processes | Yingyi Chen, Qinghua Tao, F. Tonin, Johan A. K. Suykens | - | 22 1 0 | 02 Feb 2024
Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context | Xiang Cheng, Yuxin Chen, S. Sra | - | 18 35 0 | 11 Dec 2023
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey | Yunpeng Huang, Jingwei Xu, Junyu Lai, Zixu Jiang, Taolue Chen, ..., Xiaoxing Ma, Lijuan Yang, Zhou Xin, Shupeng Li, Penghao Zhao | LLMAG, KELM | 36 54 0 | 21 Nov 2023
Improving Transformers with Probabilistic Attention Keys | Tam Nguyen, T. Nguyen, Dung D. Le, Duy Khuong Nguyen, Viet-Anh Tran, Richard G. Baraniuk, Nhat Ho, Stanley J. Osher | - | 53 32 0 | 16 Oct 2021
ImageNet Large Scale Visual Recognition Challenge | Olga Russakovsky, Jia Deng, Hao Su, J. Krause, S. Satheesh, ..., A. Karpathy, A. Khosla, Michael S. Bernstein, Alexander C. Berg, Li Fei-Fei | VLM, ObjD | 296 39,198 0 | 01 Sep 2014