Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing

1 May 2025

Papers citing "Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing"

Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
Zihan Qiu, Zekun Wang, Bo Zheng, Zeyu Huang, Kaiyue Wen, ..., Fei Huang, Suozhi Huang, Dayiheng Liu, Jingren Zhou, Junyang Lin
10 May 2025