ResearchTrend.AI

MoH: Multi-Head Attention as Mixture-of-Head Attention (arXiv:2410.11842)

15 October 2024
Peng Jin, Bo Zhu, Li Yuan, Shuicheng Yan
    MoE

Papers citing "MoH: Multi-Head Attention as Mixture-of-Head Attention"

10 / 10 papers shown
UMoE: Unifying Attention and FFN with Shared Experts
Yuanhang Yang, Chaozheng Wang, Jing Li
MoE
12 May 2025
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
Piotr Piekos, Róbert Csordás, Jürgen Schmidhuber
MoE, VLM
01 May 2025
GMAR: Gradient-Driven Multi-Head Attention Rollout for Vision Transformer Interpretability
Sehyeong Jo, Gangjae Jang, Haesol Park
28 Apr 2025
RouterKT: Mixture-of-Experts for Knowledge Tracing
Han Liao, Shuaishuai Zu
11 Apr 2025
PVChat: Personalized Video Chat with One-Shot Learning
Yufei Shi, Weilong Yan, Gang Xu, Yumeng Li, Yong Li, ZeLin Li, Fei Richard Yu, Ming Li, Si Yong Yeo
21 Mar 2025
Safe-VAR: Safe Visual Autoregressive Model for Text-to-Image Generative Watermarking
Ziyi Wang, Songbai Tan, Gang Xu, Xuerui Qiu, Hongbin Xu, Xin Meng, Ming Li, Fei Richard Yu
WIGM
14 Mar 2025
WISA: World Simulator Assistant for Physics-Aware Text-to-Video Generation
Jing Wang, Ao Ma, Ke Cao, Jun Zheng, Zhanjie Zhang, ..., Yuhang Ma, Bo Cheng, Dawei Leng, Yuhui Yin, Xiaodan Liang
VGen
11 Mar 2025
A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
Siyuan Mu, Sen Lin
MoE
10 Mar 2025
MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
Peng Jin, Bo Zhu, Li Yuan, Shuicheng Yan
MoE
09 Oct 2024
EvaGaussians: Event Stream Assisted Gaussian Splatting from Blurry Images
Wangbo Yu, Chaoran Feng, Jiye Tang, Xu Jia, Li-ming Yuan, Yonghong Tian
29 May 2024