Align Attention Heads Before Merging Them: An Effective Way for Converting MHA to GQA
31 December 2024 · arXiv:2412.20677
Qingyun Jin, Xiaohui Song, Feng Zhou, Zengchang Qin
Papers citing "Align Attention Heads Before Merging Them: An Effective Way for Converting MHA to GQA" (2 of 2 papers shown):
1. On Pruning State-Space LLMs
   Tamer Ghattas, Michael Hassid, Roy Schwartz
   26 Feb 2025
2. SVDq: 1.25-bit and 410x Key Cache Compression for LLM Attention
   Hong Yankun, Li Xing, Zhen Hui-Ling, Yu Xianzhi, Liu Wulong, Yuan Mingxuan
   24 Feb 2025