Align Attention Heads Before Merging Them: An Effective Way for Converting MHA to GQA
31 December 2024 · arXiv:2412.20677
Qingyun Jin, Xiaohui Song, Feng Zhou, Zengchang Qin
Papers citing "Align Attention Heads Before Merging Them: An Effective Way for Converting MHA to GQA" (2 of 2 papers shown):
1. On Pruning State-Space LLMs
   Tamer Ghattas, Michael Hassid, Roy Schwartz
   26 Feb 2025
2. SVDq: 1.25-bit and 410x Key Cache Compression for LLM Attention
   Hong Yankun, Li Xing, Zhen Hui-Ling, Yu Xianzhi, Liu Wulong, Yuan Mingxuan
   24 Feb 2025