arXiv:2406.14963
Optimised Grouped-Query Attention Mechanism for Transformers
21 June 2024
Yuang Chen, Cheng Zhang, Xitong Gao, Robert D. Mullins, George A. Constantinides, Yiren Zhao
Papers citing "Optimised Grouped-Query Attention Mechanism for Transformers" (5 of 5 papers shown)
Can LLMs reason over extended multilingual contexts? Towards long-context evaluation beyond retrieval and haystacks
Amey Hengle, Prasoon Bajpai, Soham Dan, Tanmoy Chakraborty
17 Apr 2025 · LRM
Changing Base Without Losing Pace: A GPU-Efficient Alternative to MatMul in DNNs
Nir Ailon, Akhiad Bercovich, Omri Weinstein
15 Mar 2025
SVDq: 1.25-bit and 410x Key Cache Compression for LLM Attention
Yankun Hong, Xing Li, Hui-Ling Zhen, Xianzhi Yu, Wulong Liu, Mingxuan Yuan
24 Feb 2025 · MQ
Beyond Uniform Query Distribution: Key-Driven Grouped Query Attention
Zohaib Khan, Muhammad Khaquan, Omer Tafveez, Burhanuddin Samiwala, Agha Ali Raza
15 Aug 2024
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
20 Apr 2018 · ELM