Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2305.13245
Cited By
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
22 May 2023
Joshua Ainslie
James Lee-Thorp
Michiel de Jong
Yury Zemlyanskiy
Federico Lebrón
Sumit Sanghai
Re-assign community
ArXiv
PDF
HTML
Papers citing
"GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints"
50 / 112 papers shown
Title
GMSA: Enhancing Context Compression via Group Merging and Layer Semantic Alignment
Jiwei Tang
Zhicheng Zhang
Shunlong Wu
Jingheng Ye
Lichen Bai
...
Tingwei Lu
Jiaqi Chen
Lin Hai
Hai-Tao Zheng
Hong-Gee Kim
7
0
0
18 May 2025
Accurate KV Cache Quantization with Outlier Tokens Tracing
Yi Su
Yuechi Zhou
Quantong Qiu
Juntao Li
Qingrong Xia
Ping Li
Xinyu Duan
Zhefeng Wang
Min Zhang
MQ
22
0
0
16 May 2025
MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production
C. Jin
Ziheng Jiang
Zhihao Bai
Zheng Zhong
Jing Liu
...
Yanghua Peng
Xuanzhe Liu
Xuanzhe Liu
Xin Jin
Xin Liu
MoE
7
0
0
16 May 2025
Qwen3 Technical Report
An Yang
A. Li
Baosong Yang
Beichen Zhang
Binyuan Hui
...
Zekun Wang
Zeyu Cui
Zhenru Zhang
Zhenhong Zhou
Zihan Qiu
LLMAG
OSLM
LRM
45
0
0
14 May 2025
Accelerating Machine Learning Systems via Category Theory: Applications to Spherical Attention for Gene Regulatory Networks
Vincent Abbott
Kotaro Kamiya
Gerard Glowacki
Yu Atsumi
Gioele Zardini
Yoshihiro Maruyama
29
0
0
14 May 2025
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
Zihan Qiu
Zhaoxiang Wang
Bo Zheng
Zeyu Huang
Kaiyue Wen
...
Fei Huang
Suozhi Huang
Dayiheng Liu
Jingren Zhou
Junyang Lin
MoE
40
0
0
10 May 2025
xGen-small Technical Report
Erik Nijkamp
Bo Pang
Egor Pakhomov
Akash Gokul
Jin Qu
Silvio Savarese
Yingbo Zhou
Caiming Xiong
LLMAG
58
0
0
10 May 2025
Challenging GPU Dominance: When CPUs Outperform for On-Device LLM Inference
Haolin Zhang
Jeff Huang
35
0
0
09 May 2025
The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization
Jae-Won Chung
Jiachen Liu
Jeff J. Ma
Ruofan Wu
Oh Jun Kweon
Yuxuan Xia
Zhiyu Wu
Mosharaf Chowdhury
31
0
0
09 May 2025
SPAP: Structured Pruning via Alternating Optimization and Penalty Methods
Hanyu Hu
Xiaoming Yuan
48
0
0
06 May 2025
Bielik 11B v2 Technical Report
Krzysztof Ociepa
Łukasz Flis
Krzysztof Wróbel
Adrian Gwoździej
Remigiusz Kinas
34
0
0
05 May 2025
Bielik v3 Small: Technical Report
Krzysztof Ociepa
Łukasz Flis
Remigiusz Kinas
Krzysztof Wróbel
Adrian Gwoździej
29
0
0
05 May 2025
RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference
Yushen Chen
J. Zhang
Baotong Lu
Qianxi Zhang
Chengruidong Zhang
...
Chen Chen
Mingxing Zhang
Yuqing Yang
Fan Yang
Mao Yang
38
0
0
05 May 2025
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
Piotr Piekos
Róbert Csordás
Jürgen Schmidhuber
MoE
VLM
106
1
0
01 May 2025
Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition
Zhengfu He
Jingbo Wang
Rui Lin
Xuyang Ge
Wentao Shu
Qiong Tang
J.N. Zhang
Xipeng Qiu
70
0
0
29 Apr 2025
TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
A. Zandieh
Majid Daliri
Majid Hadian
Vahab Mirrokni
MQ
74
0
0
28 Apr 2025
The Rise of Small Language Models in Healthcare: A Comprehensive Survey
Muskan Garg
Shaina Raza
Shebuti Rayana
Xingyi Liu
Sunghwan Sohn
LM&MA
AILaw
92
0
0
23 Apr 2025
How Effective are Generative Large Language Models in Performing Requirements Classification?
Waad Alhoshan
Alessio Ferrari
Liping Zhao
30
0
0
23 Apr 2025
PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation
Zihao An
Huajun Bai
Ziqiang Liu
Dong Li
E. Barsoum
61
0
0
23 Apr 2025
KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments
Junyoung Park
Dalton Jones
Matthew J Morse
Raghavv Goel
Mingu Lee
Chris Lott
27
0
0
21 Apr 2025
Efficient Pretraining Length Scaling
Bohong Wu
Shen Yan
Sijun Zhang
Jianqiao Lu
Yutao Zeng
Ya Wang
Xun Zhou
171
0
0
21 Apr 2025
BRepFormer: Transformer-Based B-rep Geometric Feature Recognition
Yongkang Dai
Xiaoshui Huang
Yunpeng Bai
Hao Guo
Hongping Gan
Ling Yang
Yilei Shi
49
0
0
10 Apr 2025
STEP: Staged Parameter-Efficient Pre-training for Large Language Models
Kazuki Yano
Takumi Ito
Jun Suzuki
LRM
52
1
0
05 Apr 2025
Using Attention Sinks to Identify and Evaluate Dormant Heads in Pretrained LLMs
Pedro Sandoval-Segura
Xijun Wang
Ashwinee Panda
Micah Goldblum
Ronen Basri
Tom Goldstein
David Jacobs
22
0
0
04 Apr 2025
Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models
Hung-Yueh Chiang
Chi-chih Chang
N. Frumkin
Kai-Chiang Wu
Mohamed S. Abdelfattah
Diana Marculescu
MQ
187
0
0
28 Mar 2025
Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization
Minsu Kim
Seongmin Hong
RyeoWook Ko
S. Choi
Hunjong Lee
Junsoo Kim
Joo-Young Kim
Jongse Park
57
0
0
24 Mar 2025
WindowKV: Task-Adaptive Group-Wise KV Cache Window Selection for Efficient LLM Inference
Youhui Zuo
Sibo Wei
C. Zhang
Zhuorui Liu
Wenpeng Lu
Dawei Song
VLM
61
0
0
23 Mar 2025
GPU-Accelerated Motion Planning of an Underactuated Forestry Crane in Cluttered Environments
M. Vu
Gerald Ebmer
Alexander Watcher
Marc-Philip Ecker
Giang Nguyen
Tobias Glueck
77
2
0
18 Mar 2025
X-EcoMLA: Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression
Guihong Li
Mehdi Rezagholizadeh
Mingyu Yang
Vikram Appia
Emad Barsoum
VLM
60
0
0
14 Mar 2025
Key, Value, Compress: A Systematic Exploration of KV Cache Compression Techniques
Neusha Javidnia
B. Rouhani
F. Koushanfar
184
0
0
14 Mar 2025
Cost-Optimal Grouped-Query Attention for Long-Context Modeling
Yuxiao Chen
Yutong Wu
Chenyang Song
Zhiyuan Liu
Maosong Sun
Xu Han
Zhiyuan Liu
Maosong Sun
73
0
0
12 Mar 2025
Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation
Yingfeng Luo
Tong Zheng
Yongyu Mu
Yangqiu Song
Qinghong Zhang
...
Ziqiang Xu
Peinan Feng
Xiaoqian Liu
Tong Xiao
Jingbo Zhu
AI4CE
209
0
0
09 Mar 2025
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts
Weigao Sun
Disen Lan
Tong Zhu
Xiaoye Qu
Yu-Xi Cheng
MoE
103
2
0
07 Mar 2025
EuroBERT: Scaling Multilingual Encoders for European Languages
Nicolas Boizard
Hippolyte Gisserot-Boukhlef
Duarte M. Alves
André F. T. Martins
Ayoub Hammal
...
Maxime Peyrard
Nuno M. Guerreiro
Patrick Fernandes
Ricardo Rei
Pierre Colombo
158
1
0
07 Mar 2025
MoSE: Hierarchical Self-Distillation Enhances Early Layer Embeddings
Andrea Gurioli
Federico Pennino
João Monteiro
Maurizio Gabbrielli
51
0
0
04 Mar 2025
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
Abdelrahman Abouelenin
Atabak Ashfaq
Adam Atkinson
Hany Awadalla
Nguyen Bach
...
Ishmam Zabir
Yunan Zhang
Li Zhang
Wenjie Qu
Xiren Zhou
MoE
SyDa
76
28
0
03 Mar 2025
Triple Phase Transitions: Understanding the Learning Dynamics of Large Language Models from a Neuroscience Perspective
Yuko Nakagi
Keigo Tada
Sota Yoshino
Shinji Nishimoto
Yu Takagi
LRM
37
0
0
28 Feb 2025
FANformer: Improving Large Language Models Through Effective Periodicity Modeling
Yihong Dong
Ge Li
Xue Jiang
Yongding Tao
Kechi Zhang
...
Huanyu Liu
Jiazheng Ding
Jia Li
Jinliang Deng
Hong Mei
AI4TS
46
0
0
28 Feb 2025
Kanana: Compute-efficient Bilingual Language Models
Kanana LLM Team
Yunju Bak
Hojin Lee
Minho Ryu
Jiyeon Ham
...
Daniel Lee
Minchul Lee
M. Lee
Shinbok Lee
Gaeun Seo
98
1
0
26 Feb 2025
Unveiling Reasoning Thresholds in Language Models: Scaling, Fine-Tuning, and Interpretability through Attention Maps
Yen-Che Hsiao
Abhishek Dutta
LRM
ReLM
ELM
66
0
0
24 Feb 2025
Smaller But Better: Unifying Layout Generation with Smaller Large Language Models
Peirong Zhang
Jiaxin Zhang
Jiahuan Cao
Hongliang Li
Lianwen Jin
57
0
0
21 Feb 2025
Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs
Longxu Dou
Qian Liu
Fan Zhou
Changyu Chen
Zili Wang
...
Tianyu Pang
Chao Du
Xinyi Wan
Wei Lu
Min Lin
121
1
0
18 Feb 2025
Understanding Silent Data Corruption in LLM Training
Jeffrey Ma
Hengzhi Pei
Leonard Lausen
George Karypis
42
0
0
17 Feb 2025
Efficient Long-Decoding Inference with Reasoning-Aware Attention Sparsity
Junhao Hu
Wenrui Huang
Weidong Wang
Zhenwen Li
Tiancheng Hu
Zhixia Liu
Xusheng Chen
Tao Xie
Yizhou Shan
LRM
51
0
0
16 Feb 2025
Enhanced Retrieval of Long Documents: Leveraging Fine-Grained Block Representations with Large Language Models
Minghan Li
Eric Gaussier
Guodong Zhou
RALM
68
0
0
28 Jan 2025
Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation
Sungnyun Kim
Sungwoo Cho
Sangmin Bae
Kangwook Jang
Se-Young Yun
SSL
79
1
0
23 Jan 2025
Tensor Product Attention Is All You Need
Yifan Zhang
Yifeng Liu
Huizhuo Yuan
Zhen Qin
Yang Yuan
Q. Gu
Andrew Chi-Chih Yao
90
9
0
11 Jan 2025
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
Di Liu
Meng Chen
Baotong Lu
Huiqiang Jiang
Zhenhua Han
...
Kaipeng Zhang
Cheng Chen
Fan Yang
Yuqing Yang
Lili Qiu
60
30
0
03 Jan 2025
LoL-PIM: Long-Context LLM Decoding with Scalable DRAM-PIM System
Hyucksung Kwon
Kyungmo Koo
Janghyeon Kim
W. Lee
Minjae Lee
...
Yongkee Kwon
Ilkon Kim
Euicheol Lim
John Kim
Jungwook Choi
74
4
0
28 Dec 2024
Reinforcement Learning Enhanced LLMs: A Survey
Shuhe Wang
Shengyu Zhang
Junge Zhang
Runyi Hu
Xiaoya Li
Tianwei Zhang
Jiwei Li
Fei Wu
G. Wang
Eduard H. Hovy
OffRL
134
7
0
05 Dec 2024
1
2
3
Next