A3: an Analytical Low-Rank Approximation Framework for Attention
arXiv: 2505.12942 (v2, latest)
19 May 2025
Jeffrey T. H. Wong, Cheng Zhang, Xinye Cao, Pedro Gimenes, George A. Constantinides, Wayne Luk, Yiren Zhao
Tags: OffRL, MQ
Papers citing "A3: an Analytical Low-Rank Approximation Framework for Attention" (25 / 25 papers shown)

SVD-LLM V2: Optimizing Singular Value Truncation for Large Language Model Compression (16 Mar 2025)
Xin Wang, Samiul Alam, Zhongwei Wan, Jikang Cheng, Hao Fei
Tags: MQ

Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs (21 Feb 2025)
Tao Ji, B. Guo, Y. Wu, Qipeng Guo, Lixing Shen, Zhan Chen, Xipeng Qiu, Qi Zhang, Tao Gui

QERA: an Analytical Framework for Quantization Error Reconstruction (08 Oct 2024)
Cheng Zhang, Jeffrey T. H. Wong, Can Xiao, George A. Constantinides, Yiren Zhao
Tags: MQ

Round and Round We Go! What makes Rotary Positional Encodings useful? (08 Oct 2024)
Federico Barbero, Alex Vitvitskyi, Christos Perivolaropoulos, Razvan Pascanu, Petar Velickovic

Palu: Compressing KV-Cache with Low-Rank Projection (30 Jul 2024)
Chi-Chih Chang, Wei-Cheng Lin, Chien-Yu Lin, Chong-Yan Chen, Yu-Fang Hu, Pei-Shuo Wang, N. Huang, Luis Ceze, Kai-Chiang Wu

Compressing Large Language Models using Low Rank and Low Precision Decomposition (29 May 2024)
R. Saha, Naomi Sagan, Varun Srivastava, Andrea J. Goldsmith, Mert Pilanci
Tags: MQ

SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression (12 Mar 2024)
Xin Wang, Yu Zheng, Zhongwei Wan, Mi Zhang
Tags: MQ

LQER: Low-Rank Quantization Error Reconstruction for LLMs (04 Feb 2024)
Cheng Zhang, Jianyi Cheng, George A. Constantinides, Yiren Zhao
Tags: MQ

ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models (10 Dec 2023)
Zhihang Yuan, Yuzhang Shang, Yue Song, Qiang Wu, Yan Yan, Guangyu Sun
Tags: MQ

SlimPajama-DC: Understanding Data Combinations for LLM Training (19 Sep 2023)
Zhiqiang Shen, Tianhua Tao, Liqun Ma, Willie Neiswanger, Zhengzhong Liu, ..., Bowen Tan, Joel Hestness, Natalia Vassilieva, Daria Soboleva, Eric Xing

Llama 2: Open Foundation and Fine-Tuned Chat Models (18 Jul 2023)
Hugo Touvron, Louis Martin, Kevin R. Stone, Peter Albert, Amjad Almahairi, ..., Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom
Tags: AI4MH, ALM

A Simple and Effective Pruning Approach for Large Language Models (20 Jun 2023)
Mingjie Sun, Zhuang Liu, Anna Bair, J. Zico Kolter

GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints (22 May 2023)
Joshua Ainslie, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebrón, Sumit Sanghai

LLaMA: Open and Efficient Foundation Language Models (27 Feb 2023)
Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, ..., Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample
Tags: ALM, PILM

Language model compression with weighted low-rank factorization (30 Jun 2022)
Yen-Chang Hsu, Ting Hua, Sung-En Chang, Qiang Lou, Yilin Shen, Hongxia Jin

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (28 Jan 2022)
Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou
Tags: LM&Ro, LRM, AI4CE, ReLM

Evaluating Large Language Models Trained on Code (07 Jul 2021)
Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Pondé, ..., Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, Wojciech Zaremba
Tags: ELM, ALM

RoFormer: Enhanced Transformer with Rotary Position Embedding (20 Apr 2021)
Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, Yunfeng Liu

Language Models are Few-Shot Learners (28 May 2020)
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
Tags: BDL

GLU Variants Improve Transformer (12 Feb 2020)
Noam M. Shazeer

Fast Transformer Decoding: One Write-Head is All You Need (06 Nov 2019)
Noam M. Shazeer

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (23 Oct 2019)
Colin Raffel, Noam M. Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
Tags: AIMat

Attention Is All You Need (12 Jun 2017)
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin
Tags: 3DV

Pointer Sentinel Mixture Models (26 Sep 2016)
Stephen Merity, Caiming Xiong, James Bradbury, R. Socher
Tags: RALM

Optimal CUR Matrix Decompositions (30 May 2014)
Christos Boutsidis, David P. Woodruff