Prefixing Attention Sinks can Mitigate Activation Outliers for Large Language Model Quantization
arXiv:2406.12016 · 17 June 2024
Seungwoo Son, Wonpyo Park, Woohyun Han, Kyuyeun Kim, Jaeho Lee
Category: MQ
Papers citing "Prefixing Attention Sinks can Mitigate Activation Outliers for Large Language Model Quantization" (6 of 6 shown):
- Precision Where It Matters: A Novel Spike Aware Mixed-Precision Quantization Strategy for LLaMA-based Language Models
  Lucas Maisonnave, Cyril Moineau, Olivier Bichler, Fabrice Rastello · MQ · 30 Apr 2025
- Gradual Binary Search and Dimension Expansion : A general method for activation quantization in LLMs
  Lucas Maisonnave, Cyril Moineau, Olivier Bichler, Fabrice Rastello · MQ · 18 Apr 2025
- Dynamic Attention-Guided Context Decoding for Mitigating Context Faithfulness Hallucinations in Large Language Models
  Yanwen Huang, Yong Zhang, Ning Cheng, Zhitao Li, Shaojun Wang, Jing Xiao · 02 Jan 2025
- House of Cards: Massive Weights in LLMs
  Jaehoon Oh, Seungjun Shin, Dokwan Oh · 02 Oct 2024
- All Bark and No Bite: Rogue Dimensions in Transformer Language Models Obscure Representational Quality
  William Timkey, Marten van Schijndel · 09 Sep 2021
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
  M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro · MoE · 17 Sep 2019