arXiv:2306.12929
Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing
Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort
22 June 2023
Papers citing "Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing" (50 of 66 papers shown)
Probability Consistency in Large Language Models: Theoretical Foundations Meet Empirical Discrepancies
Xiaoliang Luo, Xinyi Xu, Michael Ramscar, Bradley C. Love (13 May 2025)

Resource-Efficient Language Models: Quantization for Fast and Accessible Inference
Tollef Emil Jørgensen (13 May 2025)

Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
Zihan Qiu, Zekun Wang, Bo Zheng, Zeyu Huang, Kaiyue Wen, ..., Fei Huang, Suozhi Huang, Dayiheng Liu, Jingren Zhou, Junyang Lin (10 May 2025)

Fast and Low-Cost Genomic Foundation Models via Outlier Removal
Haozheng Luo, Chenghao Qiu, Maojiang Su, Zhihan Zhou, Zoe Mehta, Guo Ye, Jerry Yao-Chieh Hu, Han Liu (01 May 2025)

Softpick: No Attention Sink, No Massive Activations with Rectified Softmax
Zayd Muhammad Kawakibi Zuhri, Erland Hilman Fuadi, Alham Fikri Aji (29 Apr 2025)

Thanos: A Block-wise Pruning Algorithm for Efficient Large Language Model Compression
Ivan Ilin, Peter Richtárik (06 Apr 2025)

Outlier dimensions favor frequent tokens in language models
Iuri Macocco, Nora Graichen, Gemma Boleda, Marco Baroni (27 Mar 2025)

See What You Are Told: Visual Attention Sink in Large Multimodal Models
Seil Kang, Jinyeong Kim, Junhyeok Kim, Seong Jae Hwang (05 Mar 2025)
SpinQuant: LLM quantization with learned rotations
Zechun Liu, Changsheng Zhao, Igor Fedorov, Bilge Soran, Dhruv Choudhary, Raghuraman Krishnamoorthi, Vikas Chandra, Yuandong Tian, Tijmen Blankevoort (21 Feb 2025)

SEAL: Scaling to Emphasize Attention for Long-Context Retrieval
Changhun Lee, Jun-gyu Jin, Younghyun Cho, Eunhyeok Park (28 Jan 2025)

PrefixQuant: Eliminating Outliers by Prefixed Tokens for Large Language Models Quantization
Mengzhao Chen, Yi Liu, Jiahao Wang, Yi Bin, Wenqi Shao, Ping Luo (28 Jan 2025)

Pushing the Envelope of Low-Bit LLM via Dynamic Error Compensation
Y. Park, Jake Hyun, Hojoon Kim, Jae W. Lee (31 Dec 2024)

Hymba: A Hybrid-head Architecture for Small Language Models
Xin Dong, Y. Fu, Shizhe Diao, Wonmin Byeon, Zijia Chen, ..., Min-Hung Chen, Yoshi Suhara, Y. Lin, Jan Kautz, Pavlo Molchanov (20 Nov 2024)

Multi-Draft Speculative Sampling: Canonical Decomposition and Theoretical Limits
Ashish Khisti, MohammadReza Ebrahimi, Hassan Dbouk, Arash Behboodi, Roland Memisevic, Christos Louizos (23 Oct 2024)

Scaling Stick-Breaking Attention: An Efficient Implementation and In-depth Study
Shawn Tan, Yikang Shen, Songlin Yang, Aaron C. Courville, Rameswar Panda (23 Oct 2024)

From Attention to Activation: Unravelling the Enigmas of Large Language Models
Prannay Kaul, Chengcheng Ma, Ismail Elezi, Jiankang Deng (22 Oct 2024)
Methods of improving LLM training stability
Oleg Rybakov, Mike Chrzanowski, Peter Dykas, Jinze Xue, Ben Lanir (22 Oct 2024)

Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
Tianyu Guo, Druv Pai, Yu Bai, Jiantao Jiao, Michael I. Jordan, Song Mei (17 Oct 2024)

AERO: Softmax-Only LLMs for Efficient Private Inference
N. Jha, Brandon Reagen (16 Oct 2024)

Sorted Weight Sectioning for Energy-Efficient Unstructured Sparse DNNs on Compute-in-Memory Crossbars
Matheus Farias, H. T. Kung (15 Oct 2024)

ControlMM: Controllable Masked Motion Generation
Ekkasit Pinyoanuntapong, Muhammad Usama Saleem, Korrawe Karunratanakul, Pu Wang, Hongfei Xue, Cheng Chen, Chuan Guo, Junli Cao, J. Ren, Sergey Tulyakov (14 Oct 2024)

ReLU's Revival: On the Entropic Overload in Normalization-Free Large Language Models
N. Jha, Brandon Reagen (12 Oct 2024)

Activation Scaling for Steering and Interpreting Language Models
Niklas Stoehr, Kevin Du, Vésteinn Snæbjarnarson, Robert West, Ryan Cotterell, Aaron Schein (07 Oct 2024)

Differential Transformer
Tianzhu Ye, Li Dong, Yuqing Xia, Yutao Sun, Yi Zhu, Gao Huang, Furu Wei (07 Oct 2024)
How Language Models Prioritize Contextual Grammatical Cues?
Hamidreza Amirzadeh, A. Alishahi, Hosein Mohebbi (04 Oct 2024)

MCGMark: An Encodable and Robust Online Watermark for Tracing LLM-Generated Malicious Code
Peng Ding, Jingyu Wu, Qingyuan Zhong, Dan Ma, Xunliang Cai, ..., Shi Chen, Weizhe Zhang, Zibin Zheng (02 Aug 2024)

Beat this! Accurate beat tracking without DBN postprocessing
Francesco Foscarin, Jan Schluter, Gerhard Widmer (31 Jul 2024)

u-μP: The Unit-Scaled Maximal Update Parametrization
Charlie Blake, C. Eichenberg, Josef Dean, Lukas Balles, Luke Y. Prince, Bjorn Deiseroth, Andres Felipe Cruz Salinas, Carlo Luschi, Samuel Weinbach, Douglas Orr (24 Jul 2024)

Mamba-PTQ: Outlier Channels in Recurrent Large Language Models
Alessandro Pierro, Steven Abreu (17 Jul 2024)
RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization
Xijie Huang, Zechun Liu, Shih-yang Liu, Kwang-Ting Cheng (10 Jul 2024)

How Does Quantization Affect Multilingual LLMs?
Kelly Marchisio, Saurabh Dash, Hongyu Chen, Dennis Aumiller, Ahmet Üstün, Sara Hooker, Sebastian Ruder (03 Jul 2024)

Prefixing Attention Sinks can Mitigate Activation Outliers for Large Language Model Quantization
Seungwoo Son, Wonpyo Park, Woohyun Han, Kyuyeun Kim, Jaeho Lee (17 Jun 2024)

Outlier Reduction with Gated Attention for Improved Post-training Quantization in Large Sequence-to-sequence Speech Foundation Models
Dominik Wagner, Ilja Baumann, K. Riedhammer, Tobias Bocklet (16 Jun 2024)

Optimized Speculative Sampling for GPU Hardware Accelerators
Dominik Wagner, Seanie Lee, Ilja Baumann, Philipp Seeberger, K. Riedhammer, Tobias Bocklet (16 Jun 2024)

Low-Rank Quantization-Aware Training for LLMs
Yelysei Bondarenko, Riccardo Del Chiaro, Markus Nagel (10 Jun 2024)

Understanding and Minimising Outlier Features in Neural Network Training
Bobby He, Lorenzo Noci, Daniele Paliotta, Imanol Schlag, Thomas Hofmann (29 May 2024)

Conv-CoA: Improving Open-domain Question Answering in Large Language Models via Conversational Chain-of-Action
Zhenyu Pan, Haozheng Luo, Manling Li, Han Liu (28 May 2024)

Mitigating Quantization Errors Due to Activation Spikes in GLU-Based LLMs
Jaewoo Yang, Hayun Kim, Younghoon Kim (23 May 2024)
eXmY: A Data Type and Technique for Arbitrary Bit Precision Quantization
Aditya Agrawal, Matthew Hedlund, Blake A. Hechtman (22 May 2024)

Explaining Text Similarity in Transformer Models
Alexandros Vasileiou, Oliver Eberle (10 May 2024)

LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit
Ruihao Gong, Yang Yong, Shiqiao Gu, Yushi Huang, Chentao Lv, Yunchen Zhang, Xianglong Liu, Dacheng Tao (09 May 2024)

Quantization of Large Language Models with an Overdetermined Basis
D. Merkulov, Daria Cherniuk, Alexander Rudikov, Ivan V. Oseledets, Ekaterina A. Muravleva, A. Mikhalev, Boris Kashin (15 Apr 2024)

Outlier-Efficient Hopfield Layers for Large Transformer-Based Models
Jerry Yao-Chieh Hu, Pei-Hsuan Chang, Haozheng Luo, Hong-Yu Chen, Weijian Li, Wei-Po Wang, Han Liu (04 Apr 2024)

Uniform Memory Retrieval with Larger Capacity for Modern Hopfield Models
Dennis Wu, Jerry Yao-Chieh Hu, Teng-Yun Hsiao, Han Liu (04 Apr 2024)

AffineQuant: Affine Transformation Quantization for Large Language Models
Yuexiao Ma, Huixia Li, Xiawu Zheng, Feng Ling, Xuefeng Xiao, Rui Wang, Shilei Wen, Rongrong Ji (19 Mar 2024)
QuantTune: Optimizing Model Quantization with Adaptive Outlier-Driven Fine Tuning
Jiun-Man Chen, Yu-Hsuan Chao, Yu-Jie Wang, Ming-Der Shieh, Chih-Chung Hsu, Wei-Fen Lin (11 Mar 2024)

What Makes Quantization for Large Language Models Hard? An Empirical Study from the Lens of Perturbation
Zhuocheng Gong, Jiahao Liu, Jingang Wang, Xunliang Cai, Dongyan Zhao, Rui Yan (11 Mar 2024)

IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact
Ruikang Liu, Haoli Bai, Haokun Lin, Yuening Li, Han Gao, Zheng-Jun Xu, Lu Hou, Jun Yao, Chun Yuan (02 Mar 2024)

Evaluating Quantized Large Language Models
Shiyao Li, Xuefei Ning, Luning Wang, Tengxuan Liu, Xiangsheng Shi, Shengen Yan, Guohao Dai, Huazhong Yang, Yu-Xiang Wang (28 Feb 2024)

Massive Activations in Large Language Models
Mingjie Sun, Xinlei Chen, J. Zico Kolter, Zhuang Liu (27 Feb 2024)