Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2105.06990
Cited By
BERT Busters: Outlier Dimensions that Disrupt Transformers
14 May 2021
Olga Kovaleva
Saurabh Kulshreshtha
Anna Rogers
Anna Rumshisky
Re-assign community
ArXiv
PDF
HTML
Papers citing
"BERT Busters: Outlier Dimensions that Disrupt Transformers"
25 / 25 papers shown
Title
Resource-Efficient Language Models: Quantization for Fast and Accessible Inference
Tollef Emil Jørgensen
MQ
56
0
0
13 May 2025
Fast and Low-Cost Genomic Foundation Models via Outlier Removal
Haozheng Luo
Chenghao Qiu
Maojiang Su
Zhihan Zhou
Zoe Mehta
Guo Ye
Jerry Yao-Chieh Hu
Han Liu
AAML
55
1
0
01 May 2025
Outlier dimensions favor frequent tokens in language models
Iuri Macocco
Nora Graichen
Gemma Boleda
Marco Baroni
58
0
0
27 Mar 2025
Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-Probing
Qi Le
Enmao Diao
Ziyan Wang
Xinran Wang
Jie Ding
Li Yang
Ali Anwar
84
2
0
24 Feb 2025
ReLU's Revival: On the Entropic Overload in Normalization-Free Large Language Models
N. Jha
Brandon Reagen
OffRL
AI4CE
33
0
0
12 Oct 2024
OATS: Outlier-Aware Pruning Through Sparse and Low Rank Decomposition
Stephen Zhang
Vardan Papyan
VLM
51
1
0
20 Sep 2024
Outlier Reduction with Gated Attention for Improved Post-training Quantization in Large Sequence-to-sequence Speech Foundation Models
Dominik Wagner
Ilja Baumann
Korbinian Riedhammer
Tobias Bocklet
MQ
43
1
0
16 Jun 2024
Understanding and Minimising Outlier Features in Neural Network Training
Bobby He
Lorenzo Noci
Daniele Paliotta
Imanol Schlag
Thomas Hofmann
44
3
0
29 May 2024
Cherry on Top: Parameter Heterogeneity and Quantization in Large Language Models
Wanyun Cui
Qianle Wang
MQ
42
2
0
03 Apr 2024
Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity
Lu Yin
You Wu
Zhenyu Zhang
Cheng-Yu Hsieh
Yaqing Wang
...
Mykola Pechenizkiy
Yi Liang
Michael Bendersky
Zhangyang Wang
Shiwei Liu
36
79
0
08 Oct 2023
Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing
Yelysei Bondarenko
Markus Nagel
Tijmen Blankevoort
MQ
23
87
0
22 Jun 2023
A Simple and Effective Pruning Approach for Large Language Models
Mingjie Sun
Zhuang Liu
Anna Bair
J. Zico Kolter
87
361
0
20 Jun 2023
Exploring Anisotropy and Outliers in Multilingual Language Models for Cross-Lingual Semantic Sentence Similarity
Katharina Hämmerl
Alina Fastowski
Jindrich Libovický
Alexander Fraser
30
6
0
01 Jun 2023
Eliciting Latent Predictions from Transformers with the Tuned Lens
Nora Belrose
Zach Furman
Logan Smith
Danny Halawi
Igor V. Ostrovsky
Lev McKinney
Stella Biderman
Jacob Steinhardt
22
196
0
14 Mar 2023
Full Stack Optimization of Transformer Inference: a Survey
Sehoon Kim
Coleman Hooper
Thanakul Wattanawong
Minwoo Kang
Ruohan Yan
...
Qijing Huang
Kurt Keutzer
Michael W. Mahoney
Y. Shao
A. Gholami
MQ
36
102
0
27 Feb 2023
An Empirical Study on the Transferability of Transformer Modules in Parameter-Efficient Fine-Tuning
Mohammad AkbarTajari
S. Rajaee
Mohammad Taher Pilehvar
19
2
0
01 Feb 2023
The case for 4-bit precision: k-bit Inference Scaling Laws
Tim Dettmers
Luke Zettlemoyer
MQ
27
218
0
19 Dec 2022
Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models
Xiuying Wei
Yunchen Zhang
Xiangguo Zhang
Ruihao Gong
Shanghang Zhang
Qi Zhang
F. Yu
Xianglong Liu
MQ
38
147
0
27 Sep 2022
Outliers Dimensions that Disrupt Transformers Are Driven by Frequency
Giovanni Puccetti
Anna Rogers
Aleksandr Drozd
F. Dell’Orletta
81
42
0
23 May 2022
Life after BERT: What do Other Muppets Understand about Language?
Vladislav Lialin
Kevin Zhao
Namrata Shivagunde
Anna Rumshisky
49
6
0
21 May 2022
Measuring the Mixing of Contextual Information in the Transformer
Javier Ferrando
Gerard I. Gállego
Marta R. Costa-jussá
36
50
0
08 Mar 2022
Neural reality of argument structure constructions
Bai Li
Zining Zhu
Guillaume Thomas
Frank Rudzicz
Yang Xu
48
27
0
24 Feb 2022
An Isotropy Analysis in the Multilingual BERT Embedding Space
S. Rajaee
Mohammad Taher Pilehvar
24
33
0
09 Oct 2021
The Lottery Ticket Hypothesis for Pre-trained BERT Networks
Tianlong Chen
Jonathan Frankle
Shiyu Chang
Sijia Liu
Yang Zhang
Zhangyang Wang
Michael Carbin
156
377
0
23 Jul 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
304
6,996
0
20 Apr 2018
1