ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2105.06990
  4. Cited By
BERT Busters: Outlier Dimensions that Disrupt Transformers

BERT Busters: Outlier Dimensions that Disrupt Transformers

14 May 2021
Olga Kovaleva
Saurabh Kulshreshtha
Anna Rogers
Anna Rumshisky
ArXivPDFHTML

Papers citing "BERT Busters: Outlier Dimensions that Disrupt Transformers"

25 / 25 papers shown
Title
Resource-Efficient Language Models: Quantization for Fast and Accessible Inference
Resource-Efficient Language Models: Quantization for Fast and Accessible Inference
Tollef Emil Jørgensen
MQ
56
0
0
13 May 2025
Fast and Low-Cost Genomic Foundation Models via Outlier Removal
Fast and Low-Cost Genomic Foundation Models via Outlier Removal
Haozheng Luo
Chenghao Qiu
Maojiang Su
Zhihan Zhou
Zoe Mehta
Guo Ye
Jerry Yao-Chieh Hu
Han Liu
AAML
55
1
0
01 May 2025
Outlier dimensions favor frequent tokens in language models
Outlier dimensions favor frequent tokens in language models
Iuri Macocco
Nora Graichen
Gemma Boleda
Marco Baroni
58
0
0
27 Mar 2025
Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-Probing
Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-Probing
Qi Le
Enmao Diao
Ziyan Wang
Xinran Wang
Jie Ding
Li Yang
Ali Anwar
84
2
0
24 Feb 2025
ReLU's Revival: On the Entropic Overload in Normalization-Free Large
  Language Models
ReLU's Revival: On the Entropic Overload in Normalization-Free Large Language Models
N. Jha
Brandon Reagen
OffRL
AI4CE
33
0
0
12 Oct 2024
OATS: Outlier-Aware Pruning Through Sparse and Low Rank Decomposition
OATS: Outlier-Aware Pruning Through Sparse and Low Rank Decomposition
Stephen Zhang
Vardan Papyan
VLM
51
1
0
20 Sep 2024
Outlier Reduction with Gated Attention for Improved Post-training
  Quantization in Large Sequence-to-sequence Speech Foundation Models
Outlier Reduction with Gated Attention for Improved Post-training Quantization in Large Sequence-to-sequence Speech Foundation Models
Dominik Wagner
Ilja Baumann
Korbinian Riedhammer
Tobias Bocklet
MQ
43
1
0
16 Jun 2024
Understanding and Minimising Outlier Features in Neural Network Training
Understanding and Minimising Outlier Features in Neural Network Training
Bobby He
Lorenzo Noci
Daniele Paliotta
Imanol Schlag
Thomas Hofmann
44
3
0
29 May 2024
Cherry on Top: Parameter Heterogeneity and Quantization in Large
  Language Models
Cherry on Top: Parameter Heterogeneity and Quantization in Large Language Models
Wanyun Cui
Qianle Wang
MQ
42
2
0
03 Apr 2024
Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for
  Pruning LLMs to High Sparsity
Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity
Lu Yin
You Wu
Zhenyu Zhang
Cheng-Yu Hsieh
Yaqing Wang
...
Mykola Pechenizkiy
Yi Liang
Michael Bendersky
Zhangyang Wang
Shiwei Liu
36
79
0
08 Oct 2023
Quantizable Transformers: Removing Outliers by Helping Attention Heads
  Do Nothing
Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing
Yelysei Bondarenko
Markus Nagel
Tijmen Blankevoort
MQ
23
87
0
22 Jun 2023
A Simple and Effective Pruning Approach for Large Language Models
A Simple and Effective Pruning Approach for Large Language Models
Mingjie Sun
Zhuang Liu
Anna Bair
J. Zico Kolter
87
361
0
20 Jun 2023
Exploring Anisotropy and Outliers in Multilingual Language Models for
  Cross-Lingual Semantic Sentence Similarity
Exploring Anisotropy and Outliers in Multilingual Language Models for Cross-Lingual Semantic Sentence Similarity
Katharina Hämmerl
Alina Fastowski
Jindrich Libovický
Alexander Fraser
30
6
0
01 Jun 2023
Eliciting Latent Predictions from Transformers with the Tuned Lens
Eliciting Latent Predictions from Transformers with the Tuned Lens
Nora Belrose
Zach Furman
Logan Smith
Danny Halawi
Igor V. Ostrovsky
Lev McKinney
Stella Biderman
Jacob Steinhardt
22
196
0
14 Mar 2023
Full Stack Optimization of Transformer Inference: a Survey
Full Stack Optimization of Transformer Inference: a Survey
Sehoon Kim
Coleman Hooper
Thanakul Wattanawong
Minwoo Kang
Ruohan Yan
...
Qijing Huang
Kurt Keutzer
Michael W. Mahoney
Y. Shao
A. Gholami
MQ
36
102
0
27 Feb 2023
An Empirical Study on the Transferability of Transformer Modules in
  Parameter-Efficient Fine-Tuning
An Empirical Study on the Transferability of Transformer Modules in Parameter-Efficient Fine-Tuning
Mohammad AkbarTajari
S. Rajaee
Mohammad Taher Pilehvar
19
2
0
01 Feb 2023
The case for 4-bit precision: k-bit Inference Scaling Laws
The case for 4-bit precision: k-bit Inference Scaling Laws
Tim Dettmers
Luke Zettlemoyer
MQ
27
218
0
19 Dec 2022
Outlier Suppression: Pushing the Limit of Low-bit Transformer Language
  Models
Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models
Xiuying Wei
Yunchen Zhang
Xiangguo Zhang
Ruihao Gong
Shanghang Zhang
Qi Zhang
F. Yu
Xianglong Liu
MQ
38
147
0
27 Sep 2022
Outliers Dimensions that Disrupt Transformers Are Driven by Frequency
Outliers Dimensions that Disrupt Transformers Are Driven by Frequency
Giovanni Puccetti
Anna Rogers
Aleksandr Drozd
F. Dell’Orletta
81
42
0
23 May 2022
Life after BERT: What do Other Muppets Understand about Language?
Life after BERT: What do Other Muppets Understand about Language?
Vladislav Lialin
Kevin Zhao
Namrata Shivagunde
Anna Rumshisky
49
6
0
21 May 2022
Measuring the Mixing of Contextual Information in the Transformer
Measuring the Mixing of Contextual Information in the Transformer
Javier Ferrando
Gerard I. Gállego
Marta R. Costa-jussá
36
50
0
08 Mar 2022
Neural reality of argument structure constructions
Neural reality of argument structure constructions
Bai Li
Zining Zhu
Guillaume Thomas
Frank Rudzicz
Yang Xu
48
27
0
24 Feb 2022
An Isotropy Analysis in the Multilingual BERT Embedding Space
An Isotropy Analysis in the Multilingual BERT Embedding Space
S. Rajaee
Mohammad Taher Pilehvar
24
33
0
09 Oct 2021
The Lottery Ticket Hypothesis for Pre-trained BERT Networks
The Lottery Ticket Hypothesis for Pre-trained BERT Networks
Tianlong Chen
Jonathan Frankle
Shiyu Chang
Sijia Liu
Yang Zhang
Zhangyang Wang
Michael Carbin
156
377
0
23 Jul 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language
  Understanding
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
304
6,996
0
20 Apr 2018
1