Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
arXiv:2410.13835 · 17 October 2024
Tianyu Guo, Druv Pai, Yu Bai, Jiantao Jiao, Michael I. Jordan, Song Mei
Papers citing "Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs" (12 of 12 papers shown)
ControlMM: Controllable Masked Motion Generation [VGen]
Ekkasit Pinyoanuntapong, Muhammad Usama Saleem, Korrawe Karunratanakul, Pu Wang, Hongfei Xue, Chong Chen, Chuan Guo, Junli Cao, J. Ren, Sergey Tulyakov
14 Oct 2024
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models [MLLM, VLM]
Liang Chen, Haozhe Zhao, Tianyu Liu, Shuai Bai, Junyang Lin, Chang Zhou, Baobao Chang
11 Mar 2024
IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact [MQ]
Ruikang Liu, Haoli Bai, Haokun Lin, Yuening Li, Han Gao, Zhengzhuo Xu, Lu Hou, Jun Yao, Chun Yuan
02 Mar 2024
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
Luca Soldaini, Rodney Michael Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, ..., Hanna Hajishirzi, Iz Beltagy, Dirk Groeneveld, Jesse Dodge, Kyle Lo
31 Jan 2024
The mechanistic basis of data dependence and abrupt learning in an in-context classification task
Gautam Reddy
03 Dec 2023
Function Vectors in Large Language Models
Eric Todd, Millicent Li, Arnab Sen Sharma, Aaron Mueller, Byron C. Wallace, David Bau
23 Oct 2023
Birth of a Transformer: A Memory Viewpoint
A. Bietti, Vivien A. Cabannes, Diane Bouchacourt, Hervé Jégou, Léon Bottou
01 Jun 2023
Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer [MLT]
Yuandong Tian, Yiping Wang, Beidi Chen, S. Du
25 May 2023
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale [MQ]
Tim Dettmers, M. Lewis, Younes Belkada, Luke Zettlemoyer
15 Aug 2022
Locating and Editing Factual Associations in GPT [KELM]
Kevin Meng, David Bau, A. Andonian, Yonatan Belinkov
10 Feb 2022
Training with Quantization Noise for Extreme Model Compression [MQ]
Angela Fan, Pierre Stock, Benjamin Graham, Edouard Grave, Remi Gribonval, Hervé Jégou, Armand Joulin
15 Apr 2020
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference [MQ]
Benoit Jacob, S. Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew G. Howard, Hartwig Adam, Dmitry Kalenichenko
15 Dec 2017