2410.20526
Cited By
Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders
27 October 2024
Zhengfu He
Wentao Shu
Xuyang Ge
Lingjie Chen
Junxuan Wang
Yunhua Zhou
Frances Liu
Qipeng Guo
Xuanjing Huang
Zuxuan Wu
Yu-Gang Jiang
Xipeng Qiu
Papers citing
"Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders"
7 / 7 papers shown
Title
Unveiling Language-Specific Features in Large Language Models via Sparse Autoencoders
Boyi Deng
Yidan Zhang
Baosong Yang
Fuli Feng
41
0
0
08 May 2025
Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition
Zhengfu He
Jiadong Wang
Rui Lin
Xuyang Ge
Wentao Shu
Qiong Tang
J.N. Zhang
Xipeng Qiu
70
0
0
29 Apr 2025
MIB: A Mechanistic Interpretability Benchmark
Aaron Mueller
Atticus Geiger
Sarah Wiegreffe
Dana Arad
Iván Arcuschin
...
Alessandro Stolfo
Martin Tutek
Amir Zur
David Bau
Yonatan Belinkov
51
1
0
17 Apr 2025
Model Utility Law: Evaluating LLMs beyond Performance through Mechanism Interpretable Metric
Yixin Cao
Jiahao Ying
Yansen Wang
Xipeng Qiu
Xuanjing Huang
Yugang Jiang
ELM
44
2
0
10 Apr 2025
SAIF: A Sparse Autoencoder Framework for Interpreting and Steering Instruction Following of Language Models
Z. He
Haiyan Zhao
Yiran Qiao
Fan Yang
Ali Payani
Jing Ma
Jundong Li
LLMSV
74
2
0
17 Feb 2025
Decomposing The Dark Matter of Sparse Autoencoders
Joshua Engels
Logan Riggs
Max Tegmark
LLMSV
65
10
0
18 Oct 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai
Yilun Zhou
Shi Feng
Abulhair Saparov
Ziyu Yao
82
19
0
02 Jul 2024