2410.19278
Cited By
Applying sparse autoencoders to unlearn knowledge in language models
25 October 2024
Eoin Farrell
Yeu-Tong Lau
Arthur Conmy
MU
Papers citing "Applying sparse autoencoders to unlearn knowledge in language models"
5 / 5 papers shown
Title
Are Sparse Autoencoders Useful for Java Function Bug Detection?
Rui Melo
Claudia Mamede
Andre Catarino
Rui Abreu
Henrique Lopes Cardoso
31
0
0
15 May 2025
Understanding the Repeat Curse in Large Language Models from a Feature Perspective
Junchi Yao
Shu Yang
Jianhua Xu
Lijie Hu
Mengdi Li
Di Wang
27
0
0
19 Apr 2025
Steering off Course: Reliability Challenges in Steering Language Models
Patrick Queiroz Da Silva
Hari Sethuraman
Dheeraj Rajagopal
Hannaneh Hajishirzi
Sachin Kumar
LLMSV
37
1
0
06 Apr 2025
SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders
Bartosz Cywiński
Kamil Deja
DiffM
63
6
0
29 Jan 2025
Tracking the Feature Dynamics in LLM Training: A Mechanistic Study
Yang Xu
Yansen Wang
Hao Wang
165
1
0
23 Dec 2024