ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2410.08827
  4. Cited By
Do Unlearning Methods Remove Information from Language Model Weights?

Do Unlearning Methods Remove Information from Language Model Weights?

11 October 2024
Aghyad Deeb
Fabien Roger
    AAML
    MU
ArXivPDFHTML

Papers citing "Do Unlearning Methods Remove Information from Language Model Weights?"

10 / 10 papers shown
Title
Layered Unlearning for Adversarial Relearning
Layered Unlearning for Adversarial Relearning
Timothy Qian
Vinith Suriyakumar
Ashia Wilson
Dylan Hadfield-Menell
MU
28
0
0
14 May 2025
Access Controls Will Solve the Dual-Use Dilemma
Access Controls Will Solve the Dual-Use Dilemma
Evžen Wybitul
AAML
26
0
0
14 May 2025
SAEs $\textit{Can}$ Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs
SAEs Can\textit{Can}Can Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs
Aashiq Muhamed
Jacopo Bonato
Mona Diab
Virginia Smith
MU
66
0
0
11 Apr 2025
Not All Data Are Unlearned Equally
Not All Data Are Unlearned Equally
Aravind Krishnan
Siva Reddy
Marius Mosbach
MU
166
1
0
07 Apr 2025
Exact Unlearning of Finetuning Data via Model Merging at Scale
Exact Unlearning of Finetuning Data via Model Merging at Scale
Kevin Kuo
Amrith Rajagopal Setlur
Kartik Srinivas
Aditi Raghunathan
Virginia Smith
MoMe
CLL
MU
45
0
0
06 Apr 2025
A General Framework to Enhance Fine-tuning-based LLM Unlearning
A General Framework to Enhance Fine-tuning-based LLM Unlearning
J. Ren
Zhenwei Dai
Xianfeng Tang
Hui Liu
Jingying Zeng
...
R. Goutam
Suhang Wang
Yue Xing
Qi He
Hui Liu
MU
163
1
0
25 Feb 2025
Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities
Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities
Zora Che
Stephen Casper
Robert Kirk
Anirudh Satheesh
Stewart Slocum
...
Zikui Cai
Bilal Chughtai
Y. Gal
Furong Huang
Dylan Hadfield-Menell
MU
AAML
ELM
85
3
0
03 Feb 2025
Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via
  Mechanistic Localization
Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization
Phillip Guo
Aaquib Syed
Abhay Sheshadri
Aidan Ewart
Gintare Karolina Dziugaite
KELM
MU
41
5
0
16 Oct 2024
An Adversarial Perspective on Machine Unlearning for AI Safety
An Adversarial Perspective on Machine Unlearning for AI Safety
Jakub Łucki
Boyi Wei
Yangsibo Huang
Peter Henderson
F. Tramèr
Javier Rando
MU
AAML
73
32
0
26 Sep 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai
Yilun Zhou
Shi Feng
Abulhair Saparov
Ziyu Yao
82
19
0
02 Jul 2024
1