Does Machine Unlearning Truly Remove Model Knowledge? A Framework for Auditing Unlearning in LLMs (29 May 2025)
Haokun Chen, Y. Zhang, Yuan Bi, Yao Zhang, Tong Liu, Jinhe Bi, Jian Lan, Jindong Gu, Claudia Grosser, Denis Krompass, Nassir Navab, Volker Tresp
Tags: MU

Papers citing "Does Machine Unlearning Truly Remove Model Knowledge? A Framework for Auditing Unlearning in LLMs" (12 of 12 papers shown)

On Evaluating the Durability of Safeguards for Open-Weight LLMs (10 Dec 2024)
Xiangyu Qi, Boyi Wei, Nicholas Carlini, Yangsibo Huang, Tinghao Xie, Luxi He, Matthew Jagielski, Milad Nasr, Prateek Mittal, Peter Henderson
Tags: AAML

Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats (29 Sep 2024)
Kuanrong Liu, Siyuan Liang, Jiawei Liang, Pengwen Dai, Xiaochun Cao
Tags: MU, AAML

An Adversarial Perspective on Machine Unlearning for AI Safety (26 Sep 2024)
Jakub Łucki, Boyi Wei, Yangsibo Huang, Peter Henderson, F. Tramèr, Javier Rando
Tags: MU, AAML

MUSE: Machine Unlearning Six-Way Evaluation for Language Models (08 Jul 2024)
Weijia Shi, Jaechan Lee, Yangsibo Huang, Sadhika Malladi, Jieyu Zhao, Ari Holtzman, Daogao Liu, Luke Zettlemoyer, Noah A. Smith, Chiyuan Zhang
Tags: MU, ELM

REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space (13 Jun 2024)
Tomer Ashuach, Martin Tutek, Yonatan Belinkov
Tags: MU, KELM

Eight Methods to Evaluate Robust Unlearning in LLMs (26 Feb 2024)
Aengus Lynch, Phillip Guo, Aidan Ewart, Stephen Casper, Dylan Hadfield-Menell
Tags: ELM, MU

Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space (14 Feb 2024)
Leo Schwinn, David Dobre, Sophie Xhonneux, Gauthier Gidel, Stephan Günnemann
Tags: AAML

TOFU: A Task of Fictitious Unlearning for LLMs (11 Jan 2024)
Pratyush Maini, Zhili Feng, Avi Schwarzschild, Zachary Chase Lipton, J. Zico Kolter
Tags: MU, CLL

Universal and Transferable Adversarial Attacks on Aligned Language Models (27 Jul 2023)
Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, Matt Fredrikson

Mass-Editing Memory in a Transformer (13 Oct 2022)
Kevin Meng, Arnab Sen Sharma, A. Andonian, Yonatan Belinkov, David Bau
Tags: KELM, VLM

Locating and Editing Factual Associations in GPT (10 Feb 2022)
Kevin Meng, David Bau, A. Andonian, Yonatan Belinkov
Tags: KELM

Scaling Laws for Neural Language Models (23 Jan 2020)
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei