ResearchTrend.AI
TOFU: A Task of Fictitious Unlearning for LLMs

11 January 2024
Pratyush Maini, Zhili Feng, Avi Schwarzschild, Zachary Chase Lipton, J. Zico Kolter
MU, CLL

Papers citing "TOFU: A Task of Fictitious Unlearning for LLMs"

50 / 71 papers shown
Harry Potter is Still Here! Probing Knowledge Leakage in Targeted Unlearned Large Language Models via Automated Adversarial Prompting
Bang Trinh Tran To, Thai Le
MU, KELM
80 · 1 · 0
22 May 2025
Does Localization Inform Unlearning? A Rigorous Examination of Local Parameter Attribution for Knowledge Unlearning in Language Models
Hwiyeong Lee, Uiji Hwang, Hyelim Lim, Taeuk Kim
MU
80 · 1 · 0
22 May 2025
"Alexa, can you forget me?" Machine Unlearning Benchmark in Spoken Language Understanding
Alkis Koudounas, Claudio Savelli, Flavio Giobergia, Elena Baralis
MU
110 · 0 · 0
21 May 2025
R-TOFU: Unlearning in Large Reasoning Models
Sangyeon Yoon, Wonje Jeung, Albert No
MU, LRM
202 · 1 · 0
21 May 2025
SEPS: A Separability Measure for Robust Unlearning in LLMs
Wonje Jeung, Sangyeon Yoon, Albert No
MU, VLM
214 · 0 · 0
20 May 2025
Exploring Criteria of Loss Reweighting to Enhance LLM Unlearning
Puning Yang, Qizhou Wang, Zhuo Huang, Tongliang Liu, Chengqi Zhang, Bo Han
MU
104 · 0 · 0
17 May 2025
Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions
Yiming Du, Wenyu Huang, Danna Zheng, Zhaowei Wang, Sébastien Montella, Mirella Lapata, Kam-Fai Wong, Jeff Z. Pan
KELM, MU
205 · 4 · 0
01 May 2025
Unlearning Sensitive Information in Multimodal LLMs: Benchmark and Attack-Defense Evaluation
Vaidehi Patil, Yi-Lin Sung, Peter Hase, Jie Peng, Jen-tse Huang, Joey Tianyi Zhou
AAML, MU
240 · 4 · 0
01 May 2025
Safety Pretraining: Toward the Next Generation of Safe AI
Pratyush Maini, Sachin Goyal, Dylan Sam, Alex Robey, Yash Savani, Yiding Jiang, Andy Zou, Zachary C. Lipton, J. Zico Kolter
185 · 3 · 0
23 Apr 2025
Certified Mitigation of Worst-Case LLM Copyright Infringement
Jingyu Zhang, Jiacan Yu, Marc Marone, Benjamin Van Durme, Daniel Khashabi
MoMe
432 · 0 · 0
22 Apr 2025
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers
Hongkang Li, Yihua Zhang, Shuai Zhang, Ming Wang, Sijia Liu, Pin-Yu Chen
MoMe
194 · 8 · 0
15 Apr 2025
Not All Data Are Unlearned Equally
Aravind Krishnan, Siva Reddy, Marius Mosbach
MU
360 · 2 · 0
07 Apr 2025
Empirical Privacy Variance
Yuzheng Hu, Fan Wu, Ruicheng Xian, Yuhang Liu, Lydia Zakynthinou, Pritish Kamath, Chiyuan Zhang, David A. Forsyth
123 · 0 · 0
16 Mar 2025
SafeEraser: Enhancing Safety in Multimodal Large Language Models through Multimodal Machine Unlearning
Junkai Chen, Zhijie Deng, Kening Zheng, Yibo Yan, Shuliang Liu, PeiJun Wu, Peijie Jiang, Qingbin Liu, Xuming Hu
MU
91 · 7 · 0
18 Feb 2025
ReLearn: Unlearning via Learning for Large Language Models
Haoming Xu, Ningyuan Zhao, Liming Yang, Sendong Zhao, Shumin Deng, Mengru Wang, Bryan Hooi, Nay Oo, Ningyu Zhang
MU, KELM, CLL
499 · 3 · 0
16 Feb 2025
Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate
Zhiqi Bu, Xiaomeng Jin, Bhanukiran Vinzamuri, Anil Ramakrishna, Kai-Wei Chang, Volkan Cevher, Mingyi Hong
MU
137 · 10 · 0
29 Oct 2024
Does Data Contamination Detection Work (Well) for LLMs? A Survey and Evaluation on Detection Assumptions
Yujuan Fu, Özlem Uzuner, Meliha Yetisgen, Fei Xia
87 · 7 · 0
24 Oct 2024
CLEAR: Character Unlearning in Textual and Visual Modalities
Alexey Dontsov, Dmitrii Korzh, Alexey Zhavoronkin, Boris Mikheev, Denis Bobkov, Aibek Alanov, Oleg Y. Rogov, Ivan Oseledets, Elena Tutubalina
MU, AILaw, VLM
122 · 5 · 0
23 Oct 2024
SudoLM: Learning Access Control of Parametric Knowledge with Authorization Alignment
Qin Liu, Fei Wang, Chaowei Xiao, Muhao Chen
423 · 2 · 0
18 Oct 2024
Do Unlearning Methods Remove Information from Language Model Weights?
Aghyad Deeb, Fabien Roger
AAML, MU
81 · 25 · 0
11 Oct 2024
A Closer Look at Machine Unlearning for Large Language Models
Xiaojian Yuan, Tianyu Pang, Chao Du, Kejiang Chen, Weiming Zhang, Min Lin
MU
215 · 11 · 0
10 Oct 2024
A Probabilistic Perspective on Unlearning and Alignment for Large Language Models
Yan Scholten, Stephan Günnemann, Leo Schwinn
MU
126 · 10 · 0
04 Oct 2024
Position: LLM Unlearning Benchmarks are Weak Measures of Progress
Pratiksha Thaker, Shengyuan Hu, Neil Kale, Yash Maurya, Zhiwei Steven Wu, Virginia Smith
MU
113 · 19 · 0
03 Oct 2024
Erasing Conceptual Knowledge from Language Models
Rohit Gandikota, Sheridan Feucht, Samuel Marks, David Bau
KELM, ELM, MU
92 · 11 · 0
03 Oct 2024
An Adversarial Perspective on Machine Unlearning for AI Safety
Jakub Łucki, Boyi Wei, Yangsibo Huang, Peter Henderson, F. Tramèr, Javier Rando
MU, AAML
140 · 46 · 0
26 Sep 2024
Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models
Haoyu Tang, Ye Liu, Xukai Liu, Yanghai Zhang, Kai Zhang, Xiaofang Zhou, Enhong Chen
MU
106 · 3 · 0
25 Jul 2024
Composable Interventions for Language Models
Arinbjorn Kolbeinsson, Kyle O'Brien, Tianjin Huang, Shanghua Gao, Shiwei Liu, ..., Anurag J. Vaidya, Faisal Mahmood, Marinka Zitnik, Tianlong Chen, Thomas Hartvigsen
KELM, MU
153 · 4 · 0
09 Jul 2024
Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models
Lynn Chua, Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chulin Xie, Chiyuan Zhang
134 · 2 · 0
23 Jun 2024
REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space
Tomer Ashuach, Martin Tutek, Yonatan Belinkov
MU, KELM
104 · 7 · 0
13 Jun 2024
Single Image Unlearning: Efficient Machine Unlearning in Multimodal Large Language Models
Jiaqi Li, Qianshan Wei, Chuanyi Zhang, Guilin Qi, Miaozeng Du, Yongrui Chen, Sheng Bi, Fan Liu
VLM, MU
151 · 15 · 0
21 May 2024
Offset Unlearning for Large Language Models
James Y. Huang, Wenxuan Zhou, Fei Wang, Fred Morstatter, Sheng Zhang, Hoifung Poon, Muhao Chen
MU
77 · 17 · 0
17 Apr 2024
Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space
Leo Schwinn, David Dobre, Sophie Xhonneux, Gauthier Gidel, Stephan Günnemann
AAML
121 · 44 · 0
14 Feb 2024
A Comprehensive Study of Knowledge Editing for Large Language Models
Ningyu Zhang, Yunzhi Yao, Bo Tian, Peng Wang, Shumin Deng, ..., Lei Liang, Qing Cui, Xiao-Jun Zhu, Jun Zhou, Huajun Chen
KELM
88 · 86 · 0
02 Jan 2024
MultiDelete for Multimodal Machine Unlearning
Jiali Cheng, Hadi Amiri
MU
93 · 9 · 0
18 Nov 2023
Detecting Pretraining Data from Large Language Models
Weijia Shi, Anirudh Ajith, Mengzhou Xia, Yangsibo Huang, Daogao Liu, Terra Blevins, Danqi Chen, Luke Zettlemoyer
MIALM
73 · 191 · 0
25 Oct 2023
In-Context Unlearning: Language Models as Few Shot Unlearners
Martin Pawelczyk, Seth Neel, Himabindu Lakkaraju
MU
84 · 120 · 0
11 Oct 2023
Who's Harry Potter? Approximate Unlearning in LLMs
Ronen Eldan, M. Russinovich
MU, MoMe
154 · 206 · 0
03 Oct 2023
Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks
Vaidehi Patil, Peter Hase, Joey Tianyi Zhou
KELM, AAML
119 · 106 · 0
29 Sep 2023
Textbooks Are All You Need II: phi-1.5 technical report
Yuan-Fang Li, Sébastien Bubeck, Ronen Eldan, Allison Del Giorno, Suriya Gunasekar, Yin Tat Lee
ALM, LRM
161 · 474 · 0
11 Sep 2023
Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation
Xinshuo Hu, Dongfang Li, Baotian Hu, Zihao Zheng, Zhenyu Liu, Hao Fei
KELM, MU
85 · 30 · 0
16 Aug 2023
Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, Matt Fredrikson
291 · 1,455 · 0
27 Jul 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron, Louis Martin, Kevin R. Stone, Peter Albert, Amjad Almahairi, ..., Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom
AI4MH, ALM
305 · 11,894 · 0
18 Jul 2023
Right to be Forgotten in the Era of Large Language Models: Implications, Challenges, and Solutions
Dawen Zhang, Pamela Finckenberg-Broman, Thong Hoang, Shidong Pan, Zhenchang Xing, Mark Staples, Xiwei Xu
AILaw, MU
78 · 53 · 0
08 Jul 2023
Jailbroken: How Does LLM Safety Training Fail?
Alexander Wei, Nika Haghtalab, Jacob Steinhardt
203 · 970 · 0
05 Jul 2023
ProPILE: Probing Privacy Leakage in Large Language Models
Siwon Kim, Sangdoo Yun, Hwaran Lee, Martin Gubri, Sungroh Yoon, Seong Joon Oh
PILM
463 · 108 · 3
04 Jul 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov, Archit Sharma, E. Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn
ALM
385 · 3,981 · 0
29 May 2023
Privacy Auditing with One (1) Training Run
Thomas Steinke, Milad Nasr, Matthew Jagielski
106 · 82 · 0
15 May 2023
KGA: A General Machine Unlearning Framework Based on Knowledge Gap Alignment
Lingzhi Wang, Tong Chen, Wei Yuan, Xingshan Zeng, Kam-Fai Wong, Hongzhi Yin
MU
76 · 76 · 0
11 May 2023
On Provable Copyright Protection for Generative Models
Nikhil Vyas, Sham Kakade, Boaz Barak
68 · 92 · 0
21 Feb 2023
Towards Unbounded Machine Unlearning
M. Kurmanji, Peter Triantafillou, Jamie Hayes, Eleni Triantafillou
MU
77 · 142 · 0
20 Feb 2023