Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2403.03329
Cited By
Guardrail Baselines for Unlearning in LLMs
5 March 2024
Pratiksha Thaker
Yash Maurya
Shengyuan Hu
Zhiwei Steven Wu
Virginia Smith
MU
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Guardrail Baselines for Unlearning in LLMs"
14 / 14 papers shown
Title
A General Framework to Enhance Fine-tuning-based LLM Unlearning
J. Ren
Zhenwei Dai
X. Tang
Hui Liu
Jingying Zeng
...
R. Goutam
Suhang Wang
Yue Xing
Qi He
Hui Liu
MU
163
1
0
25 Feb 2025
WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models
Jinghan Jia
Jiancheng Liu
Yihua Zhang
Parikshit Ram
Nathalie Baracaldo
Sijia Liu
MU
35
2
0
23 Oct 2024
A Closer Look at Machine Unlearning for Large Language Models
Xiaojian Yuan
Tianyu Pang
Chao Du
Kejiang Chen
Weiming Zhang
Min-Bin Lin
MU
41
5
0
10 Oct 2024
Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning
Chongyu Fan
Jiancheng Liu
Licong Lin
Jinghan Jia
Ruiqi Zhang
Song Mei
Sijia Liu
MU
43
16
0
09 Oct 2024
Position: LLM Unlearning Benchmarks are Weak Measures of Progress
Pratiksha Thaker
Shengyuan Hu
Neil Kale
Yash Maurya
Zhiwei Steven Wu
Virginia Smith
MU
53
10
0
03 Oct 2024
Alternate Preference Optimization for Unlearning Factual Knowledge in Large Language Models
Anmol Mekala
Vineeth Dorna
Shreya Dubey
Abhishek Lalwani
David Koleczek
Mukund Rungta
Sadid Hasan
Elita Lobo
KELM
MU
38
2
0
20 Sep 2024
To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models
Bozhong Tian
Xiaozhuan Liang
Siyuan Cheng
Qingbin Liu
Mengru Wang
Dianbo Sui
Xi Chen
Huajun Chen
Ningyu Zhang
MU
27
6
0
02 Jul 2024
SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning
Jinghan Jia
Yihua Zhang
Yimeng Zhang
Jiancheng Liu
Bharat Runwal
James Diffenderfer
B. Kailkhura
Sijia Liu
MU
40
35
0
28 Apr 2024
Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning
Ruiqi Zhang
Licong Lin
Yu Bai
Song Mei
MU
60
128
0
08 Apr 2024
Testing the Limits of Jailbreaking Defenses with the Purple Problem
Taeyoun Kim
Suhas Kotha
Aditi Raghunathan
AAML
44
6
0
20 Mar 2024
Threats, Attacks, and Defenses in Machine Unlearning: A Survey
Ziyao Liu
Huanyi Ye
Chen Chen
Yongsen Zheng
K. Lam
AAML
MU
35
28
0
20 Mar 2024
Rethinking Machine Unlearning for Large Language Models
Sijia Liu
Yuanshun Yao
Jinghan Jia
Stephen Casper
Nathalie Baracaldo
...
Hang Li
Kush R. Varshney
Mohit Bansal
Sanmi Koyejo
Yang Liu
AILaw
MU
72
83
0
13 Feb 2024
Knowledge Unlearning for LLMs: Tasks, Methods, and Challenges
Nianwen Si
Hao Zhang
Heyu Chang
Wenlin Zhang
Dan Qu
Weiqiang Zhang
KELM
MU
80
26
0
27 Nov 2023
Who's Harry Potter? Approximate Unlearning in LLMs
Ronen Eldan
M. Russinovich
MU
MoMe
101
175
0
03 Oct 2023
1