Rethinking Machine Unlearning for Large Language Models


13 February 2024
Sijia Liu
Yuanshun Yao
Jinghan Jia
Stephen Casper
Nathalie Baracaldo
Peter Hase
Yuguang Yao
Chris Liu
Xiaojun Xu
Hang Li
Kush R. Varshney
Mohit Bansal
Sanmi Koyejo
Yang Liu
    AILaw
    MU

Papers citing "Rethinking Machine Unlearning for Large Language Models"

37 / 37 papers shown
WaterDrum: Watermarking for Data-centric Unlearning Metric
Xinyang Lu
Xinyuan Niu
Gregory Kang Ruey Lau
Bui Thi Cam Nhung
Rachael Hwee Ling Sim
Fanyu Wen
Chuan-Sheng Foo
S. Ng
Bryan Kian Hsiang Low
MU
57
0
0
08 May 2025
Retrieval Augmented Generation Evaluation for Health Documents
Mario Ceresa
Lorenzo Bertolini
Valentin Comte
Nicholas Spadaro
Barbara Raffael
...
Sergio Consoli
Amalia Muñoz Piñeiro
Alex Patak
Maddalena Querci
Tobias Wiesenthal
RALM
3DV
39
0
1
07 May 2025
UPCORE: Utility-Preserving Coreset Selection for Balanced Unlearning
Vaidehi Patil
Elias Stengel-Eskin
Mohit Bansal
MU
CLL
75
2
0
20 Feb 2025
Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities
Zora Che
Stephen Casper
Robert Kirk
Anirudh Satheesh
Stewart Slocum
...
Zikui Cai
Bilal Chughtai
Y. Gal
Furong Huang
Dylan Hadfield-Menell
MU
AAML
ELM
83
3
0
03 Feb 2025
Forget Vectors at Play: Universal Input Perturbations Driving Machine Unlearning in Image Classification
Changchang Sun
Ren Wang
Yihua Zhang
Jinghan Jia
Jiancheng Liu
Gaowen Liu
Sijia Liu
Yan Yan
AAML
MU
93
0
0
21 Dec 2024
Unified Parameter-Efficient Unlearning for LLMs
Chenlu Ding
Jiancan Wu
Yancheng Yuan
Jinda Lu
Kai Zhang
Alex Su
Xiang Wang
Xiangnan He
MU
KELM
100
6
0
30 Nov 2024
Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate
Zhiqi Bu
Xiaomeng Jin
Bhanukiran Vinzamuri
Anil Ramakrishna
Kai-Wei Chang
V. Cevher
Mingyi Hong
MU
83
6
0
29 Oct 2024
WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models
Jinghan Jia
Jiancheng Liu
Yihua Zhang
Parikshit Ram
Nathalie Baracaldo
Sijia Liu
MU
35
2
0
23 Oct 2024
SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation
Jaehong Yoon
Shoubin Yu
Vaidehi Patil
Huaxiu Yao
Mohit Bansal
73
15
0
16 Oct 2024
On Calibration of LLM-based Guard Models for Reliable Content Moderation
Hongfu Liu
Hengguan Huang
Hao Wang
Xiangming Gu
Ye Wang
53
2
0
14 Oct 2024
Do Unlearning Methods Remove Information from Language Model Weights?
Aghyad Deeb
Fabien Roger
AAML
MU
40
13
0
11 Oct 2024
A Closer Look at Machine Unlearning for Large Language Models
Xiaojian Yuan
Tianyu Pang
Chao Du
Kejiang Chen
Weiming Zhang
Min-Bin Lin
MU
41
5
0
10 Oct 2024
Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning
Chongyu Fan
Jiancheng Liu
Licong Lin
Jinghan Jia
Ruiqi Zhang
Song Mei
Sijia Liu
MU
43
16
0
09 Oct 2024
Mitigating Memorization In Language Models
Mansi Sakarvadia
Aswathy Ajith
Arham Khan
Nathaniel Hudson
Caleb Geniesse
Kyle Chard
Yaoqing Yang
Ian Foster
Michael W. Mahoney
KELM
MU
50
0
0
03 Oct 2024
Position: LLM Unlearning Benchmarks are Weak Measures of Progress
Pratiksha Thaker
Shengyuan Hu
Neil Kale
Yash Maurya
Zhiwei Steven Wu
Virginia Smith
MU
53
10
0
03 Oct 2024
An Adversarial Perspective on Machine Unlearning for AI Safety
Jakub Łucki
Boyi Wei
Yangsibo Huang
Peter Henderson
F. Tramèr
Javier Rando
MU
AAML
71
31
0
26 Sep 2024
Tamper-Resistant Safeguards for Open-Weight LLMs
Rishub Tamirisa
Bhrugu Bharathi
Long Phan
Andy Zhou
Alice Gatti
...
Andy Zou
Dawn Song
Bo Li
Dan Hendrycks
Mantas Mazeika
AAML
MU
49
38
0
01 Aug 2024
Strong Copyright Protection for Language Models via Adaptive Model Fusion
Javier Abad
Konstantin Donhauser
Francesco Pinto
Fanny Yang
37
4
0
29 Jul 2024
Composable Interventions for Language Models
Arinbjorn Kolbeinsson
Kyle O'Brien
Tianjin Huang
Shanghua Gao
Shiwei Liu
...
Anurag J. Vaidya
Faisal Mahmood
Marinka Zitnik
Tianlong Chen
Thomas Hartvigsen
KELM
MU
84
5
0
09 Jul 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai
Yilun Zhou
Shi Feng
Abulhair Saparov
Ziyu Yao
79
19
0
02 Jul 2024
Towards Scalable Exact Machine Unlearning Using Parameter-Efficient Fine-Tuning
Somnath Basu Roy Chowdhury
Krzysztof Choromanski
Arijit Sehanobish
Avinava Dubey
Snigdha Chaturvedi
MU
53
7
0
24 Jun 2024
REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space
Tomer Ashuach
Martin Tutek
Yonatan Belinkov
KELM
MU
63
4
0
13 Jun 2024
RKLD: Reverse KL-Divergence-based Knowledge Distillation for Unlearning Personal Information in Large Language Models
Bichen Wang
Yuzhe Zi
Yixin Sun
Yanyan Zhao
Bing Qin
MU
67
8
0
04 Jun 2024
What makes unlearning hard and what to do about it
Kairan Zhao
M. Kurmanji
George-Octavian Barbulescu
Eleni Triantafillou
Peter Triantafillou
MU
58
16
0
03 Jun 2024
Stress-Testing Capability Elicitation With Password-Locked Models
Ryan Greenblatt
Fabien Roger
Dmitrii Krasheninnikov
David M. Krueger
30
13
0
29 May 2024
Large Scale Knowledge Washing
Yu-Xiang Wang
Ruihan Wu
Zexue He
X. Chen
Julian McAuley
MU
KELM
75
5
0
26 May 2024
Single Image Unlearning: Efficient Machine Unlearning in Multimodal Large Language Models
Jiaqi Li
Qianshan Wei
Chuanyi Zhang
Guilin Qi
Miaozeng Du
Yongrui Chen
Sheng Bi
Fan Liu
VLM
MU
67
12
0
21 May 2024
Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning
Ruiqi Zhang
Licong Lin
Yu Bai
Song Mei
MU
56
126
0
08 Apr 2024
Threats, Attacks, and Defenses in Machine Unlearning: A Survey
Ziyao Liu
Huanyi Ye
Chen Chen
Yongsen Zheng
K. Lam
AAML
MU
29
28
0
20 Mar 2024
Knowledge Unlearning for LLMs: Tasks, Methods, and Challenges
Nianwen Si
Hao Zhang
Heyu Chang
Wenlin Zhang
Dan Qu
Weiqiang Zhang
KELM
MU
72
26
0
27 Nov 2023
Adversarial Training Should Be Cast as a Non-Zero-Sum Game
Alexander Robey
Fabian Latorre
George J. Pappas
Hamed Hassani
V. Cevher
AAML
66
12
0
19 Jun 2023
Boundary Unlearning
Min Chen
Weizhuo Gao
Gaoyang Liu
Kai Peng
Chen Wang
MU
101
71
0
21 Mar 2023
Knowledge Unlearning for Mitigating Privacy Risks in Language Models
Joel Jang
Dongkeun Yoon
Sohee Yang
Sungmin Cha
Moontae Lee
Lajanugen Logeswaran
Minjoon Seo
KELM
PILM
MU
145
190
0
04 Oct 2022
A Survey of Machine Unlearning
Thanh Tam Nguyen
T. T. Huynh
Phi Le Nguyen
Alan Wee-Chung Liew
Hongzhi Yin
Quoc Viet Hung Nguyen
MU
77
222
0
06 Sep 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
308
11,915
0
04 Mar 2022
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
278
1,812
0
14 Dec 2020
FreeLB: Enhanced Adversarial Training for Natural Language Understanding
Chen Zhu
Yu Cheng
Zhe Gan
S. Sun
Tom Goldstein
Jingjing Liu
AAML
223
437
0
25 Sep 2019