arXiv:2309.17410
Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks
29 September 2023
Vaidehi Patil
Peter Hase
Joey Tianyi Zhou
KELM
AAML
Papers citing
"Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks"
28 / 78 papers shown
Title
Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces
Yihuai Hong
Lei Yu
Shauli Ravfogel
Haiqin Yang
Mor Geva
KELM
MU
66
18
0
17 Jun 2024
RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models
Zhuoran Jin
Pengfei Cao
Chenhao Wang
Zhitao He
Hongbang Yuan
Jiachun Li
Yubo Chen
Kang Liu
Jun Zhao
KELM
MU
42
12
0
16 Jun 2024
REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space
Tomer Ashuach
Martin Tutek
Yonatan Belinkov
KELM
MU
71
4
0
13 Jun 2024
Multi-Head RAG: Solving Multi-Aspect Problems with LLMs
Maciej Besta
Aleš Kubíček
Roman Niggli
Robert Gerstenberger
Lucas Weitzendorf
...
Jürgen Müller
H. Niewiadomski
Marcin Chrapek
Michał Podstawski
Torsten Hoefler
52
15
0
07 Jun 2024
Cross-Modal Safety Alignment: Is textual unlearning all you need?
Trishna Chakraborty
Erfan Shayegani
Zikui Cai
Nael B. Abu-Ghazaleh
M. Salman Asif
Yue Dong
A. Roy-Chowdhury
Chengyu Song
41
16
0
27 May 2024
Offset Unlearning for Large Language Models
James Y. Huang
Wenxuan Zhou
Fei Wang
Fred Morstatter
Sheng Zhang
Hoifung Poon
Muhao Chen
MU
30
14
0
17 Apr 2024
CodeCloak: A Method for Evaluating and Mitigating Code Leakage by LLM Code Assistants
Amit Finkman
Eden Bar-Kochva
Avishag Shapira
D. Mimran
Yuval Elovici
A. Shabtai
ELM
38
1
0
13 Apr 2024
Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning
Ruiqi Zhang
Licong Lin
Yu Bai
Song Mei
MU
63
128
0
08 Apr 2024
Larimar: Large Language Models with Episodic Memory Control
Payel Das
Subhajit Chaudhury
Elliot Nelson
Igor Melnyk
Sarath Swaminathan
...
Vijil Chenthamarakshan
Jiří Navrátil
Soham Dan
Pin-Yu Chen
CLL
KELM
37
18
0
18 Mar 2024
Rebuilding ROME: Resolving Model Collapse during Sequential Model Editing
Akshat Gupta
Sidharth Baskaran
Gopala Anumanchipalli
KELM
65
20
0
11 Mar 2024
On the Societal Impact of Open Foundation Models
Sayash Kapoor
Rishi Bommasani
Kevin Klyman
Shayne Longpre
Ashwin Ramaswami
...
Victor Storchan
Daniel Zhang
Daniel E. Ho
Percy Liang
Arvind Narayanan
26
54
0
27 Feb 2024
Eight Methods to Evaluate Robust Unlearning in LLMs
Aengus Lynch
Phillip Guo
Aidan Ewart
Stephen Casper
Dylan Hadfield-Menell
ELM
MU
40
57
0
26 Feb 2024
Optimizing Language Models for Human Preferences is a Causal Inference Problem
Victoria Lin
Eli Ben-Michael
Louis-Philippe Morency
43
3
0
22 Feb 2024
Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space
Leo Schwinn
David Dobre
Sophie Xhonneux
Gauthier Gidel
Stephan Günnemann
AAML
51
38
0
14 Feb 2024
Rethinking Machine Unlearning for Large Language Models
Sijia Liu
Yuanshun Yao
Jinghan Jia
Stephen Casper
Nathalie Baracaldo
...
Hang Li
Kush R. Varshney
Mohit Bansal
Sanmi Koyejo
Yang Liu
AILaw
MU
75
83
0
13 Feb 2024
Selective Forgetting: Advancing Machine Unlearning Techniques and Evaluation in Language Models
Lingzhi Wang
Xingshan Zeng
Jinsong Guo
Kam-Fai Wong
Georg Gottlob
MU
AAML
KELM
19
13
0
08 Feb 2024
TOFU: A Task of Fictitious Unlearning for LLMs
Pratyush Maini
Zhili Feng
Avi Schwarzschild
Zachary Chase Lipton
J. Zico Kolter
MU
CLL
38
142
0
11 Jan 2024
A Survey of Text Watermarking in the Era of Large Language Models
Aiwei Liu
Leyi Pan
Yijian Lu
Jingjing Li
Xuming Hu
Xi Zhang
Lijie Wen
Irwin King
Hui Xiong
Philip S. Yu
WaLM
43
51
0
13 Dec 2023
Trends in Integration of Knowledge and Large Language Models: A Survey and Taxonomy of Methods, Benchmarks, and Applications
Zhangyin Feng
Weitao Ma
Weijiang Yu
Lei Huang
Haotian Wang
Qianglong Chen
Weihua Peng
Xiaocheng Feng
Bing Qin
Ting Liu
KELM
21
37
0
10 Nov 2023
SoK: Memorization in General-Purpose Large Language Models
Valentin Hartmann
Anshuman Suri
Vincent Bindschaedler
David E. Evans
Shruti Tople
Robert West
KELM
LLMAG
21
20
0
24 Oct 2023
SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation
Chongyu Fan
Jiancheng Liu
Yihua Zhang
Eric Wong
Dennis Wei
Sijia Liu
MU
27
125
0
19 Oct 2023
Privacy in Large Language Models: Attacks, Defenses and Future Directions
Haoran Li
Yulin Chen
Jinglong Luo
Yan Kang
Xiaojin Zhang
Qi Hu
Chunkit Chan
Yangqiu Song
PILM
48
42
0
16 Oct 2023
Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense
Kalpesh Krishna
Yixiao Song
Marzena Karpinska
John Wieting
Mohit Iyyer
DeLMO
21
297
0
23 Mar 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
333
12,003
0
04 Mar 2022
Fast Model Editing at Scale
E. Mitchell
Charles Lin
Antoine Bosselut
Chelsea Finn
Christopher D. Manning
KELM
230
343
0
21 Oct 2021
Probing Classifiers: Promises, Shortcomings, and Advances
Yonatan Belinkov
226
405
0
24 Feb 2021
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Úlfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
290
1,815
0
14 Dec 2020
Language Models as Knowledge Bases?
Fabio Petroni
Tim Rocktäschel
Patrick Lewis
A. Bakhtin
Yuxiang Wu
Alexander H. Miller
Sebastian Riedel
KELM
AI4MH
419
2,588
0
03 Sep 2019