ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2310.20138
  4. Cited By
DEPN: Detecting and Editing Privacy Neurons in Pretrained Language
  Models

DEPN: Detecting and Editing Privacy Neurons in Pretrained Language Models

31 October 2023
Xinwei Wu
Junzhuo Li
Minghui Xu
Weilong Dong
Shuangzhi Wu
Chao Bian
Deyi Xiong
    MU
    KELM
ArXivPDFHTML

Papers citing "DEPN: Detecting and Editing Privacy Neurons in Pretrained Language Models"

14 / 14 papers shown
Title
SUV: Scalable Large Language Model Copyright Compliance with Regularized Selective Unlearning
SUV: Scalable Large Language Model Copyright Compliance with Regularized Selective Unlearning
Tianyang Xu
Xiaoze Liu
Feijie Wu
Xiaoqian Wang
Jing Gao
MU
61
0
0
29 Mar 2025
Leaking LoRa: An Evaluation of Password Leaks and Knowledge Storage in Large Language Models
Leaking LoRa: An Evaluation of Password Leaks and Knowledge Storage in Large Language Models
Ryan Marinelli
Magnus Eckhoff
PILM
52
0
0
29 Mar 2025
Proactive Privacy Amnesia for Large Language Models: Safeguarding PII with Negligible Impact on Model Utility
Proactive Privacy Amnesia for Large Language Models: Safeguarding PII with Negligible Impact on Model Utility
Martin Kuo
Jingyang Zhang
Jianyi Zhang
Minxue Tang
Louis DiValentin
...
William Chen
Amin Hass
Tianlong Chen
Yuxiao Chen
Yiming Li
MU
KELM
51
2
0
24 Feb 2025
WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models
WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models
Jinghan Jia
Jiancheng Liu
Yihua Zhang
Parikshit Ram
Nathalie Baracaldo
Sijia Liu
MU
35
2
0
23 Oct 2024
Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning
Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning
Chongyu Fan
Jiancheng Liu
Licong Lin
Jinghan Jia
Ruiqi Zhang
Song Mei
Sijia Liu
MU
43
17
0
09 Oct 2024
An Adversarial Perspective on Machine Unlearning for AI Safety
An Adversarial Perspective on Machine Unlearning for AI Safety
Jakub Łucki
Boyi Wei
Yangsibo Huang
Peter Henderson
F. Tramèr
Javier Rando
MU
AAML
73
32
0
26 Sep 2024
Tamper-Resistant Safeguards for Open-Weight LLMs
Tamper-Resistant Safeguards for Open-Weight LLMs
Rishub Tamirisa
Bhrugu Bharathi
Long Phan
Andy Zhou
Alice Gatti
...
Andy Zou
Dawn Song
Bo Li
Dan Hendrycks
Mantas Mazeika
AAML
MU
53
42
0
01 Aug 2024
In-Context Editing: Learning Knowledge from Self-Induced Distributions
In-Context Editing: Learning Knowledge from Self-Induced Distributions
Siyuan Qi
Bangcheng Yang
Kailin Jiang
Xiaobo Wang
Jiaqi Li
Yifan Zhong
Yaodong Yang
Zilong Zheng
KELM
109
8
0
17 Jun 2024
REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space
REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space
Tomer Ashuach
Martin Tutek
Yonatan Belinkov
KELM
MU
71
4
0
13 Jun 2024
Initial Exploration of Zero-Shot Privacy Utility Tradeoffs in Tabular
  Data Using GPT-4
Initial Exploration of Zero-Shot Privacy Utility Tradeoffs in Tabular Data Using GPT-4
Bishwas Mandal
G. Amariucai
Shuangqing Wei
33
1
0
07 Apr 2024
Black-Box Access is Insufficient for Rigorous AI Audits
Black-Box Access is Insufficient for Rigorous AI Audits
Stephen Casper
Carson Ezell
Charlotte Siegmann
Noam Kolt
Taylor Lynn Curtis
...
Michael Gerovitch
David Bau
Max Tegmark
David M. Krueger
Dylan Hadfield-Menell
AAML
34
78
0
25 Jan 2024
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
333
11,953
0
04 Mar 2022
Generalization in NLI: Ways (Not) To Go Beyond Simple Heuristics
Generalization in NLI: Ways (Not) To Go Beyond Simple Heuristics
Prajjwal Bhargava
Aleksandr Drozd
Anna Rogers
98
101
0
04 Oct 2021
Extracting Training Data from Large Language Models
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
290
1,815
0
14 Dec 2020
1