Deduplicating Training Data Mitigates Privacy Risks in Language Models

14 February 2022
Nikhil Kandpal
Eric Wallace
Colin Raffel
    PILM
    MU
arXiv:2202.06539

Papers citing "Deduplicating Training Data Mitigates Privacy Risks in Language Models"

50 / 212 papers shown
DMRL: Data- and Model-aware Reward Learning for Data Extraction
Zhiqiang Wang
Ruoxi Cheng
28
0
0
07 May 2025
CAD-Llama: Leveraging Large Language Models for Computer-Aided Design Parametric 3D Model Generation
Jiahao Li
Weijian Ma
Xueyang Li
Yunzhong Lou
G. Zhou
Xiangdong Zhou
34
0
0
07 May 2025
LLM Security: Vulnerabilities, Attacks, Defenses, and Countermeasures
Francisco Aguilera-Martínez
Fernando Berzal
PILM
52
0
0
02 May 2025
Privacy-Preserving Federated Embedding Learning for Localized Retrieval-Augmented Generation
Qianren Mao
Qili Zhang
Hanwen Hao
Zhentao Han
Runhua Xu
...
Bo Li
Y. Song
Jin Dong
Jianxin Li
Philip S. Yu
71
1
0
27 Apr 2025
Enhancing Privacy-Utility Trade-offs to Mitigate Memorization in Diffusion Models
C. L. P. Chen
Daochang Liu
M. Shah
Chang Xu
64
1
0
25 Apr 2025
PatientDx: Merging Large Language Models for Protecting Data-Privacy in Healthcare
José G. Moreno
Jesus Lovon
M'Rick Robin-Charlet
Christine Damase-Michel
L. Tamine
MoMe
LM&MA
58
0
0
24 Apr 2025
Memorization: A Close Look at Books
Iris Ma
Ian Domingo
A. Krone-Martins
Pierre Baldi
Cristina V. Lopes
29
0
0
17 Apr 2025
Beyond Memorization: Mapping the Originality-Quality Frontier of Language Models
Vishakh Padmakumar
Chen Yueh-Han
Jane Pan
Valerie Chen
He He
35
0
0
13 Apr 2025
Understanding Users' Security and Privacy Concerns and Attitudes Towards Conversational AI Platforms
Mutahar Ali
Arjun Arunasalam
Habiba Farrukh
SILM
54
0
0
09 Apr 2025
Measuring Déjà vu Memorization Efficiently
Narine Kokhlikyan
Bargav Jayaraman
Florian Bordes
Chuan Guo
Kamalika Chaudhuri
30
1
0
08 Apr 2025
Language Models May Verbatim Complete Text They Were Not Explicitly Trained On
Ken Ziyu Liu
Christopher A. Choquette-Choo
Matthew Jagielski
Peter Kairouz
Sanmi Koyejo
Percy Liang
Nicolas Papernot
53
0
0
21 Mar 2025
Privacy-Aware RAG: Secure and Isolated Knowledge Retrieval
Pengcheng Zhou
Yinglun Feng
Zhongliang Yang
SILM
60
0
0
17 Mar 2025
Datasets, Documents, and Repetitions: The Practicalities of Unequal Data Quality
Alex Fang
Hadi Pouransari
Matt Jordan
Alexander Toshev
Vaishaal Shankar
Ludwig Schmidt
Tom Gunter
74
0
0
10 Mar 2025
PoisonedParrot: Subtle Data Poisoning Attacks to Elicit Copyright-Infringing Content from Large Language Models
Michael-Andrei Panaitescu-Liess
Pankayaraj Pathmanathan
Yigitcan Kaya
Zora Che
Bang An
Sicheng Zhu
Aakriti Agrawal
Furong Huang
AAML
71
0
0
10 Mar 2025
Mitigating Memorization in LLMs using Activation Steering
Manan Suri
Nishit Anand
Amisha Bhaskar
LLMSV
52
2
0
08 Mar 2025
Tokens for Learning, Tokens for Unlearning: Mitigating Membership Inference Attacks in Large Language Models via Dual-Purpose Training
Toan Tran
Ruixuan Liu
Li Xiong
MU
41
0
0
27 Feb 2025
Towards Label-Only Membership Inference Attack against Pre-trained Large Language Models
Yu He
Boheng Li
L. Liu
Zhongjie Ba
Wei Dong
Yiming Li
Z. Qin
Kui Ren
C. L. P. Chen
MIALM
74
0
0
26 Feb 2025
UrduLLaMA 1.0: Dataset Curation, Preprocessing, and Evaluation in Low-Resource Settings
Layba Fiaz
Munief Hassan Tahir
Sana Shams
Sarmad Hussain
49
0
0
24 Feb 2025
A General Pseudonymization Framework for Cloud-Based LLMs: Replacing Privacy Information in Controlled Text Generation
Shilong Hou
Ruilin Shang
Zi Long
Xianghua Fu
Yin Chen
67
0
0
24 Feb 2025
The Canary's Echo: Auditing Privacy Risks of LLM-Generated Synthetic Text
Matthieu Meeus
Lukas Wutschitz
Santiago Zanella Béguelin
Shruti Tople
Reza Shokri
80
0
0
24 Feb 2025
Interrogating LLM design under a fair learning doctrine
Johnny Tian-Zheng Wei
Maggie Wang
Ameya Godbole
Jonathan H. Choi
Robin Jia
32
0
0
22 Feb 2025
Protecting Users From Themselves: Safeguarding Contextual Privacy in Interactions with Conversational Agents
Ivoline Ngong
Swanand Kadhe
Hao Wang
K. Murugesan
Justin D. Weisz
Amit Dhurandhar
K. Ramamurthy
49
2
0
22 Feb 2025
Obliviate: Efficient Unmemorization for Protecting Intellectual Property in Large Language Models
M. Russinovich
Ahmed Salem
MU
CLL
62
0
0
20 Feb 2025
GneissWeb: Preparing High Quality Data for LLMs at Scale
Hajar Emami-Gohari
S. Kadhe
Syed Yousaf Shah
Constantin Adam
Abdulhamid A. Adebayo
Praneet Adusumilli
...
Issei Yoshida
Syed Zawad
Petros Zerfos
Yi Zhou
Bishwaranjan Bhattacharjee
44
1
0
19 Feb 2025
The Vendiscope: An Algorithmic Microscope For Data Collections
Amey P. Pasarkar
Adji Bousso Dieng
46
2
0
15 Feb 2025
Data Duplication: A Novel Multi-Purpose Attack Paradigm in Machine Unlearning
Dayong Ye
Tianqing Zhu
J. Li
Kun Gao
B. Liu
L. Zhang
Wanlei Zhou
Y. Zhang
AAML
MU
80
0
0
28 Jan 2025
Synthetic Data Privacy Metrics
Amy Steier
Lipika Ramaswamy
Andre Manoel
Alexa Haushalter
43
0
0
08 Jan 2025
Mind the Data Gap: Bridging LLMs to Enterprise Data Integration
Moe Kayali
Fabian Wenz
Nesime Tatbul
Çağatay Demiralp
44
2
0
31 Dec 2024
Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models
Yulei Qin
Yuncheng Yang
Pengcheng Guo
Gang Li
Hang Shao
Yuchen Shi
Zihan Xu
Yun Gu
Ke Li
Xing Sun
ALM
90
12
0
31 Dec 2024
Understanding and Mitigating Memorization in Diffusion Models for Tabular Data
Zhengyu Fang
Zhimeng Jiang
Huiyuan Chen
Xiao Li
Jing Li
79
2
0
15 Dec 2024
Accelerating Retrieval-Augmented Generation
Derrick Quinn
Mohammad Nouri
Neel Patel
John Salihu
Alireza Salemi
Sukhan Lee
Hamed Zamani
Mohammad Alian
RALM
3DV
87
3
0
14 Dec 2024
Copyright-Protected Language Generation via Adaptive Model Fusion
Javier Abad
Konstantin Donhauser
Francesco Pinto
Fanny Yang
74
1
0
09 Dec 2024
CODECLEANER: Elevating Standards with A Robust Data Contamination Mitigation Toolkit
Jialun Cao
Songqiang Chen
Wuqi Zhang
Hau Ching Lo
S. Cheung
39
0
0
16 Nov 2024
On Active Privacy Auditing in Supervised Fine-tuning for White-Box Language Models
Qian Sun
Hanpeng Wu
Xi Sheryl Zhang
36
0
0
11 Nov 2024
Membership Inference Attacks against Large Vision-Language Models
Zhan Li
Yongtao Wu
Yihang Chen
F. Tonin
Elias Abad Rocamora
V. Cevher
39
4
0
05 Nov 2024
Do LLMs Know to Respect Copyright Notice?
Jialiang Xu
Shenglan Li
Zhaozhuo Xu
Denghui Zhang
35
2
0
02 Nov 2024
Exploring Local Memorization in Diffusion Models via Bright Ending Attention
C. L. P. Chen
Daochang Liu
M. Shah
Chang Xu
62
3
0
29 Oct 2024
Does Data Contamination Detection Work (Well) for LLMs? A Survey and Evaluation on Detection Assumptions
Yujuan Fu
Özlem Uzuner
Meliha Yetisgen
Fei Xia
59
3
0
24 Oct 2024
Self-Comparison for Dataset-Level Membership Inference in Large (Vision-)Language Models
J. Ren
Kangrui Chen
Chen Chen
Vikash Sehwag
Yue Xing
Jiliang Tang
Lingjuan Lyu
26
1
0
16 Oct 2024
A Closer Look at Machine Unlearning for Large Language Models
Xiaojian Yuan
Tianyu Pang
Chao Du
Kejiang Chen
Weiming Zhang
Min-Bin Lin
MU
41
5
0
10 Oct 2024
Detecting Training Data of Large Language Models via Expectation Maximization
Gyuwan Kim
Yang Li
Evangelia Spiliopoulou
Jie Ma
Miguel Ballesteros
William Yang Wang
MIALM
95
4
2
10 Oct 2024
Rescriber: Smaller-LLM-Powered User-Led Data Minimization for LLM-Based Chatbots
Jijie Zhou
Eryue Xu
Yaoyao Wu
Tianshi Li
32
0
0
10 Oct 2024
Enhancing Data Quality through Simple De-duplication: Navigating Responsible Computational Social Science Research
Yida Mu
Mali Jin
Xingyi Song
Nikolaos Aletras
28
0
0
04 Oct 2024
Mitigating Memorization In Language Models
Mansi Sakarvadia
Aswathy Ajith
Arham Khan
Nathaniel Hudson
Caleb Geniesse
Kyle Chard
Yaoqing Yang
Ian Foster
Michael W. Mahoney
KELM
MU
58
0
0
03 Oct 2024
Undesirable Memorization in Large Language Models: A Survey
Ali Satvaty
Suzan Verberne
Fatih Turkmen
ELM
PILM
71
7
0
03 Oct 2024
Creative Writers' Attitudes on Writing as Training Data for Large Language Models
Katy Ilonka Gero
Meera Desai
Carly Schnitzler
Nayun Eom
Jack Cushman
Elena L. Glassman
30
1
0
22 Sep 2024
Unlocking Memorization in Large Language Models with Dynamic Soft Prompting
Zhepeng Wang
Runxue Bao
Yawen Wu
Jackson Taylor
Cao Xiao
Feng Zheng
Weiwen Jiang
Shangqian Gao
Yanfu Zhang
PILM
39
7
0
20 Sep 2024
MEOW: MEMOry Supervised LLM Unlearning Via Inverted Facts
Tianle Gu
Kexin Huang
Ruilin Luo
Yuanqi Yao
Yujiu Yang
Yan Teng
Yingchun Wang
MU
42
4
0
18 Sep 2024
LLM-PBE: Assessing Data Privacy in Large Language Models
Qinbin Li
Junyuan Hong
Chulin Xie
Jeffrey Tan
Rachel Xin
...
Dan Hendrycks
Zhangyang Wang
Bo Li
Bingsheng He
Dawn Song
ELM
PILM
38
12
0
23 Aug 2024
Strong Copyright Protection for Language Models via Adaptive Model Fusion
Javier Abad
Konstantin Donhauser
Francesco Pinto
Fanny Yang
45
4
0
29 Jul 2024