2202.06539
Deduplicating Training Data Mitigates Privacy Risks in Language Models
14 February 2022
Nikhil Kandpal
Eric Wallace
Colin Raffel
PILM
MU
Papers citing
"Deduplicating Training Data Mitigates Privacy Risks in Language Models"
50 / 212 papers shown
Title
Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models
Haoyu Tang
Ye Liu
Xukai Liu
Yanghai Zhang
Kai Zhang
Xiaofang Zhou
Enhong Chen
MU
72
3
0
25 Jul 2024
Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)
Apurv Verma
Satyapriya Krishna
Sebastian Gehrmann
Madhavan Seshadri
Anu Pradhan
Tom Ault
Leslie Barrett
David Rabinowitz
John Doucette
Nhathai Phan
54
10
0
20 Jul 2024
Training Foundation Models as Data Compression: On Information, Model Weights and Copyright Law
Giorgio Franceschelli
Claudia Cevenini
Mirco Musolesi
44
0
0
18 Jul 2024
Learning to Refuse: Towards Mitigating Privacy Risks in LLMs
Zhenhua Liu
Tong Zhu
Chuanyuan Tan
Wenliang Chen
PILM
MU
50
8
0
14 Jul 2024
Extracting Training Data from Document-Based VQA Models
Francesco Pinto
N. Rauschmayr
F. Tramèr
Philip H. S. Torr
Federico Tombari
34
3
0
11 Jul 2024
Differentially Private Neural Network Training under Hidden State Assumption
Ding Chen
Chen Liu
FedML
29
0
0
11 Jul 2024
Privacy-Preserving Data Deduplication for Enhancing Federated Learning of Language Models
Aydin Abadi
Vishnu Asutosh Dasu
Sumanta Sarkar
43
3
0
11 Jul 2024
Data, Data Everywhere: A Guide for Pretraining Dataset Construction
Jupinder Parmar
Shrimai Prabhumoye
Joseph Jennings
Bo Liu
Aastha Jhunjhunwala
Zhilin Wang
M. Patwary
M. Shoeybi
Bryan Catanzaro
50
5
0
08 Jul 2024
e-Health CSIRO at "Discharge Me!" 2024: Generating Discharge Summary Sections with Fine-tuned Language Models
Jinghui Liu
Aaron Nicolson
Jason Dowling
Bevan Koopman
Anthony N. Nguyen
35
5
0
03 Jul 2024
Towards More Realistic Extraction Attacks: An Adversarial Perspective
Yash More
Prakhar Ganesh
G. Farnadi
AAML
71
6
0
02 Jul 2024
Natural Language but Omitted? On the Ineffectiveness of Large Language Models' privacy policy from End-users' Perspective
Shuning Zhang
Haobin Xing
Xin Yi
Hewu Li
PILM
48
0
0
26 Jun 2024
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
Guilherme Penedo
Hynek Kydlíček
Loubna Ben Allal
Anton Lozhkov
Margaret Mitchell
Colin Raffel
Leandro von Werra
Thomas Wolf
43
189
0
25 Jun 2024
Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon
USVSN Sai Prashanth
Alvin Deng
Kyle O'Brien
Jyothir S V
Mohammad Aflah Khan
...
Jacob Ray Fuehne
Stella Biderman
Tracy Ke
Katherine Lee
Naomi Saphra
60
12
0
25 Jun 2024
ReCaLL: Membership Inference via Relative Conditional Log-Likelihoods
Roy Xie
Junlin Wang
Ruomin Huang
Minxing Zhang
Rong Ge
Jian Pei
Neil Zhenqiang Gong
Bhuwan Dhingra
MIALM
45
11
0
23 Jun 2024
Uncovering Latent Memories: Assessing Data Leakage and Memorization Patterns in Frontier AI Models
Sunny Duan
Mikail Khona
Abhiram Iyer
Rylan Schaeffer
Ila R Fiete
50
3
0
20 Jun 2024
Protecting Privacy Through Approximating Optimal Parameters for Sequence Unlearning in Language Models
Dohyun Lee
Daniel Rim
Minseok Choi
Jaegul Choo
PILM
MU
57
4
0
20 Jun 2024
Evaluating n-Gram Novelty of Language Models Using Rusty-DAWG
William Merrill
Noah A. Smith
Yanai Elazar
ELM
TDI
44
9
0
18 Jun 2024
WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences
Yujie Lu
Dongfu Jiang
Wenhu Chen
William Yang Wang
Yejin Choi
Bill Yuchen Lin
VLM
45
26
0
16 Jun 2024
Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs
Abhimanyu Hans
Yuxin Wen
Neel Jain
John Kirchenbauer
Hamid Kazemi
...
Siddharth Singh
Gowthami Somepalli
Jonas Geiping
A. Bhatele
Tom Goldstein
36
31
0
14 Jun 2024
Newswire: A Large-Scale Structured Database of a Century of Historical News
Emily Silcock
Abhishek Arora
Luca D'Amico-Wong
Melissa Dell
AI4TS
GNN
37
3
0
13 Jun 2024
REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space
Tomer Ashuach
Martin Tutek
Yonatan Belinkov
KELM
MU
71
4
0
13 Jun 2024
Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey
Shang Wang
Tianqing Zhu
Bo Liu
Ming Ding
Xu Guo
Dayong Ye
Wanlei Zhou
Philip S. Yu
PILM
67
17
0
12 Jun 2024
On the Minimal Degree Bias in Generalization on the Unseen for non-Boolean Functions
Denys Pushkin
Raphael Berthier
Emmanuel Abbe
32
0
0
10 Jun 2024
Deconstructing The Ethics of Large Language Models from Long-standing Issues to New-emerging Dilemmas
Chengyuan Deng
Yiqun Duan
Xin Jin
Heng Chang
Yijun Tian
...
Kuofeng Gao
Sihong He
Jun Zhuang
Lu Cheng
Haohan Wang
AILaw
43
16
0
08 Jun 2024
Causal Estimation of Memorisation Profiles
Pietro Lesci
Clara Meister
Thomas Hofmann
Andreas Vlachos
Tiago Pimentel
45
5
0
06 Jun 2024
Benchmark Data Contamination of Large Language Models: A Survey
Cheng Xu
Shuhao Guan
Derek Greene
Mohand-Tahar Kechadi
ELM
ALM
38
39
0
06 Jun 2024
Reconstructing training data from document understanding models
Jérémie Dentan
Arnaud Paran
A. Shabou
AAML
SyDa
41
1
0
05 Jun 2024
Privacy-Aware Visual Language Models
Laurens Samson
Nimrod Barazani
S. Ghebreab
Yukiyasu Asano
PILM
VLM
44
1
0
27 May 2024
A Survey of Multimodal Large Language Model from A Data-centric Perspective
Tianyi Bai
Hao Liang
Binwang Wan
Yanran Xu
Xi Li
...
Ping-Chia Huang
Jiulong Shan
Conghui He
Binhang Yuan
Wentao Zhang
49
36
0
26 May 2024
GECKO: Generative Language Model for English, Code and Korean
Sungwoo Oh
Donggyu Kim
VLM
27
0
0
24 May 2024
The Mosaic Memory of Large Language Models
Igor Shilov
Matthieu Meeus
Yves-Alexandre de Montjoye
44
3
0
24 May 2024
Token-wise Influential Training Data Retrieval for Large Language Models
Huawei Lin
Jikai Long
Zhaozhuo Xu
Weijie Zhao
54
3
0
20 May 2024
A Multi-Perspective Analysis of Memorization in Large Language Models
Bowen Chen
Namgi Han
Yusuke Miyao
46
1
0
19 May 2024
Learnable Privacy Neurons Localization in Language Models
Ruizhe Chen
Tianxiang Hu
Yang Feng
Zuo-Qiang Liu
41
12
0
16 May 2024
Many-Shot Regurgitation (MSR) Prompting
Shashank Sonkar
Richard G. Baraniuk
AAML
33
1
0
13 May 2024
Intellecta Cognitiva: A Comprehensive Dataset for Advancing Academic Knowledge and Machine Reasoning
PS Ajmal
PS Ditto
VG Jithin
16
0
0
13 Apr 2024
Rho-1: Not All Tokens Are What You Need
Zheng-Wen Lin
Zhibin Gou
Yeyun Gong
Xiao Liu
Yelong Shen
...
Chen Lin
Yujiu Yang
Jian Jiao
Nan Duan
Weizhu Chen
CLL
50
55
0
11 Apr 2024
Noise Masking Attacks and Defenses for Pretrained Speech Models
Matthew Jagielski
Om Thakkar
Lun Wang
AAML
37
4
0
02 Apr 2024
Towards Memorization-Free Diffusion Models
Chen Chen
Daochang Liu
Chang Xu
VLM
38
25
0
01 Apr 2024
Not All Similarities Are Created Equal: Leveraging Data-Driven Biases to Inform GenAI Copyright Disputes
Uri Y. Hacohen
Adi Haviv
Shahar Sarfaty
Bruria Friedman
N. Elkin-Koren
Roi Livni
Amit H. Bermano
AILaw
36
7
0
26 Mar 2024
Don't Listen To Me: Understanding and Exploring Jailbreak Prompts of Large Language Models
Zhiyuan Yu
Xiaogeng Liu
Shunning Liang
Zach Cameron
Chaowei Xiao
Ning Zhang
28
40
0
26 Mar 2024
Concerned with Data Contamination? Assessing Countermeasures in Code Language Model
Jialun Cao
Wuqi Zhang
Shing-Chi Cheung
19
16
0
25 Mar 2024
Provable Privacy with Non-Private Pre-Processing
Yaxian Hu
Amartya Sanyal
Bernhard Schölkopf
24
2
0
19 Mar 2024
RAFT: Adapting Language Model to Domain Specific RAG
Tianjun Zhang
Shishir G. Patil
Naman Jain
Sheng Shen
Matei A. Zaharia
Ion Stoica
Joseph E. Gonzalez
RALM
32
179
0
15 Mar 2024
PreCurious: How Innocent Pre-Trained Language Models Turn into Privacy Traps
Ruixuan Liu
Tianhao Wang
Yang Cao
Li Xiong
AAML
SILM
53
15
0
14 Mar 2024
Second-Order Information Matters: Revisiting Machine Unlearning for Large Language Models
Kang Gu
Md. Rafi Ur Rashid
Najrin Sultana
Shagufta Mehnaz
MU
34
5
0
13 Mar 2024
Development of a Reliable and Accessible Caregiving Language Model (CaLM)
B. Parmanto
Bayu Aryoyudanta
Wilbert Soekinto
Agus Setiawan
Yuhan Wang
Haomin Hu
Andi Saptono
Yong K Choi
26
0
0
11 Mar 2024
On Protecting the Data Privacy of Large Language Models (LLMs): A Survey
Biwei Yan
Kun Li
Minghui Xu
Yueyan Dong
Yue Zhang
Zhaochun Ren
Xiuzhen Cheng
AILaw
PILM
70
76
0
08 Mar 2024
Quantifying Contamination in Evaluating Code Generation Capabilities of Language Models
Martin Riddell
Ansong Ni
Arman Cohan
ELM
34
28
0
06 Mar 2024
Reliable, Adaptable, and Attributable Language Models with Retrieval
Akari Asai
Zexuan Zhong
Danqi Chen
Pang Wei Koh
Luke Zettlemoyer
Hanna Hajishirzi
Wen-tau Yih
KELM
RALM
46
53
0
05 Mar 2024