Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2202.06539
Cited By
Deduplicating Training Data Mitigates Privacy Risks in Language Models
14 February 2022
Nikhil Kandpal
Eric Wallace
Colin Raffel
PILM
MU
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Deduplicating Training Data Mitigates Privacy Risks in Language Models"
12 / 212 papers shown
Title
Mix and Match: Learning-free Controllable Text Generation using Energy Language Models
Fatemehsadat Mireshghallah
Kartik Goyal
Taylor Berg-Kirkpatrick
36
78
0
24 Mar 2022
Do Language Models Plagiarize?
Jooyoung Lee
Thai Le
Jinghui Chen
Dongwon Lee
36
74
0
15 Mar 2022
Quantifying Privacy Risks of Masked Language Models Using Membership Inference Attacks
Fatemehsadat Mireshghallah
Kartik Goyal
Archit Uniyal
Taylor Berg-Kirkpatrick
Reza Shokri
MIALM
32
151
0
08 Mar 2022
Quantifying Memorization Across Neural Language Models
Nicholas Carlini
Daphne Ippolito
Matthew Jagielski
Katherine Lee
Florian Tramèr
Chiyuan Zhang
PILM
28
579
0
15 Feb 2022
Counterfactual Memorization in Neural Language Models
Chiyuan Zhang
Daphne Ippolito
Katherine Lee
Matthew Jagielski
Florian Tramèr
Nicholas Carlini
29
128
0
24 Dec 2021
How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN
R. Thomas McCoy
P. Smolensky
Tal Linzen
Jianfeng Gao
Asli Celikyilmaz
SyDa
25
119
0
18 Nov 2021
Differentially Private Fine-tuning of Language Models
Da Yu
Saurabh Naik
A. Backurs
Sivakanth Gopi
Huseyin A. Inan
...
Y. Lee
Andre Manoel
Lukas Wutschitz
Sergey Yekhanin
Huishuai Zhang
134
347
0
13 Oct 2021
Deduplicating Training Data Makes Language Models Better
Katherine Lee
Daphne Ippolito
A. Nystrom
Chiyuan Zhang
Douglas Eck
Chris Callison-Burch
Nicholas Carlini
SyDa
242
593
0
14 Jul 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
256
1,996
0
31 Dec 2020
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
290
1,815
0
14 Dec 2020
When is Memorization of Irrelevant Training Data Necessary for High-Accuracy Learning?
Gavin Brown
Mark Bun
Vitaly Feldman
Adam D. Smith
Kunal Talwar
253
93
0
11 Dec 2020
Language Models as Knowledge Bases?
Fabio Petroni
Tim Rocktaschel
Patrick Lewis
A. Bakhtin
Yuxiang Wu
Alexander H. Miller
Sebastian Riedel
KELM
AI4MH
415
2,588
0
03 Sep 2019
Previous
1
2
3
4
5