Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2505.12238
Cited By
PANORAMA: A synthetic PII-laced dataset for studying sensitive data memorization in LLMs
18 May 2025
Sriram Selvam
Anneswa Ghosh
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"PANORAMA: A synthetic PII-laced dataset for studying sensitive data memorization in LLMs"
11 / 11 papers shown
Title
RedPajama: an Open Dataset for Training Large Language Models
Maurice Weber
Daniel Y. Fu
Quentin Anthony
Yonatan Oren
S. Adams
...
Tri Dao
Percy Liang
Christopher Ré
Irina Rish
Ce Zhang
222
83
0
19 Nov 2024
LLM-PBE: Assessing Data Privacy in Large Language Models
Qinbin Li
Junyuan Hong
Chulin Xie
Jeffrey Tan
Rachel Xin
...
Dan Hendrycks
Zhangyang Wang
Bo Li
Bingsheng He
Dawn Song
ELM
PILM
88
18
0
23 Aug 2024
Scalable Extraction of Training Data from (Production) Language Models
Milad Nasr
Nicholas Carlini
Jonathan Hayase
Matthew Jagielski
A. Feder Cooper
Daphne Ippolito
Christopher A. Choquette-Choo
Eric Wallace
Florian Tramèr
Katherine Lee
SILM
66
355
0
28 Nov 2023
The Privacy Onion Effect: Memorization is Relative
Nicholas Carlini
Matthew Jagielski
Chiyuan Zhang
Nicolas Papernot
Andreas Terzis
Florian Tramèr
PILM
MIACV
116
110
0
21 Jun 2022
Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models
Kushal Tirumala
Aram H. Markosyan
Luke Zettlemoyer
Armen Aghajanyan
TDI
110
196
0
22 May 2022
Quantifying Memorization Across Neural Language Models
Nicholas Carlini
Daphne Ippolito
Matthew Jagielski
Katherine Lee
Florian Tramèr
Chiyuan Zhang
PILM
124
631
0
15 Feb 2022
Membership Inference Attacks From First Principles
Nicholas Carlini
Steve Chien
Milad Nasr
Shuang Song
Andreas Terzis
Florian Tramèr
MIACV
MIALM
87
706
0
07 Dec 2021
Deduplicating Training Data Makes Language Models Better
Katherine Lee
Daphne Ippolito
A. Nystrom
Chiyuan Zhang
Douglas Eck
Chris Callison-Burch
Nicholas Carlini
SyDa
360
634
0
14 Jul 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
475
2,120
0
31 Dec 2020
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
Basel Alomair
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
509
1,946
0
14 Dec 2020
Pointer Sentinel Mixture Models
Stephen Merity
Caiming Xiong
James Bradbury
R. Socher
RALM
338
2,898
0
26 Sep 2016
1