Training Data Extraction From Pre-trained Language Models: A Survey

25 May 2023
Shotaro Ishihara
arXiv: 2305.16157

Papers citing "Training Data Extraction From Pre-trained Language Models: A Survey"

42 papers shown
Towards Harnessing the Collaborative Power of Large and Small Models for Domain Tasks
Yang Liu
Bingjie Yan
Tianyuan Zou
Jianqing Zhang
Zixuan Gu
...
Jiajian Li
Xiaozhou Ye
Ye Ouyang
Qiang Yang
Wenjie Qu
ALM
155
1
0
24 Apr 2025
Evidencing Unauthorized Training Data from AI Generated Content using Information Isotopes
Qi Tao
Yin Jinhua
Cai Dongqi
Xie Yueqi
Wang Huili
...
Zhou Zhili
Wang Shangguang
Lyu Lingjuan
Huang Yongfeng
Lane Nicholas
40
0
0
24 Mar 2025
Towards Label-Only Membership Inference Attack against Pre-trained Large Language Models
Yu He
Boheng Li
L. Liu
Zhongjie Ba
Wei Dong
Yiming Li
Zhan Qin
Kui Ren
Cheng Chen
MIALM
74
0
0
26 Feb 2025
Integrating LLMs with ITS: Recent Advances, Potentials, Challenges, and Future Directions
Doaa Mahmud
Hadeel Hajmohamed
Shamma Almentheri
Shamma Alqaydi
Lameya Aldhaheri
R. A. Khalil
Nasir Saeed
AI4TS
40
5
0
08 Jan 2025
Do LLMs Know to Respect Copyright Notice?
Jialiang Xu
Shenglan Li
Zhaozhuo Xu
Denghui Zhang
37
2
0
02 Nov 2024
Does Data Contamination Detection Work (Well) for LLMs? A Survey and Evaluation on Detection Assumptions
Yujuan Fu
Özlem Uzuner
Meliha Yetisgen
Fei Xia
59
3
0
24 Oct 2024
CLEAR: Towards Contextual LLM-Empowered Privacy Policy Analysis and Risk Generation for Large Language Model Applications
Chaoran Chen
Daodao Zhou
Yanfang Ye
Toby Jia-jun Li
Yaxing Yao
AILaw
41
3
0
17 Oct 2024
A Theoretical Survey on Foundation Models
Shi Fu
Yuzhu Chen
Yingjie Wang
Dacheng Tao
28
0
0
15 Oct 2024
Decoding Secret Memorization in Code LLMs Through Token-Level Characterization
Yuqing Nie
Chong Wang
Kaixin Wang
Guoai Xu
Guosheng Xu
Haoyu Wang
OffRL
136
1
0
11 Oct 2024
Undesirable Memorization in Large Language Models: A Survey
Ali Satvaty
Suzan Verberne
Fatih Turkmen
ELM
PILM
74
7
0
03 Oct 2024
The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies
Feng He
Tianqing Zhu
Dayong Ye
Bo Liu
Wanlei Zhou
Philip S. Yu
PILM
LLMAG
ELM
68
24
0
28 Jul 2024
Replication in Visual Diffusion Models: A Survey and Outlook
Wenhao Wang
Yifan Sun
Zongxin Yang
Zhengdong Hu
Zhentao Tan
Yi Yang
86
7
0
07 Jul 2024
PII-Compass: Guiding LLM training data extraction prompts towards the target PII via grounding
K. K. Nakka
Ahmed Frikha
Ricardo Mendes
Xue Jiang
Xuebing Zhou
32
7
0
03 Jul 2024
Causal Estimation of Memorisation Profiles
Pietro Lesci
Clara Meister
Thomas Hofmann
Andreas Vlachos
Tiago Pimentel
48
5
0
06 Jun 2024
Benchmark Data Contamination of Large Language Models: A Survey
Cheng Xu
Shuhao Guan
Derek Greene
Mohand-Tahar Kechadi
ELM
ALM
38
39
0
06 Jun 2024
Reconstructing training data from document understanding models
Jérémie Dentan
Arnaud Paran
A. Shabou
AAML
SyDa
49
1
0
05 Jun 2024
Data Contamination Calibration for Black-box LLMs
Wen-song Ye
Jiaqi Hu
Liyao Li
Haobo Wang
Gang Chen
Junbo Zhao
40
6
0
20 May 2024
Privacy Preserving Prompt Engineering: A Survey
Kennedy Edemacu
Xintao Wu
47
18
0
09 Apr 2024
Pandora's White-Box: Precise Training Data Detection and Extraction in Large Language Models
Jeffrey G. Wang
Jason Wang
Marvin Li
Seth Neel
MIALM
66
0
0
26 Feb 2024
LLMs with Industrial Lens: Deciphering the Challenges and Prospects -- A Survey
Ashok Urlana
Charaka Vinayak Kumar
Ajeet Kumar Singh
B. Garlapati
S. Chalamala
Rahul Mishra
35
5
0
22 Feb 2024
Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question?
Nishant Balepur
Abhilasha Ravichander
Rachel Rudinger
ELM
40
19
0
19 Feb 2024
Large Language Models in Cybersecurity: State-of-the-Art
Farzad Nourmohammadzadeh Motlagh
Mehrdad Hajizadeh
Mehryar Majd
Pejman Najafi
Feng Cheng
Christoph Meinel
ELM
46
43
0
30 Jan 2024
Do LLMs Dream of Ontologies?
Marco Bombieri
Paolo Fiorini
Simone Paolo Ponzetto
M. Rospocher
CLL
29
2
0
26 Jan 2024
Traces of Memorisation in Large Language Models for Code
Ali Al-Kaswan
M. Izadi
A. van Deursen
ELM
36
14
0
18 Dec 2023
A Comprehensive Survey of Attack Techniques, Implementation, and Mitigation Strategies in Large Language Models
Aysan Esmradi
Daniel Wankit Yip
C. Chan
AAML
38
11
0
18 Dec 2023
Reducing Privacy Risks in Online Self-Disclosures with Language Models
Yao Dou
Isadora Krsek
Tarek Naous
Anubha Kabra
Sauvik Das
Alan Ritter
Wei-ping Xu
38
21
0
16 Nov 2023
Exploring the Numerical Reasoning Capabilities of Language Models: A Comprehensive Analysis on Tabular Data
Mubashara Akhtar
Abhilash Shankarampeta
Vivek Gupta
Arpit Patil
O. Cocarascu
Elena Simperl
LRM
ReLM
LMTD
ELM
39
21
0
03 Nov 2023
SoK: Memorization in General-Purpose Large Language Models
Valentin Hartmann
Anshuman Suri
Vincent Bindschaedler
David E. Evans
Shruti Tople
Robert West
KELM
LLMAG
21
20
0
24 Oct 2023
Privacy in Large Language Models: Attacks, Defenses and Future Directions
Haoran Li
Yulin Chen
Jinglong Luo
Yan Kang
Xiaojin Zhang
Qi Hu
Chunkit Chan
Yangqiu Song
PILM
48
42
0
16 Oct 2023
Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks
Vaidehi Patil
Peter Hase
Joey Tianyi Zhou
KELM
AAML
25
96
0
29 Sep 2023
Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities
Maximilian Mozes
Xuanli He
Bennett Kleinberg
Lewis D. Griffin
39
78
0
24 Aug 2023
Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4
Kent K. Chang
Mackenzie Cramer
Sandeep Soni
David Bamman
RALM
145
111
0
28 Apr 2023
Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
Jingfeng Yang
Hongye Jin
Ruixiang Tang
Xiaotian Han
Qizhang Feng
Haoming Jiang
Bing Yin
Xia Hu
LM&MA
137
622
0
26 Apr 2023
Text Revealer: Private Text Reconstruction via Model Inversion Attacks against Transformers
Ruisi Zhang
Seira Hidano
F. Koushanfar
SILM
71
26
0
21 Sep 2022
Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset
Peter Henderson
M. Krass
Lucia Zheng
Neel Guha
Christopher D. Manning
Dan Jurafsky
Daniel E. Ho
AILaw
ELM
131
97
0
01 Jul 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
322
11,953
0
04 Mar 2022
Differentially Private Fine-tuning of Language Models
Da Yu
Saurabh Naik
A. Backurs
Sivakanth Gopi
Huseyin A. Inan
...
Y. Lee
Andre Manoel
Lukas Wutschitz
Sergey Yekhanin
Huishuai Zhang
134
347
0
13 Oct 2021
Deduplicating Training Data Makes Language Models Better
Katherine Lee
Daphne Ippolito
A. Nystrom
Chiyuan Zhang
Douglas Eck
Chris Callison-Burch
Nicholas Carlini
SyDa
242
593
0
14 Jul 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
279
1,996
0
31 Dec 2020
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
290
1,815
0
14 Dec 2020
Systematic Evaluation of Privacy Risks of Machine Learning Models
Liwei Song
Prateek Mittal
MIACV
196
358
0
24 Mar 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
261
4,489
0
23 Jan 2020