Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2406.01333
Cited By
Probing Language Models for Pre-training Data Detection
3 June 2024
Zhenhua Liu
Tong Zhu
Chuanyuan Tan
Haonan Lu
Bing Liu
Wenliang Chen
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Probing Language Models for Pre-training Data Detection"
8 / 8 papers shown
Title
Revisiting Data Auditing in Large Vision-Language Models
Hongyu Zhu
Sichu Liang
Luu Anh Tuan
Boheng Li
Tongxin Yuan
Fangqi Li
Shilin Wang
Zhuosheng Zhang
VLM
209
0
0
25 Apr 2025
Language Models Can Predict Their Own Behavior
Dhananjay Ashok
Jonathan May
ReLM
AI4TS
LRM
63
0
0
18 Feb 2025
Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method
Weichao Zhang
Ruqing Zhang
Jiafeng Guo
Maarten de Rijke
Yixing Fan
Xueqi Cheng
38
8
0
23 Sep 2024
Learning to Refuse: Towards Mitigating Privacy Risks in LLMs
Zhenhua Liu
Tong Zhu
Chuanyuan Tan
Wenliang Chen
PILM
MU
53
8
0
14 Jul 2024
Do LLMs Dream of Ontologies?
Marco Bombieri
Paolo Fiorini
Simone Paolo Ponzetto
M. Rospocher
CLL
32
2
0
26 Jan 2024
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
279
1,996
0
31 Dec 2020
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
290
1,824
0
14 Dec 2020
PubMedQA: A Dataset for Biomedical Research Question Answering
Qiao Jin
Bhuwan Dhingra
Zhengping Liu
William W. Cohen
Xinghua Lu
237
815
0
13 Sep 2019
1