ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Proving membership in LLM pretraining data via data watermarks

16 February 2024
Johnny Tian-Zheng Wei, Ryan Yixiang Wang, Robin Jia
    WaLM
arXiv: 2402.10892

Papers citing "Proving membership in LLM pretraining data via data watermarks"

The Canary's Echo: Auditing Privacy Risks of LLM-Generated Synthetic Text
Matthieu Meeus, Lukas Wutschitz, Santiago Zanella Béguelin, Shruti Tople, Reza Shokri
24 Feb 2025
Data Watermarking for Sequential Recommender Systems
Sixiao Zhang, Cheng Long, Wei Yuan, Hongxu Chen, Hongzhi Yin
20 Nov 2024
Does Data Contamination Detection Work (Well) for LLMs? A Survey and Evaluation on Detection Assumptions
Yujuan Fu, Özlem Uzuner, Meliha Yetisgen, Fei Xia
24 Oct 2024
Ward: Provable RAG Dataset Inference via LLM Watermarks
Nikola Jovanović, Robin Staab, Maximilian Baader, Martin Vechev
04 Oct 2024
Large Language Models Memorize Sensor Datasets! Implications on Human Activity Recognition Research
H. Haresamudram, Hrudhai Rajasekhar, Nikhil Murlidhar Shanbhogue, Thomas Ploetz
09 Jun 2024
The Mosaic Memory of Large Language Models
Igor Shilov, Matthieu Meeus, Yves-Alexandre de Montjoye
24 May 2024
Data Portraits: Recording Foundation Model Training Data
Marc Marone, Benjamin Van Durme
06 Mar 2023
Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
OSLM
ALM
04 Mar 2022
Mitigating Dataset Harms Requires Stewardship: Lessons from 1000 Papers
Kenny Peng, Arunesh Mathur, Arvind Narayanan
06 Aug 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
AIMat
31 Dec 2020