ResearchTrend.AI


Data fission: splitting a single data point

21 December 2021
James Leiner
Boyan Duan
Larry A. Wasserman
Aaditya Ramdas
arXiv: 2112.11079 (abs | PDF | HTML)
Abstract

Suppose we observe a random vector X from some distribution P in a known family with unknown parameters. We ask the following question: when is it possible to split X into two parts f(X) and g(X) such that neither part is sufficient to reconstruct X by itself, but both together can recover X fully, and the joint distribution of (f(X), g(X)) is tractable? As one example, if X = (X_1, ..., X_n) and P is a product distribution, then for any m < n we can split the sample to define f(X) = (X_1, ..., X_m) and g(X) = (X_{m+1}, ..., X_n). Rasines and Young (2021) offer an alternative route to accomplishing this task through randomization of X with additive Gaussian noise, which enables post-selection inference in finite samples for Gaussian-distributed data and asymptotically for non-Gaussian additive models. In this paper, we offer a more general methodology for achieving such a split in finite samples by borrowing ideas from Bayesian inference to yield a (frequentist) solution that can be viewed as a continuous analog of data splitting. We call our method data fission, as an alternative to data splitting, data carving, and p-value masking. We exemplify the method on a few prototypical applications, such as post-selection inference for trend filtering and other regression problems.
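For intuition, the Gaussian case of this kind of split can be sketched as follows (a minimal illustration, not the paper's full methodology: it assumes X ~ N(mu, sigma^2) with sigma known, fixes the paper's tuning parameter to 1, and the function name is ours):

```python
import numpy as np

def fission_gaussian(x, sigma, rng):
    """Split an observation x ~ N(mu, sigma^2) into two parts
    f(x) = x + z and g(x) = x - z, where z ~ N(0, sigma^2) is
    fresh noise. The pair is jointly Gaussian with
    Cov(x + z, x - z) = Var(x) - Var(z) = 0, so the two parts
    are independent, each distributed N(mu, 2*sigma^2).
    Averaging the parts recovers x exactly.
    """
    z = rng.normal(0.0, sigma, size=np.shape(x))
    return x + z, x - z

rng = np.random.default_rng(0)
mu, sigma, n = 5.0, 1.0, 100_000

# Fission n independent draws pointwise.
x = rng.normal(mu, sigma, size=n)
f_x, g_x = fission_gaussian(x, sigma, rng)

assert np.allclose((f_x + g_x) / 2, x)  # both parts together recover X
print(np.corrcoef(f_x, g_x)[0, 1])      # sample correlation near 0
```

In a selective-inference workflow, f(X) would be used to select a model or hypothesis and the independent part g(X) to carry out inference, mirroring the role of the two halves in ordinary data splitting.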
