arXiv:1712.04146

A Random Sample Partition Data Model for Big Data Analysis

12 December 2017
Salman Salloum
Yulin He
J. Huang
Xiaoliang Zhang
Tamer Z. Emara
Abstract

Big data sets must be carefully partitioned into statistically similar data subsets that can be used as representative samples for big data analysis tasks. In this paper, we propose the random sample partition (RSP) to represent a big data set as a set of non-overlapping data subsets, i.e., RSP data blocks, where each RSP data block has the same probability distribution as the whole big data set. Block-based sampling is then used to directly select representative samples for a variety of data analysis tasks. We show how RSP data blocks can be employed to estimate statistics and build models that are equivalent to, or approximate, those obtained from the whole big data set.
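
The idea in the abstract can be illustrated with a minimal in-memory sketch. It is not the authors' algorithm, which the abstract does not spell out. It assumes that a uniform random shuffle followed by contiguous slicing is an acceptable way to obtain blocks that follow the distribution of the whole data set, and it estimates only a mean. The function names `make_rsp_blocks`, `sample_blocks`, and `estimate_mean`, as well as the synthetic Gaussian data, are hypothetical and chosen for illustration.

```python
import random
from statistics import fmean


def make_rsp_blocks(records, num_blocks, rng):
    """Split records into non-overlapping blocks after a random shuffle.

    Assumption: shuffling first makes each block a random sample of the
    whole data set, which is the property the abstract asks of RSP blocks.
    """
    shuffled = list(records)
    rng.shuffle(shuffled)
    size, extra = divmod(len(shuffled), num_blocks)
    blocks, start = [], 0
    for i in range(num_blocks):
        # The first `extra` blocks take one additional record each.
        end = start + size + (1 if i < extra else 0)
        blocks.append(shuffled[start:end])
        start = end
    return blocks


def sample_blocks(blocks, num_selected, rng):
    """Block-based sampling: pick whole blocks without replacement."""
    return rng.sample(blocks, num_selected)


def estimate_mean(selected_blocks):
    """Combine per-block means, weighting each block by its size."""
    total = sum(len(block) for block in selected_blocks)
    return sum(fmean(block) * len(block) for block in selected_blocks) / total


rng = random.Random(0)
# Synthetic stand-in for a big data set (hypothetical values).
data = [rng.gauss(5.0, 2.0) for _ in range(100_000)]

blocks = make_rsp_blocks(data, num_blocks=100, rng=rng)
selected = sample_blocks(blocks, num_selected=5, rng=rng)

estimate = estimate_mean(selected)   # computed from 5% of the records
full_mean = fmean(data)              # computed from all records
print(f"block-based estimate: {estimate:.3f}, full-data mean: {full_mean:.3f}")
```

Here only five of the hundred blocks are read to produce the estimate, which is the practical appeal of block-based sampling: the analysis touches a small number of blocks rather than the whole data set.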
