
Testing Identity of Multidimensional Histograms

Ilias Diakonikolas, D. Kane, John Peebles
10 April 2018 · arXiv:1804.03636
Abstract

We investigate the problem of identity testing for multidimensional histogram distributions. A distribution $p: D \rightarrow \mathbb{R}_+$, where $D \subseteq \mathbb{R}^d$, is called a \emph{$k$-histogram} if there exists a partition of the domain into $k$ axis-aligned rectangles such that $p$ is constant within each such rectangle. Histograms are one of the most fundamental non-parametric families of distributions and have been extensively studied in computer science and statistics. We give the first identity tester for this problem with \emph{sub-learning} sample complexity in any fixed dimension and a nearly-matching sample complexity lower bound. More specifically, let $q$ be an unknown $d$-dimensional $k$-histogram and $p$ be an explicitly given $k$-histogram. We want to correctly distinguish, with probability at least $2/3$, between the case that $p = q$ versus $\|p - q\|_1 \geq \epsilon$. We design a computationally efficient algorithm for this hypothesis testing problem with sample complexity $O((\sqrt{k}/\epsilon^2) \log^{O(d)}(k/\epsilon))$. Our algorithm is robust to model misspecification, i.e., it succeeds even if $q$ is only promised to be \emph{close} to a $k$-histogram. Moreover, for $k = 2^{\Omega(d)}$, we show a nearly-matching sample complexity lower bound of $\Omega((\sqrt{k}/\epsilon^2) (\log(k/\epsilon)/d)^{\Omega(d)})$ when $d \geq 2$. Prior to our work, the sample complexity of the $d = 1$ case was well understood, but no algorithm with sub-learning sample complexity was known, even for $d = 2$. Our new upper and lower bounds have interesting conceptual implications regarding the relation between learning and testing in this setting.
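As a concrete illustration of the objects defined above, here is a minimal sketch in Python with NumPy. Every name in it (make_grid_histogram, identity_statistic, and so on) is invented for this example; none of this code comes from the paper. It builds a small 2-dimensional $k$-histogram as axis-aligned rectangles with constant density, samples from it, and evaluates a generic Pearson-style identity statistic over the rectangles of the explicitly known histogram $p$. This is a naive baseline rather than the paper's tester: it bins samples by $p$'s own partition, which is only reliable when the unknown $q$ is itself piecewise constant on that same partition.

```python
import numpy as np

rng = np.random.default_rng(0)

# A k-histogram over [0,1]^d, represented as a list of axis-aligned
# rectangles (lo, hi) plus a probability mass per rectangle; the density
# is constant (mass / volume) inside each rectangle.

def make_grid_histogram(cells_per_axis, d, rng):
    """Random k-histogram on a regular grid, with k = cells_per_axis ** d."""
    edges = np.linspace(0.0, 1.0, cells_per_axis + 1)
    rects = [(np.array([edges[i] for i in idx]),
              np.array([edges[i + 1] for i in idx]))
             for idx in np.ndindex(*([cells_per_axis] * d))]
    weights = rng.dirichlet(np.ones(len(rects)))  # rectangle masses, sum to 1
    return rects, weights

def sample_histogram(rects, weights, n, rng):
    """n i.i.d. draws: pick a rectangle by its mass, then a uniform point in it."""
    picks = rng.choice(len(rects), size=n, p=weights)
    lo = np.array([rects[c][0] for c in picks])
    hi = np.array([rects[c][1] for c in picks])
    return rng.uniform(lo, hi)

def bin_counts(rects, samples):
    """Count samples per rectangle of the *known* partition of p."""
    counts = np.zeros(len(rects), dtype=int)
    for x in samples:
        for j, (lo, hi) in enumerate(rects):
            if np.all(x >= lo) and np.all(x <= hi):
                counts[j] += 1
                break
    return counts

def identity_statistic(counts, weights, n):
    """Pearson-style statistic sum_j ((N_j - n p_j)^2 - N_j) / (n p_j).

    Its expectation is ~0 when the samples come from p, and it grows with
    the chi-squared distance between the sampled distribution and p otherwise.
    """
    expected = n * weights
    return np.sum(((counts - expected) ** 2 - counts) / expected)

# Demo: p is an explicitly known 2-D 16-histogram; q is either p itself
# (the null case) or a small perturbation of p (the far case).
rects, p_w = make_grid_histogram(cells_per_axis=4, d=2, rng=rng)
n = 4000

null_stat = identity_statistic(
    bin_counts(rects, sample_histogram(rects, p_w, n, rng)), p_w, n)

q_w = p_w.copy()
q_w[0] += 0.05
q_w[1] -= 0.05                      # move mass between two rectangles
q_w = np.clip(q_w, 1e-9, None)
q_w /= q_w.sum()
alt_stat = identity_statistic(
    bin_counts(rects, sample_histogram(rects, q_w, n, rng)), p_w, n)

print(f"statistic under p = q : {null_stat:8.1f}")  # near zero
print(f"statistic under p != q: {alt_stat:8.1f}")   # clearly larger
```

The hard case the paper addresses is exactly the one this sketch sidesteps: the unknown $q$ is a $k$-histogram whose rectangles need not align with $p$'s, yet the tester must still succeed with roughly $\sqrt{k}/\epsilon^2$ samples (up to the polylogarithmic factor), fewer than learning $q$ outright would require. That gap is what "sub-learning" refers to.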
