Closed-form cdf and pdf of Tukey's h-distribution: The Lambert Way to "Gaussianize" skewed, heavy-tailed data

Abstract

Recently Goerg (2010) introduced Lambert W $\times$ F random variables (RVs), a new family of generalized skewed distributions. Here I adapt this framework to generate heavy-tailed versions of arbitrary distributions. As in the skewed case, a non-linear, parametric transformation of an input RV $X$ with arbitrary cumulative distribution function (cdf) $F_X(x)$ yields a heavy-tailed version $Y$. The tail behavior depends on a tail parameter $\gamma \geq 0$: for $\gamma = 0$, $Y = X$; for $\gamma > 0$, $Y$ has heavier tails than $X$. It turns out that heavy-tailed Lambert W $\times$ Gaussian RVs equal heavy-tailed Tukey $h$ RVs (the $g$-$h$ family with $g \rightarrow 0$). The Lambert W framework yields an explicit inverse of the $h$ transformation, and thus analytical, concise, and simple expressions for the cdf and pdf of Tukey's $h$ distribution, to the author's knowledge for the first time in the literature. Furthermore, the Lambert W approach gives applied researchers a tool to "Gaussianize" their skewed, heavy-tailed data and to apply common methods and models to the resulting Gaussian data. The optimal parameters to Gaussianize can be estimated by maximum likelihood (ML). Contrary to the skewed case, the transformation is bijective: each observed data point is uniquely linked to its hidden (and normally tailed) input. A modular toolkit to analyze data using the proposed methods will soon be added to the LambertW R package (cran.r-project.org/web/packages/LambertW), originally implemented for the skew Lambert W case.
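To make the explicit inverse concrete, here is a minimal Python sketch. It assumes the standard Tukey $h$ parameterization $Y = X \exp(h X^2 / 2)$ with a standard Gaussian input $X$ and no location or scale, so $h$ plays the role of the tail parameter; the abstract itself does not spell out the transformation. Under that assumption, $h y^2 = h x^2 \exp(h x^2)$, which gives $x = \mathrm{sgn}(y) \sqrt{W(h y^2) / h}$ with $W$ the principal branch of the Lambert W function. All function names are hypothetical and are not the API of the LambertW R package.

```python
import math


def lambert_w0(z, tol=1e-12, max_iter=100):
    """Principal branch of Lambert W for z >= 0, computed by Newton's method."""
    if z < 0:
        raise ValueError("this sketch only handles z >= 0")
    # log(1 + z) is an upper bound on W(z) for z >= 0, so Newton's method
    # on the convex function w * exp(w) - z decreases monotonically to the root.
    w = math.log1p(z)
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (w + 1.0))
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w


def tukey_h(x, h):
    """Assumed forward transformation: y = x * exp(h * x**2 / 2), h >= 0."""
    return x * math.exp(0.5 * h * x * x)


def tukey_h_inverse(y, h):
    """Inverse of tukey_h via Lambert W; reduces to the identity for h == 0."""
    if h == 0.0:
        return y
    return math.copysign(math.sqrt(lambert_w0(h * y * y) / h), y)


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def tukey_h_cdf(y, h):
    """cdf of Y: the Gaussian cdf evaluated at the back-transformed point."""
    return std_normal_cdf(tukey_h_inverse(y, h))


def tukey_h_pdf(y, h):
    """pdf of Y by change of variables: phi(x) divided by dy/dx at x."""
    x = tukey_h_inverse(y, h)
    dy_dx = math.exp(0.5 * h * x * x) * (1.0 + h * x * x)
    return std_normal_pdf(x) / dy_dx
```

Because the transformation is monotone for $h \geq 0$, "Gaussianizing" an observation in this sketch amounts to calling `tukey_h_inverse(y, h)` once $h$ has been estimated. The estimation step, for example by maximum likelihood using `tukey_h_pdf`, is not shown.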
