Closed-form cdf and pdf of Tukey's h-distribution: The Lambert Way to ``Gaussianize'' skewed, heavy-tailed data

11 October 2010

Abstract

Recently Goerg (2010) introduced Lambert W $\times$ F random variables (RVs), a new family of generalized skewed distributions. Here I adapt this framework to generate heavy-tailed versions of arbitrary distributions. As in the skewed case, a non-linear, parametric transformation of an input RV $X$ with arbitrary cumulative distribution function (cdf) $F_X(x)$ yields a heavy-tailed version $Y$. The tail behavior depends on a tail parameter $\gamma \geq 0$: for $\gamma = 0$, $Y = X$; for $\gamma > 0$, $Y$ has heavier tails than $X$. It turns out that heavy-tailed Lambert W $\times$ Gaussian RVs equal heavy-tailed Tukey $h$ RVs (the $g$-$h$ family with $g \rightarrow 0$). The Lambert W framework yields an explicit inverse of the $h$ transformation, and thus analytical, concise and simple expressions for the cdf and probability density function (pdf) of Tukey's $h$ distribution, to the author's knowledge for the first time in the literature. Furthermore, the Lambert W approach gives applied researchers a tool to ``Gaussianize'' their skewed, heavy-tailed data and then apply common methods and models to the resulting Gaussian data. The optimal parameters to Gaussianize can be estimated by maximum likelihood (ML). Contrary to the skewed case, the transformation is bijective: each observed data point is uniquely linked to its hidden (and normally tailed) input. A modular toolkit to analyze data using the proposed methods will soon be added to the \href{cran.r-project.org/web/packages/LambertW}{\texttt{LambertW}} R package, originally implemented for the skew Lambert W case.
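The explicit inverse mentioned above can be illustrated with a minimal Python sketch. It assumes a standard Gaussian input (zero location, unit scale) and writes Tukey's $h$ transformation as $Y = X \exp(\gamma X^2 / 2)$, so that the inverse is $\mathrm{sgn}(y) \sqrt{W(\gamma y^2) / \gamma}$ with $W$ the principal branch of the Lambert W function. All function names are hypothetical and are not the API of the \texttt{LambertW} package; the Newton iteration for $W$ is a stand-in for a library routine.

```python
import math


def lambert_w0(x, tol=1e-12, max_iter=100):
    """Principal branch W(x) for x >= 0, via Newton's method on w*exp(w) = x."""
    if x < 0:
        raise ValueError("this sketch only handles x >= 0")
    w = math.log1p(x)  # starting guess; log(1 + x) >= W(x) for x >= 0
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w


def heavy_tail(u, gamma):
    """Tukey's h transformation (assumed form): y = u * exp(gamma/2 * u^2)."""
    return u * math.exp(0.5 * gamma * u * u)


def gaussianize(y, gamma):
    """Explicit inverse of heavy_tail using the Lambert W function."""
    if gamma == 0.0:
        return y  # gamma = 0 leaves the input unchanged
    w = max(lambert_w0(gamma * y * y), 0.0)  # guard against rounding below 0
    return math.copysign(math.sqrt(w / gamma), y)


def tukey_h_cdf(y, gamma):
    """cdf of Y: standard normal cdf evaluated at the back-transformed value."""
    u = gaussianize(y, gamma)
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def tukey_h_pdf(y, gamma):
    """pdf of Y: standard normal pdf at u times the Jacobian du/dy."""
    u = gaussianize(y, gamma)
    # dy/du = exp(gamma/2 * u^2) * (1 + gamma * u^2), so du/dy is its reciprocal
    jacobian = math.exp(-0.5 * gamma * u * u) / (1.0 + gamma * u * u)
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi) * jacobian
```

For example, `gaussianize(heavy_tail(1.5, 0.2), 0.2)` should recover `1.5` up to floating-point error, which reflects the bijectivity of the transformation for $\gamma \geq 0$.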
