The Lambert Way to Gaussianize skewed, heavy tailed data with the inverse of Tukey's h transformation as a special case

11 October 2010

Abstract

In this work I follow the same principle as in Goerg (2011) and introduce a parametric, bijective transformation to generate heavy-tail versions $Y$ of an arbitrary random variable (RV) $X \sim F_X$. The tail behavior of the heavy-tail Lambert W $\times$ $F_X$ RV $Y$ depends on a tail parameter $\delta \geq 0$: for $\delta = 0$, $Y = X$; for $\delta > 0$, $Y$ has heavier tails than $X$. For Gaussian $X$, this new meta-family of heavy-tailed distributions reduces to Tukey's $h$ distribution. The Lambert W framework yields an explicit inverse and thus analytical, concise, and simple expressions for the cumulative distribution function (cdf) $G_Y(y)$ and the probability density function (pdf) $g_Y(y)$, which are functions of $F_X(x)$, $f_X(x)$, and Lambert's W function. As a special case, Tukey's $h$ pdf and cdf become available, to the author's knowledge for the first time in the literature. Furthermore, the Lambert W approach allows researchers to "Gaussianize" skewed, heavy-tailed data and to apply common methods and models to the resulting Gaussian data. The optimal parameters to Gaussianize can be estimated by maximum likelihood (ML). Illustrations on a simulated Cauchy sample and on S&P 500 log-returns demonstrate the power of this new family of heavy-tailed distributions: in both cases the back-transformed data is indistinguishable from a Gaussian sample. The R package "LambertW" (cran.r-project.org/web/packages/LambertW) contains the methods presented here to perform an adequate empirical analysis and is publicly available from CRAN.
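To make the "explicit inverse" concrete, here is a minimal Python sketch of the transformation for a standardized input $U$ (zero location, unit scale). It assumes the Tukey $h$ form $Z = U \exp(\frac{\delta}{2} U^2)$, whose inverse is $U = \operatorname{sgn}(Z)\sqrt{W(\delta Z^2)/\delta}$ with $W$ the principal branch of Lambert's W function. The function names (`lambert_w0`, `heavy_tail`, `gaussianize`) and the Newton solver are illustrative assumptions, not the API of the "LambertW" R package, and the sketch omits the location and scale parameters as well as the ML estimation of $\delta$.

```python
import math


def lambert_w0(z, tol=1e-12, max_iter=100):
    """Principal branch of Lambert's W for z >= 0 (solves w * exp(w) = z).

    Illustrative Newton iteration; not the solver used by the R package.
    """
    if z < 0:
        raise ValueError("this sketch only handles z >= 0")
    w = math.log1p(z)  # starting guess at or above the root for z >= 0
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (w + 1.0))
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w


def heavy_tail(u, delta):
    """Forward transformation: z = u * exp(delta / 2 * u^2), delta >= 0."""
    return u * math.exp(0.5 * delta * u * u)


def gaussianize(z, delta):
    """Inverse transformation: u = sgn(z) * sqrt(W(delta * z^2) / delta)."""
    if delta == 0.0:
        return z  # delta = 0 is the identity
    return math.copysign(math.sqrt(lambert_w0(delta * z * z) / delta), z)


if __name__ == "__main__":
    delta = 0.3  # hypothetical tail parameter chosen for illustration
    for u in (-3.0, -0.5, 0.0, 1.2, 4.0):
        z = heavy_tail(u, delta)
        print(u, z, gaussianize(z, delta))
```

Each printed row shows an input, its heavy-tailed image, and the recovered input; the first and last columns should agree up to floating-point error, and the middle column grows much faster than the input as $|u|$ increases.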
