
On a phase transition in general order spline regression

23 April 2020

Abstract

In the Gaussian sequence model $Y = \theta_0 + \varepsilon$ in $\mathbb{R}^n$, we study the fundamental limit of approximating the signal $\theta_0$ by a class $\Theta(d,d_0,k)$ of (generalized) splines with free knots. Here $d$ is the degree of the spline, $d_0$ is the order of differentiability at each inner knot, and $k$ is the maximal number of pieces. We show that, given any integer $d\geq 0$ and $d_0\in\{-1,0,\ldots,d-1\}$, the minimax rate of estimation over $\Theta(d,d_0,k)$ exhibits the following phase transition: \begin{equation*} \inf_{\widetilde{\theta}}\sup_{\theta\in\Theta(d,d_0,k)}\mathbb{E}_\theta\|\widetilde{\theta} - \theta\|^2 \asymp_d \begin{cases} k\log\log(16n/k), & 2\leq k\leq k_0,\\ k\log(en/k), & k \geq k_0+1. \end{cases} \end{equation*} The transition boundary $k_0$, which takes the form $\lfloor (d+1)/(d-d_0) \rfloor + 1$, demonstrates the critical role of the regularity parameter $d_0$ in the separation between a faster $\log\log(16n)$ rate and a slower $\log(en)$ rate. We further show that, once an additional '$d$-monotonicity' shape constraint is imposed (this includes monotonicity for $d=0$ and convexity for $d=1$), the above phase transition is eliminated and the faster $k\log\log(16n/k)$ rate can be achieved for all $k$. These results provide theoretical support for developing $\ell_0$-penalized (shape-constrained) spline regression procedures as useful alternatives to $\ell_1$- and $\ell_2$-penalized ones.
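To make the transition boundary concrete, the following is a minimal Python sketch that evaluates $k_0 = \lfloor (d+1)/(d-d_0) \rfloor + 1$ and the order of the minimax rate stated in the abstract. The function names `transition_boundary` and `rate_order` are hypothetical and not taken from the paper. The returned value ignores the unspecified constant hidden in $\asymp_d$, so it indicates an order of magnitude only. The check `k <= n` is an added assumption that keeps the iterated logarithm well defined.

```python
import math


def transition_boundary(d: int, d0: int) -> int:
    """Return k_0 = floor((d + 1) / (d - d0)) + 1 for degree d and regularity d0."""
    if d < 0 or not (-1 <= d0 <= d - 1):
        raise ValueError("need d >= 0 and d0 in {-1, 0, ..., d - 1}")
    return (d + 1) // (d - d0) + 1


def rate_order(n: int, k: int, d: int, d0: int) -> float:
    """Order of the minimax rate from the abstract, up to a constant depending on d.

    Illustrative only: the constant hidden in the asymptotic equivalence is not
    specified in the abstract, so this is not an actual risk bound.
    """
    # Assumption (not from the abstract): k <= n, so that log(log(16 n / k)) > 0.
    if not (2 <= k <= n):
        raise ValueError("need 2 <= k <= n")
    k0 = transition_boundary(d, d0)
    if k <= k0:
        # Faster regime: k * log log(16 n / k)
        return k * math.log(math.log(16 * n / k))
    # Slower regime: k * log(e n / k)
    return k * math.log(math.e * n / k)
```

For example, piecewise constant signals ($d=0$, $d_0=-1$) give $k_0 = 2$, continuous piecewise linear splines ($d=1$, $d_0=0$) give $k_0 = 3$, and cubic splines with two continuous derivatives ($d=3$, $d_0=2$) give $k_0 = 5$. Raising $d_0$ toward $d-1$ widens the range of $k$ covered by the faster rate.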
