Features in predictive models are not exchangeable, yet common supervised models treat them as such. Here we study ridge regression when the analyst can partition the features into $K$ groups based on external side-information. For example, in high-throughput biology, features may represent gene expression, protein abundance, or clinical data, so that each feature group corresponds to a distinct modality. The analyst's goal is to choose optimal regularization parameters $\lambda = (\lambda_1, \dotsc, \lambda_K)$ -- one for each group. In this work, we study the impact of $\lambda$ on the predictive risk of group-regularized ridge regression by deriving limiting risk formulae under a high-dimensional random effects model with $p \asymp n$ as $n \to \infty$. Furthermore, we propose a data-driven method for choosing $\lambda$ that attains the optimal asymptotic risk: the key idea is to interpret the residual noise variance $\sigma^2$ as a regularization parameter to be chosen through cross-validation. An empirical Bayes construction maps the one-dimensional parameter $\sigma$ to the $K$-dimensional vector of regularization parameters, i.e., $\widehat{\lambda}(\sigma)$. Beyond its theoretical optimality, the proposed method is practical and runs as fast as cross-validated ridge regression without feature groups ($K = 1$).
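To make the recipe concrete, here is a minimal sketch, not the authors' implementation: it assumes the penalty form $\sum_k \lambda_k \|\beta_k\|^2$, an empirical Bayes map of the assumed form $\lambda_k(\sigma) = \sigma^2 / \tau_k^2$ with user-supplied per-group signal variances `tau2`, and hypothetical function names (`group_ridge`, `fit_sigma_cv`).

```python
# Minimal sketch of sigma-indexed group ridge (illustrative assumptions only).
import numpy as np

def group_ridge(X, y, lams, groups):
    """Minimize ||y - X b||^2 + sum_k lams[k] * ||b_k||^2, where b_k collects
    the coefficients of the features with group label k."""
    D = np.diag(lams[groups])             # per-feature penalty via group label
    return np.linalg.solve(X.T @ X + D, X.T @ y)

def fit_sigma_cv(X, y, groups, tau2, sigma_grid, n_folds=5, seed=0):
    """Tune the single scalar sigma by n_folds-fold CV, mapping it to K
    penalties via the assumed empirical Bayes rule lambda_k = sigma^2 / tau2[k]."""
    rng = np.random.default_rng(seed)
    folds = rng.integers(0, n_folds, size=len(y))
    best_err, best_sigma = np.inf, None
    for sigma in sigma_grid:
        lams = sigma ** 2 / tau2          # one scalar controls all K penalties
        err = 0.0
        for f in range(n_folds):
            tr, te = folds != f, folds == f
            b = group_ridge(X[tr], y[tr], lams, groups)
            err += np.sum((y[te] - X[te] @ b) ** 2)
        if err < best_err:
            best_err, best_sigma = err, sigma
    return best_sigma, best_sigma ** 2 / tau2

# Toy usage: two modalities with very different signal strengths.
rng = np.random.default_rng(1)
n, p, K = 200, 50, 2
groups = np.repeat(np.arange(K), p // K)  # feature -> group label
X = rng.standard_normal((n, p))
beta = np.concatenate([rng.normal(0, 1.0, p // K),   # strong group
                       rng.normal(0, 0.1, p // K)])  # weak group
y = X @ beta + rng.standard_normal(n)
tau2 = np.array([1.0, 0.01])              # assumed known here for illustration;
                                          # the paper's construction estimates
                                          # these quantities from data
sigma_hat, lams_hat = fit_sigma_cv(X, y, groups, tau2, np.linspace(0.2, 3.0, 15))
```

Since only the scalar $\sigma$ is tuned, the cross-validation loop costs the same as for ordinary ridge regression, matching the abstract's claim that the method runs as fast as the $K = 1$ case.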