481

Transfer Learning with Multi-source Data: High-dimensional Inference for Group Distributionally Robust Models

Journal of the American Statistical Association (JASA), 2020
Abstract

The construction of generalizable and transferable models is a fundamental goal of statistical learning. Learning with the multi-source data helps improve model generalizability and is integral to many important statistical problems, including group distributionally robust optimization, minimax group fairness, and maximin projection. This paper considers multiple high-dimensional regression models for the multi-source data. We introduce the covariate shift maximin effect as a group distributionally robust model. This robust model helps transfer the information from the multi-source data to the unlabelled target population. Statistical inference for the covariate shift maximin effect is challenging since its point estimator may have a non-standard limiting distribution. We devise a novel {\it DenseNet} sampling method to construct valid confidence intervals for the high-dimensional maximin effect. We show that our proposed confidence interval achieves the desired coverage level and attains a parametric length. Our proposed DenseNet sampling method and the related theoretical analysis are of independent interest in addressing other non-regular or non-standard inference problems. We demonstrate the proposed method over a large-scale simulation and genetic data on yeast colony growth under multiple environments.

View on arXiv
Comments on this paper