Privacy-Preserving Boosting with Random Linear Classifiers for Learning from User-Generated Data

22 February 2018

Abstract

User-generated data is crucial to predictive modeling in many applications. With a web/mobile/wearable interface, an online service provider (SP) can continuously record user-generated data and depend on various predictive models learned from the data to improve their services and revenue. SPs owning a large collection of user-generated data have raised privacy concerns. We present a privacy-preserving framework, SecureBoost, which allows users to submit encrypted or randomly masked data to SPs that want to learn predictive models. Our framework utilizes random linear classifiers (RLCs) as the base classifiers in the boosting framework to simplify the design of privacy-preserving protocols. A Cryptographic Service Provider (CSP) is used to assist an SP's processing, reducing the complexity of the protocol constructions while the leakage of information to the CSP is limited. We present two constructions of SecureBoost: HE+GC and SecSh+GC, using combinations of homomorphic encryption, garbled circuits, and random masking to achieve both security and efficiency. For a boosted model, the SP learns only the base models (i.e., RLCs) and the CSP learns only the weights of the base models. This separated parameter holding avoids any party from abusing the final model or conducting model-based attacks. We have conducted extensive experiments to understand the quality of the RLC-based boosting and the cost distribution of the constructions. Our results show that SecureBoost can efficiently learn high-quality boosting models from protected user-generated data.

View on arXiv

Comments on this paper