GPU-based Private Information Retrieval for On-Device Machine Learning Inference

26 January 2023
Maximilian Lam
Jeff Johnson
Wenjie Xiong
Kiwan Maeng
Udit Gupta
Yang Li
Liangzhen Lai
Ilias Leontiadis
Minsoo Rhu
Hsien-Hsin S. Lee
Vijay Janapa Reddi
Gu-Yeon Wei
David Brooks
Edward Suh
Abstract

On-device machine learning (ML) inference can enable the use of private user data on user devices without revealing it to remote servers. However, a pure on-device solution to private ML inference is impractical for many applications that rely on embedding tables too large to store on-device. In particular, recommendation models typically use multiple embedding tables, each on the order of 1-10 GB, making them impractical to store on-device. To overcome this barrier, we propose the use of private information retrieval (PIR) to efficiently and privately retrieve embeddings from servers without sharing any private information. As off-the-shelf PIR algorithms are usually too computationally intensive to use directly for latency-sensitive inference tasks, we 1) propose novel GPU-based acceleration of PIR, and 2) co-design PIR with the downstream ML application to obtain further speedup. Our GPU acceleration strategy improves system throughput by more than 20× over an optimized CPU PIR implementation, and our PIR-ML co-design provides an over 5× additional throughput improvement at fixed model quality. Together, for various on-device ML applications such as recommendation and language modeling, our system on a single V100 GPU can serve up to 100,000 queries per second -- a >100× throughput improvement over a CPU-based baseline -- while maintaining model accuracy.
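The abstract does not spell out the PIR protocol itself, so the following is a minimal illustrative sketch, not the paper's scheme: a toy two-server PIR based on additive secret sharing, showing how a client can fetch one embedding row without either server learning the row index. The field size, table shape, and all function names here are assumptions made for the demo; the paper's GPU-accelerated construction differs.

# Toy two-server PIR for embedding-row retrieval via additive secret
# sharing. Illustrative only -- NOT the protocol from the paper; it just
# demonstrates fetching a row without revealing the index.
import numpy as np

P = 2**19 - 1  # small Mersenne prime; field size chosen only for the demo

def share_query(num_rows, wanted_row, rng):
    # Split a one-hot selection vector into two additive shares mod P.
    # Each share alone is uniformly random, so a single (non-colluding)
    # server learns nothing about wanted_row.
    one_hot = np.zeros(num_rows, dtype=np.int64)
    one_hot[wanted_row] = 1
    share_a = rng.integers(0, P, size=num_rows, dtype=np.int64)
    share_b = (one_hot - share_a) % P
    return share_a, share_b

def server_answer(table, query_share):
    # Server-side work: one (rows,) @ (rows, dim) product mod P. This
    # dense matrix-vector product over the whole table is the hot loop
    # a GPU implementation would accelerate.
    return (query_share @ table) % P

# Demo: a table of 8 embedding rows of dimension 4, entries in the field.
# (Real embeddings are floats; fixed-point encoding would be needed.)
rng = np.random.default_rng(0)
table = rng.integers(0, P, size=(8, 4), dtype=np.int64)

qa, qb = share_query(num_rows=8, wanted_row=5, rng=rng)
answer = (server_answer(table, qa) + server_answer(table, qb)) % P
assert np.array_equal(answer, table[5])  # row reconstructed privately

Note that each server's work reduces to a single matrix-vector product over the entire table -- a massively parallel, memory-bound kernel that maps naturally onto a GPU, consistent with the abstract's claim of large throughput gains over a CPU baseline.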
