Private Transformer Inference in MLaaS: A Survey

Transformer models have revolutionized AI, powering applications such as content generation and sentiment analysis. However, their deployment in Machine Learning as a Service (MLaaS) raises significant privacy concerns, primarily because sensitive user data is processed centrally. Private Transformer Inference (PTI) addresses this by using cryptographic techniques such as secure multi-party computation and homomorphic encryption, enabling inference while protecting the privacy of both the user's data and the model. This paper reviews recent PTI advancements, highlighting state-of-the-art solutions and challenges. We also introduce a structured taxonomy and evaluation framework for PTI, focusing on balancing resource efficiency with privacy and bridging the gap between high-performance inference and data privacy.
@article{li2025_2505.10315,
  title={Private Transformer Inference in MLaaS: A Survey},
  author={Yang Li and Xinyu Zhou and Yitong Wang and Liangxin Qian and Jun Zhao},
  journal={arXiv preprint arXiv:2505.10315},
  year={2025}
}