From Speech to Data: Unraveling Google's Use of Voice Data for User Profiling

3 March 2024

Abstract

Smart home voice assistants enable users to conveniently interact with IoT devices and perform Internet searches; however, they also collect voice input that can carry sensitive personal information about users. Previous work has investigated how information inferred from the contents of users' voice commands is shared or leaked for tracking and advertising purposes. In this paper, we systematically evaluate how voice itself is used for user profiling in the Google ecosystem. To do so, we simulate various user personas by engaging with specific categories of websites. We then interact with Google smart speakers using neutral voice commands, which we define as voice commands that neither reveal personal interests nor require the speakers to use the search APIs. We also explore the effects of non-neutral voice commands on user profiling. Notably, we employ voices that typically would not match the predefined personas. We then iteratively refine our experiments based on observed profile changes to better simulate real-world user interactions with smart speakers. We find that Google uses these voice recordings for user profiling and that, in some cases, up to 5 of the 8 categories that Google reports for customizing advertisements are altered after the voice commands are collected.
