
Addressing Discretization-Induced Bias in Demographic Prediction

27 May 2024
Evan Dong, Aaron Schein, Yixin Wang, Nikhil Garg
Abstract

Racial and other demographic imputation is necessary for many applications, especially in auditing disparities and outreach targeting in political campaigns. The canonical approach is to construct continuous predictions -- e.g., based on name and geography -- and then to discretize the predictions by selecting the most likely class (argmax). We study how this practice produces discretization bias. In particular, we show that argmax labeling, as used by a prominent commercial voter file vendor to impute race/ethnicity, results in a substantial under-count of African-American voters, e.g., by 28.2 percentage points in North Carolina. This bias can have substantial implications in downstream tasks that use such labels. We then introduce a joint optimization approach -- and a tractable data-driven thresholding heuristic -- that can eliminate this bias, with negligible individual-level accuracy loss. Finally, we theoretically analyze discretization bias, showing that calibrated continuous models are insufficient to eliminate it and that an approach such as ours is necessary. Broadly, we warn researchers and practitioners against discretizing continuous demographic predictions without considering downstream consequences.
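To illustrate the core phenomenon, the following is a minimal, hypothetical sketch (not the authors' code or data): it simulates a perfectly calibrated continuous model and shows how argmax labeling under-counts a minority group, then applies a simple count-matching threshold in the spirit of the paper's data-driven thresholding heuristic. The Beta distribution, sample size, and group names are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each individual has a true probability p_b of
# belonging to minority group B, and the model reports exactly that
# probability -- i.e., the model is perfectly calibrated by construction.
n = 100_000
p_b = rng.beta(2, 6, size=n)        # most probabilities fall below 0.5
true_label = rng.random(n) < p_b    # draw true group memberships

true_count = true_label.sum()

# Argmax discretization: label someone B only when P(B) > 0.5.
# Members of B whose probability is below 0.5 are never labeled B,
# so the aggregate count is biased downward.
argmax_count = (p_b > 0.5).sum()
print(f"true B count:   {true_count}")
print(f"argmax B count: {argmax_count}")  # substantially smaller

# Count-matching threshold (a sketch of the general idea, not the
# paper's exact procedure): lower the cutoff until the number of
# predicted-B labels matches the expected count implied by the
# continuous predictions themselves.
target = int(round(p_b.sum()))              # expected number of B members
threshold = np.sort(p_b)[::-1][target - 1]  # target-th largest probability
thresholded_count = (p_b >= threshold).sum()
print(f"chosen threshold:    {threshold:.3f}")
print(f"thresholded B count: {thresholded_count}")  # matches the target
```

In this simulation, almost no group-B member has P(B) above 0.5, so argmax assigns the B label to almost no one despite the model being calibrated; lowering the threshold to match the expected aggregate count removes the under-count. This mirrors the paper's theoretical point that calibration alone cannot eliminate discretization bias.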

arXiv:2405.16762