Tuning Fairness by Marginalizing Latent Target Labels

Addressing fairness in machine learning models has recently attracted considerable attention, as it helps ensure the public's continued confidence in the deployment of machine learning systems. Here, we focus on mitigating the harm caused by a biased system that offers better outcomes (e.g. loans, jobs) to certain groups than to others. We show that bias in the output can be handled naturally in probabilistic models by introducing a latent target output that modulates the likelihood function. This simple formulation has several advantages: first, it provides a unified framework for several notions of fairness, such as demographic parity and equalized odds; second, it is expressed as a marginalization rather than a constrained optimization problem; and third, it allows us to encode our knowledge of what the bias in the outputs should be. Practically, the latter translates into the ability to control the level of fairness by directly varying the target fairness rates. In contrast, existing approaches rely on intermediate, arguably unintuitive control parameters such as a covariance threshold.
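To make the marginalization concrete, the following is a minimal sketch (hypothetical function and variable names, binary labels, a single binary sensitive attribute; not necessarily the paper's exact parameterization). The predicted probability of the observed, biased label y is obtained by summing out the latent target label ȳ: P(y=1 | x, s) = Σ_ȳ P(y=1 | ȳ, s) P(ȳ | x), where the conditional P(y | ȳ, s) encodes the user-chosen target rates.

```python
import numpy as np

def marginal_positive_prob(p_latent_pos, target_rates, s):
    """Marginalize over the latent target label y_bar.

    P(y=1 | x, s) = sum_{y_bar} P(y=1 | y_bar, s) * P(y_bar | x)

    p_latent_pos : P(y_bar=1 | x), the output of any probabilistic classifier.
    target_rates : dict mapping (s, y_bar) -> P(y=1 | y_bar, s); these encode
                   the target rates that directly control the level of fairness.
    s            : sensitive-group membership (0 or 1) for each example.
    """
    p1 = np.array([target_rates[(si, 1)] for si in s])  # P(y=1 | y_bar=1, s)
    p0 = np.array([target_rates[(si, 0)] for si in s])  # P(y=1 | y_bar=0, s)
    return p1 * p_latent_pos + p0 * (1.0 - p_latent_pos)

# Example: choose identical rates for both groups, so the latent label is
# mapped to observed outcomes in the same way regardless of group membership.
rates = {(0, 1): 0.9, (0, 0): 0.1,   # group s=0
         (1, 1): 0.9, (1, 0): 0.1}   # group s=1
p_bar = np.array([0.2, 0.7, 0.95])   # P(y_bar=1 | x) from a classifier
s = np.array([0, 1, 1])
print(marginal_positive_prob(p_bar, rates, s))
```

Under this kind of formulation, training would maximize the likelihood of the observed (biased) labels through the marginal probability, while the debiased P(ȳ=1 | x) can be used for prediction; the level of fairness is then tuned by changing the target rates rather than an auxiliary constraint parameter.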