Feature Importance Disparities for Data Bias Investigations

3 March 2023
Peter W. Chang, Leor Fishman, Seth Neel
arXiv:2303.01704
Abstract

It is widely held that one cause of downstream bias in classifiers is bias present in the training data. Rectifying such biases may involve context-dependent interventions such as training separate models on subgroups, removing features with bias in the collection process, or even conducting real-world experiments to ascertain sources of bias. Despite the need for such data bias investigations, few automated methods exist to assist practitioners in these efforts. In this paper, we present one such method that, given a dataset $X$ consisting of protected and unprotected features, outcomes $y$, and a regressor $h$ that predicts $y$ given $X$, outputs a tuple $(f_j, g)$ with the following property: $g$ corresponds to a subset of the training dataset $(X, y)$ such that the $j^{th}$ feature $f_j$ has much larger (or smaller) influence in the subgroup $g$ than on the dataset overall, which we call the feature importance disparity (FID). We show across 4 datasets and 4 common feature importance methods of broad interest to the machine learning community that we can efficiently find subgroups with large FID values even over exponentially large subgroup classes, and in practice these groups correspond to subgroups with potentially serious bias issues as measured by standard fairness metrics.
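To make the FID quantity concrete, the sketch below computes a feature's importance on a single hand-picked subgroup versus the full dataset and takes the difference. This is not the paper's algorithm (which efficiently searches exponentially large subgroup classes and may define the disparity differently); it only illustrates the quantity under assumptions of my own: permutation importance as the importance method, synthetic data, a binary protected attribute defining the subgroup `g`, and a simple difference as the disparity measure.

```python
# Illustrative sketch of a feature importance disparity (FID) check for one
# candidate subgroup. Assumptions (not from the paper): permutation importance
# as the importance method, a simple difference as the disparity, synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Toy data: features X (last column is a binary protected attribute), outcomes y.
rng = np.random.default_rng(0)
n = 2000
base = rng.normal(size=(n, 5))
protected = (rng.random(n) < 0.3).astype(float)
X = np.column_stack([base, protected])
# Feature 1 only matters strongly inside the protected subgroup.
y = X[:, 0] + 2.0 * X[:, 1] * protected + rng.normal(scale=0.1, size=n)

# Regressor h that predicts y from X.
h = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

def importance(model, X_slice, y_slice, feature_idx):
    """Permutation importance of one feature on a given data slice."""
    result = permutation_importance(model, X_slice, y_slice,
                                    n_repeats=10, random_state=0)
    return result.importances_mean[feature_idx]

# Candidate subgroup g: rows where the protected attribute equals 1.
g = protected == 1.0
j = 1  # feature f_j whose disparity we inspect

imp_overall = importance(h, X, y, j)
imp_subgroup = importance(h, X[g], y[g], j)

# FID for (f_j, g): how much more (or less) influential f_j is on g
# than on the dataset overall (difference used here purely for illustration).
fid = imp_subgroup - imp_overall
print(f"importance overall:  {imp_overall:.3f}")
print(f"importance on g:     {imp_subgroup:.3f}")
print(f"FID(f_{j}, g):       {fid:.3f}")
```

In this toy setup the subgroup importance of feature 1 should come out much larger than its overall importance, flagging `(f_1, g)` as a pair worth investigating; the paper's contribution is finding such pairs automatically over very large subgroup classes rather than inspecting one subgroup at a time.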
