A Consequentialist Critique of Binary Classification Evaluation Practices

6 April 2025
Gerardo Flores
Abigail Schiff
Alyssa H. Smith
Julia A Fukuyama
Ashia C. Wilson
Abstract

ML-supported decisions, such as ordering tests or determining preventive custody, often involve binary classification based on probabilistic forecasts. Evaluation frameworks for such forecasts typically consider whether to prioritize independent-decision metrics (e.g., accuracy) or top-K metrics (e.g., Precision@K), and whether to focus on fixed thresholds or threshold-agnostic measures like AUC-ROC. We highlight that a consequentialist perspective, long advocated by decision theorists, should naturally favor evaluations that support independent decisions using a mixture of thresholds given their prevalence, such as the Brier score and log loss. However, our empirical analysis reveals a strong preference for top-K metrics or fixed thresholds in evaluations at major conferences like ICML, FAccT, and CHIL. To address this gap, we use this decision-theoretic framework to map evaluation metrics to their optimal use cases, and we provide a Python package, briertools, to promote the broader adoption of Brier scores. In doing so, we also uncover new theoretical connections, including a reconciliation between the Brier score and Decision Curve Analysis, which clarifies and responds to a longstanding critique by Assel et al. (2017) regarding the clinical utility of proper scoring rules.
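
To make the two proper scoring rules named in the abstract concrete, the following is a minimal sketch in plain Python. It is not the briertools API; the function names (`brier_score`, `log_loss`, `threshold_mixture_loss`) are illustrative assumptions. The third function numerically checks a standard identity that gives one reading of "a mixture of thresholds": the Brier score equals the cost-weighted misclassification loss averaged over thresholds c drawn uniformly from [0, 1], with a false positive costing 2c and a false negative costing 2(1 - c).

```python
import math


def brier_score(labels, probs):
    """Mean squared difference between binary labels and forecast probabilities."""
    return sum((y - p) ** 2 for y, p in zip(labels, probs)) / len(labels)


def log_loss(labels, probs, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)


def threshold_mixture_loss(labels, probs, n_thresholds=1000):
    """Approximate the uniform-threshold average of cost-weighted 0/1 loss.

    Illustrative helper (an assumption of this sketch): at threshold c we
    predict positive when p > c, charge 2*c per false positive and
    2*(1 - c) per false negative, then average over a midpoint grid of c.
    """
    total = 0.0
    for i in range(n_thresholds):
        c = (i + 0.5) / n_thresholds  # midpoint of the i-th threshold bin
        for y, p in zip(labels, probs):
            if y == 0 and p > c:
                total += 2 * c  # false positive at this threshold
            elif y == 1 and p <= c:
                total += 2 * (1 - c)  # false negative at this threshold
    return total / (n_thresholds * len(labels))


labels = [1, 0, 1, 0]
probs = [0.8, 0.3, 0.6, 0.1]
print(brier_score(labels, probs))
print(log_loss(labels, probs))
print(threshold_mixture_loss(labels, probs))  # should be close to the Brier score
```

On this toy data the first and third printed values should agree up to the discretization error of the threshold grid, which illustrates why a Brier score can be read as an average over many independent fixed-threshold evaluations rather than a single one.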

@article{flores2025_2504.04528,
  title={A Consequentialist Critique of Binary Classification Evaluation Practices},
  author={Gerardo Flores and Abigail Schiff and Alyssa H. Smith and Julia A Fukuyama and Ashia C. Wilson},
  journal={arXiv preprint arXiv:2504.04528},
  year={2025}
}