ResearchTrend.AI

Are Bias Evaluation Methods Biased?

20 June 2025
Lina Berrayana
Sean Rooney
Luis Garces-Erice
Ioana Giurgiu
Main: 7 pages · Figures: 4 · Bibliography: 3 pages · Appendix: 3 pages
Abstract

The creation of benchmarks to evaluate the safety of Large Language Models is one of the key activities within the trusted AI community. These benchmarks allow models to be compared on different aspects of safety, such as toxicity, bias, and harmful behavior. Independent benchmarks adopt different approaches, with distinct data sets and evaluation methods. We investigate how robust such benchmarks are by using different approaches to rank a set of representative models for bias and comparing how similar the resulting overall rankings are. We show that different but widely used bias evaluation methods produce disparate model rankings. We conclude with recommendations for the community on the usage of such benchmarks.
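The abstract does not state which similarity measure the authors use to compare rankings; a standard choice for this kind of analysis is Kendall's rank correlation. The sketch below (hypothetical model names and rankings, not the paper's data) shows how two model rankings produced by different bias evaluation methods could be compared:

```python
from itertools import combinations

def kendall_tau(rank_a: dict, rank_b: dict) -> float:
    """Kendall rank correlation between two rankings of the same models.

    rank_a, rank_b: dicts mapping model name -> rank position (1 = best).
    Returns a value in [-1, 1]; 1 means identical orderings, -1 reversed.
    """
    models = list(rank_a)
    concordant = discordant = 0
    for m1, m2 in combinations(models, 2):
        # A pair is concordant when both methods order it the same way.
        product = (rank_a[m1] - rank_a[m2]) * (rank_b[m1] - rank_b[m2])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(models) * (len(models) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical rankings of four models under two bias evaluation methods
by_method_1 = {"model-A": 1, "model-B": 2, "model-C": 3, "model-D": 4}
by_method_2 = {"model-A": 2, "model-B": 1, "model-C": 4, "model-D": 3}
print(kendall_tau(by_method_1, by_method_2))  # prints 0.333...
```

A correlation well below 1 between two benchmarks, as here, would indicate the kind of ranking disagreement the paper reports.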

@article{berrayana2025_2506.17111,
  title={Are Bias Evaluation Methods Biased?},
  author={Lina Berrayana and Sean Rooney and Luis Garcés-Erice and Ioana Giurgiu},
  journal={arXiv preprint arXiv:2506.17111},
  year={2025}
}