
Are All Spanish Doctors Male? Evaluating Gender Bias in German Machine Translation

Abstract

We present WinoMTDE, a new gender bias evaluation test set designed to assess occupational stereotyping and underrepresentation in German machine translation (MT) systems. Building on the automatic evaluation method introduced by arXiv:1906.00591v1, we extend the approach to German, a language with grammatical gender. The WinoMTDE dataset comprises 288 German sentences that are balanced with respect to both gender and stereotype; the stereotype labels were annotated using German labor statistics. We conduct a large-scale evaluation of five widely used MT systems and a large language model (LLM). Our results reveal persistent bias in most models, with the LLM outperforming traditional systems. The dataset and evaluation code are publicly available under this https URL.
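
As a rough illustration of what an evaluation over a set balanced by gender and stereotype could look like, the sketch below computes per-subgroup accuracy and the gaps between subgroups. It is not the paper's evaluation code: the record keys (`gold`, `pred`, `stereotype`), the label values, and the toy data are all assumptions made for this example.

```python
from collections import defaultdict


def subgroup_accuracy(records):
    """Return accuracy overall, per gold gender, and per stereotype label.

    Each record is a dict with hypothetical keys:
      'gold'       - gender of the entity in the source sentence
      'pred'       - gender detected in the translation
      'stereotype' - 'pro' or 'anti' (stereotypical or not for that occupation)
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in records:
        correct = int(r["pred"] == r["gold"])
        # Count the record once overall and once in each subgroup it belongs to.
        for key in ("all", r["gold"], r["stereotype"]):
            totals[key] += 1
            hits[key] += correct
    return {key: hits[key] / totals[key] for key in totals}


# Toy data, invented for illustration only.
toy = [
    {"gold": "female", "pred": "male", "stereotype": "anti"},
    {"gold": "female", "pred": "female", "stereotype": "pro"},
    {"gold": "male", "pred": "male", "stereotype": "pro"},
    {"gold": "male", "pred": "male", "stereotype": "anti"},
]

scores = subgroup_accuracy(toy)
gender_gap = scores["male"] - scores["female"]
stereotype_gap = scores["pro"] - scores["anti"]
print(scores["all"], gender_gap, stereotype_gap)
```

A positive gender gap in this sketch would mean male entities are translated correctly more often than female ones, and a positive stereotype gap would mean pro-stereotypical sentences are handled better than anti-stereotypical ones. Balancing the test set along both axes keeps these subgroup accuracies comparable.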

@article{kappl2025_2502.19104,
  title={Are All Spanish Doctors Male? Evaluating Gender Bias in German Machine Translation},
  author={Michelle Kappl},
  journal={arXiv preprint arXiv:2502.19104},
  year={2025}
}