Artificial intelligence (AI) has introduced numerous opportunities for human assistance and task automation in medicine. However, it suffers from poor generalization in the presence of shifts in the data distribution. In the context of AI-based computed tomography (CT) analysis, significant data distribution shifts can be caused by changes in scanner manufacturer, reconstruction technique, or dose. AI harmonization techniques can address this problem by reducing distribution shifts caused by differing acquisition settings. This paper presents an open-source benchmark dataset containing CT scans of an anthropomorphic phantom acquired with various scanners and settings, whose purpose is to foster the development of AI harmonization techniques. Using a phantom removes the variability that would otherwise arise from inter- and intra-patient differences. The dataset includes 1378 image series acquired with 13 scanners from 4 manufacturers across 8 institutions, using a harmonized protocol as well as several acquisition doses. Additionally, we present a methodology, baseline results, and open-source code to assess image- and feature-level stability and liver tissue classification, promoting the development of AI harmonization strategies.
@article{amirian2025_2507.01539,
  title={A Multi-Centric Anthropomorphic 3D CT Phantom-Based Benchmark Dataset for Harmonization},
  author={Mohammadreza Amirian and Michael Bach and Oscar Jimenez-del-Toro and Christoph Aberle and Roger Schaer and Vincent Andrearczyk and Jean-Félix Maestrati and Maria Martin Asiain and Kyriakos Flouris and Markus Obmann and Clarisse Dromain and Benoît Dufour and Pierre-Alexandre Alois Poletti and Hendrik von Tengg-Kobligk and Rolf Hügli and Martin Kretzschmar and Hatem Alkadhi and Ender Konukoglu and Henning Müller and Bram Stieltjes and Adrien Depeursinge},
  journal={arXiv preprint arXiv:2507.01539},
  year={2025}
}