Seeing the Unseen: How EMoE Unveils Bias in Text-to-Image Diffusion Models

Estimating uncertainty in text-to-image diffusion models is challenging because of their large parameter counts (often exceeding 100 million) and operation in complex, high-dimensional spaces with virtually infinite input possibilities. In this paper, we propose Epistemic Mixture of Experts (EMoE), a novel framework for efficiently estimating epistemic uncertainty in diffusion models. EMoE leverages pre-trained networks without requiring additional training, enabling direct uncertainty estimation from a prompt. We use a latent space within the diffusion process that captures epistemic uncertainty better than existing methods. Experimental results on the COCO dataset demonstrate EMoE's effectiveness, showing a strong correlation between uncertainty and image quality. Additionally, EMoE assigns higher uncertainty to under-sampled languages and regions, revealing hidden biases in the training set. This capability demonstrates the relevance of EMoE as a tool for addressing fairness and accountability in AI-generated content.
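The abstract does not spell out the estimator, so the following is only a minimal sketch of one common way to turn a set of experts into an epistemic uncertainty score: measure how much their latent outputs disagree for the same prompt. The function name `epistemic_uncertainty`, the use of mean per-dimension variance, and the toy latent vectors are all assumptions made for illustration, not the paper's implementation.

```python
from statistics import pvariance


def epistemic_uncertainty(expert_latents):
    """Mean per-dimension variance across expert latents.

    expert_latents: list of equal-length lists of floats, one per expert.
    These are hypothetical stand-ins for the latent outputs that several
    pre-trained experts produce for the same prompt.
    """
    if len(expert_latents) < 2:
        raise ValueError("need at least two experts to measure disagreement")
    dims = len(expert_latents[0])
    if any(len(z) != dims for z in expert_latents):
        raise ValueError("all latents must have the same dimensionality")
    # Variance across experts, computed independently for each latent dimension.
    per_dim = [pvariance([z[d] for z in expert_latents]) for d in range(dims)]
    # Collapse to a single scalar score per prompt.
    return sum(per_dim) / dims


# Toy data (assumed): experts agree on one prompt and disagree on another.
agree = [[0.1, 0.2, 0.3], [0.1, 0.2, 0.3], [0.1, 0.2, 0.3]]
disagree = [[0.1, 0.2, 0.3], [0.9, -0.4, 0.0], [-0.5, 0.8, 1.2]]

u_agree = epistemic_uncertainty(agree)
u_disagree = epistemic_uncertainty(disagree)
print(u_agree < u_disagree)
```

Under this assumption, a prompt from an under-sampled language or region would be expected to produce more disagreement among experts, and hence a higher score, than a prompt well covered by the training data.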
@article{berry2025_2505.13273,
  title={Seeing the Unseen: How EMoE Unveils Bias in Text-to-Image Diffusion Models},
  author={Lucas Berry and Axel Brando and Wei-Di Chang and Juan Camilo Gamboa Higuera and David Meger},
  journal={arXiv preprint arXiv:2505.13273},
  year={2025}
}