
DIMCIM: A Quantitative Evaluation Framework for Default-mode Diversity and Generalization in Text-to-Image Generative Models

Abstract

Recent advances in text-to-image (T2I) models have achieved impressive quality and consistency. However, this progress has come at the cost of representation diversity. While automatic evaluation methods exist for benchmarking model diversity, they either require reference image datasets or lack specificity about the kind of diversity being measured, limiting their adaptability and interpretability. To address this gap, we introduce the Does-it/Can-it framework, DIMCIM, a reference-free measurement of default-mode diversity ("Does" the model generate images with expected attributes?) and generalization capacity ("Can" the model generate diverse attributes for a particular concept?). We construct the COCO-DIMCIM benchmark, which is seeded with COCO concepts and captions and augmented by a large language model. With COCO-DIMCIM, we find that widely used models improve in generalization at the cost of default-mode diversity when scaling from 1.5B to 8.1B parameters. DIMCIM also identifies fine-grained failure cases, such as attributes that are generated with generic prompts but are rarely generated when explicitly requested. Finally, we use DIMCIM to evaluate the training data of a T2I model and observe a correlation of 0.85 between diversity in training images and default-mode diversity. Our work provides a flexible and interpretable framework for assessing T2I model diversity and generalization, enabling a more comprehensive understanding of model performance.

@article{teotia2025_2506.05108,
  title={DIMCIM: A Quantitative Evaluation Framework for Default-mode Diversity and Generalization in Text-to-Image Generative Models},
  author={Revant Teotia and Candace Ross and Karen Ullrich and Sumit Chopra and Adriana Romero-Soriano and Melissa Hall and Matthew J. Muckley},
  journal={arXiv preprint arXiv:2506.05108},
  year={2025}
}
Main: 9 pages, 11 figures, 1 table. Bibliography: 3 pages. Appendix: 6 pages.