How to Train your Text-to-Image Model: Evaluating Design Choices for Synthetic Training Captions

20 June 2025
Manuel Brack
Sudeep Katakol
Felix Friedrich
Patrick Schramowski
Hareesh Ravi
Kristian Kersting
Ajinkya Kale
Main: 8 pages, Appendix: 7 pages, Bibliography: 3 pages, 16 figures, 3 tables
Abstract

Training data is at the core of any successful text-to-image model. The quality and descriptiveness of image captions are crucial to a model's performance. Given the noisiness and inconsistency of web-scraped datasets, recent works have shifted towards synthetic training captions. While this setup is generally believed to produce more capable models, the current literature does not provide any insights into its design choices. This study closes that gap by systematically investigating how different synthetic captioning strategies impact the downstream performance of text-to-image models. Our experiments demonstrate that dense, high-quality captions enhance text alignment but may introduce trade-offs in output aesthetics and diversity. Conversely, captions of randomized lengths yield balanced improvements across aesthetics and alignment without compromising sample diversity. We also demonstrate that varying caption distributions introduces significant shifts in the output bias of a trained model. Our findings underscore the importance of caption design in achieving optimal model performance and provide practical insights for more effective training data strategies in text-to-image generation.
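
The abstract mentions captions of randomized lengths as a caption-design strategy but does not describe how that randomization is performed. As a minimal sketch only, assuming variable-length captions are obtained by truncating a dense synthetic caption to a randomly drawn word count per training example, one could write the following (the function name, length bounds, and truncate-by-words choice are illustrative assumptions, not the paper's procedure):

import random

def sample_training_caption(dense_caption: str,
                            min_words: int = 5,
                            max_words: int = 60,
                            seed: int | None = None) -> str:
    # Truncate a dense synthetic caption to a randomly sampled word count,
    # producing captions of varying length across training examples.
    rng = random.Random(seed)
    words = dense_caption.split()
    target_len = rng.randint(min_words, max_words)
    return " ".join(words[:target_len])

# Example: different seeds (or RNG states) yield captions of different
# lengths from the same dense synthetic description.
caption = "A golden retriever puppy sits on a sunlit wooden porch beside a blue ceramic bowl"
print(sample_training_caption(caption, seed=0))

In such a setup, each image would be paired with a caption whose length varies from step to step, which is one plausible way to realize the "randomized lengths" condition the abstract contrasts with uniformly dense captions.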

@article{brack2025_2506.16679,
  title={How to Train your Text-to-Image Model: Evaluating Design Choices for Synthetic Training Captions},
  author={Manuel Brack and Sudeep Katakol and Felix Friedrich and Patrick Schramowski and Hareesh Ravi and Kristian Kersting and Ajinkya Kale},
  journal={arXiv preprint arXiv:2506.16679},
  year={2025}
}