Optimizing Multi-Round Enhanced Training in Diffusion Models for Improved Preference Understanding

25 April 2025
Kun Li
Jianhui Wang
Yangfan He
Xinyuan Song
Ruoyu Wang
Hongyang He
Wenxin Zhang
Jiaqi Chen
Keqin Li
Sida Li
Miao Zhang
Tianyu Shi
Xueqian Wang
Abstract

Generative AI has transformed many industries by enabling text-driven image generation, yet producing high-resolution outputs that align with fine-grained user preferences remains challenging; multi-round interaction is therefore often needed before a generated image meets expectations. Previous methods refined prompts via reward feedback but did not optimize over multi-round dialogue data. In this work, we present a Visual Co-Adaptation (VCA) framework that incorporates human-in-the-loop feedback, leveraging a reward model trained to align with human preferences. On a diverse multi-round dialogue dataset of prompt-image pairs that we construct to reflect user intent, the framework combines multiple reward signals, such as diversity, consistency, and preference feedback, while fine-tuning the diffusion model with LoRA, thus optimizing image generation against user input. Experiments demonstrate that our method outperforms state-of-the-art baselines, substantially improving image consistency and alignment with user intent, and it consistently achieves higher user satisfaction, especially in multi-round dialogue scenarios.
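The abstract compresses the training recipe into one sentence; the minimal, self-contained PyTorch sketch below unpacks the idea: freeze a base generator, attach trainable low-rank (LoRA) adapters, and update only those adapters against a weighted blend of the three reward signals named above (preference, consistency, diversity) across simulated dialogue rounds. Every concrete choice here, including the toy linear generator, the cosine-similarity reward stubs, and the reward weights, is an illustrative assumption, not the authors' implementation.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen base linear map plus a trainable low-rank residual (LoRA)."""
    def __init__(self, dim, rank=4, alpha=8.0):
        super().__init__()
        self.base = nn.Linear(dim, dim)
        for p in self.base.parameters():
            p.requires_grad_(False)          # the base model stays frozen
        self.A = nn.Parameter(torch.randn(dim, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, dim))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A @ self.B)

dim, rounds = 64, 3
generator = LoRALinear(dim)                  # stand-in for a LoRA-adapted denoiser
opt = torch.optim.AdamW(
    [p for p in generator.parameters() if p.requires_grad], lr=1e-3)

def combined_reward(out, prompt, history, w=(0.5, 0.3, 0.2)):
    """Weighted blend of the three signals named in the abstract:
    preference (match the prompt), consistency (match earlier rounds),
    diversity (differ from the batch mean). All three are cosine stubs
    standing in for learned reward models; the weights are illustrative."""
    pref = torch.cosine_similarity(out, prompt, dim=-1)
    cons = torch.cosine_similarity(out, history, dim=-1)
    div = 1.0 - torch.cosine_similarity(out, out.mean(0, keepdim=True), dim=-1)
    return w[0] * pref + w[1] * cons + w[2] * div

# One synthetic "dialogue": each round the user slightly refines the prompt,
# and the previous round's output conditions the consistency reward.
prompt = torch.randn(8, dim)
history = torch.zeros(8, dim)
for r in range(rounds):
    out = generator(torch.randn(8, dim))     # "generate" from noise
    loss = -combined_reward(out, prompt, history).mean()
    opt.zero_grad()
    loss.backward()                          # gradients flow only into A and B
    opt.step()
    history = out.detach()                   # feed this round into the next
    prompt = prompt + 0.1 * torch.randn(8, dim)   # simulated user refinement

In a real setting, the LoRALinear stand-in would be LoRA adapters injected into the attention projections of a diffusion U-Net, and the cosine stubs would be replaced by the learned preference reward model and dialogue-level consistency and diversity measures the paper describes.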

@article{li2025_2504.18204,
  title={Optimizing Multi-Round Enhanced Training in Diffusion Models for Improved Preference Understanding},
  author={Kun Li and Jianhui Wang and Yangfan He and Xinyuan Song and Ruoyu Wang and Hongyang He and Wenxin Zhang and Jiaqi Chen and Keqin Li and Sida Li and Miao Zhang and Tianyu Shi and Xueqian Wang},
  journal={arXiv preprint arXiv:2504.18204},
  year={2025}
}