MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation

5 May 2025
Mingcheng Li
Xiaolu Hou
Ziyang Liu
Dingkang Yang
Ziyun Qian
Jiawei Chen
Jinjie Wei
Yue Jiang
Qingyao Xu
Lihua Zhang
Abstract

Diffusion models have shown excellent performance in text-to-image generation. Nevertheless, existing methods often suffer from performance bottlenecks when handling complex prompts that involve multiple objects, characteristics, and relations. We therefore propose Multi-agent Collaboration-based Compositional Diffusion (MCCD) for text-to-image generation of complex scenes. Specifically, we design a multi-agent collaboration-based scene parsing module that generates an agent system comprising multiple agents with distinct tasks, using multimodal large language models (MLLMs) to extract the various scene elements effectively. In addition, hierarchical compositional diffusion uses a Gaussian mask and filtering to refine bounding box regions and enhances objects through region enhancement, resulting in accurate, high-fidelity generation of complex scenes. Comprehensive experiments demonstrate that MCCD significantly improves the performance of the baseline models in a training-free manner, providing a substantial advantage in complex scene generation.
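
The abstract mentions a Gaussian mask over bounding box regions but does not give its formulation. As a rough illustration only, the sketch below builds a mask that peaks at the centre of a box and decays toward its edges, then uses it to blend a region into a background. The function names, the `sigma_scale` parameter, and the blending rule are assumptions made for this example. They are not taken from MCCD.

```python
import math


def gaussian_box_mask(height, width, box, sigma_scale=0.5):
    """Return a height x width mask (list of lists) with values in [0, 1].

    box is (x0, y0, x1, y1) in pixel coordinates, with x1 and y1 exclusive.
    The mask equals 1 at the box centre and decays as a Gaussian whose
    standard deviation is sigma_scale times the box half-size (an assumed
    choice). Pixels outside the box are 0.
    """
    x0, y0, x1, y1 = box
    if not (0 <= x0 < x1 <= width and 0 <= y0 < y1 <= height):
        raise ValueError("box must be non-empty and lie inside the image")
    cx = (x0 + x1 - 1) / 2.0
    cy = (y0 + y1 - 1) / 2.0
    sx = max(sigma_scale * (x1 - x0) / 2.0, 1e-6)
    sy = max(sigma_scale * (y1 - y0) / 2.0, 1e-6)
    mask = [[0.0] * width for _ in range(height)]
    for y in range(y0, y1):
        for x in range(x0, x1):
            d2 = ((x - cx) / sx) ** 2 + ((y - cy) / sy) ** 2
            mask[y][x] = math.exp(-0.5 * d2)
    return mask


def blend(background, region, mask):
    """Per-pixel blend: mask * region + (1 - mask) * background."""
    return [
        [m * r + (1.0 - m) * b for b, r, m in zip(brow, rrow, mrow)]
        for brow, rrow, mrow in zip(background, region, mask)
    ]


if __name__ == "__main__":
    mask = gaussian_box_mask(8, 8, (2, 2, 7, 7))
    background = [[0.0] * 8 for _ in range(8)]
    region = [[1.0] * 8 for _ in range(8)]
    blended = blend(background, region, mask)
    for row in blended:
        print(" ".join(f"{v:.2f}" for v in row))
```

A soft mask of this kind weights the centre of a box more heavily than its border, which avoids a hard seam where a region meets its surroundings. Whether MCCD applies its mask to pixels, latents, or attention maps is not stated in the abstract.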

@article{li2025_2505.02648,
  title={MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation},
  author={Mingcheng Li and Xiaolu Hou and Ziyang Liu and Dingkang Yang and Ziyun Qian and Jiawei Chen and Jinjie Wei and Yue Jiang and Qingyao Xu and Lihua Zhang},
  journal={arXiv preprint arXiv:2505.02648},
  year={2025}
}