Towards Transformer-Based Aligned Generation with Self-Coherence Guidance

22 March 2025
Shulei Wang
Wang Lin
Hai Huang
Hanting Wang
Sihang Cai
WenKang Han
Tao Jin
Jingyuan Chen
Jiacheng Sun
Jieming Zhu
Zhou Zhao
    DiffM
Abstract

We introduce a novel, training-free approach for enhancing alignment in Transformer-based Text-Guided Diffusion Models (TGDMs). Existing TGDMs often struggle to generate semantically aligned images, particularly when dealing with complex text prompts or multi-concept attribute binding challenges. Previous U-Net-based methods primarily optimized the latent space, but their direct application to Transformer-based architectures has shown limited effectiveness. Our method addresses these challenges by directly optimizing cross-attention maps during the generation process. Specifically, we introduce Self-Coherence Guidance, a method that dynamically refines attention maps using masks derived from previous denoising steps, ensuring precise alignment without additional training. To validate our approach, we constructed more challenging benchmarks for evaluating coarse-grained attribute binding, fine-grained attribute binding, and style binding. Experimental results demonstrate the superior performance of our method, significantly surpassing other state-of-the-art methods across all evaluated tasks. Our code is available at this https URL.
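
The abstract describes Self-Coherence Guidance only at a high level. The following is a minimal, illustrative PyTorch sketch of the general idea it states: refining a prompt token's cross-attention map at inference time using a mask derived from the previous denoising step, without any training. The function name, the quantile threshold, and the boost factor are assumptions made for illustration; they are not taken from the paper's implementation.

import torch

def refine_cross_attention(cur_attn: torch.Tensor,
                           prev_attn: torch.Tensor,
                           mask_quantile: float = 0.8,
                           boost: float = 2.0) -> torch.Tensor:
    """Hypothetical sketch of self-coherence-style refinement for one prompt token.

    cur_attn, prev_attn: (B, H, W) cross-attention maps for the same token at the
    current and previous denoising steps. Returns a refined, renormalized map.
    """
    b = prev_attn.shape[0]
    # Mask of where this token attended strongly in the previous denoising step.
    flat = prev_attn.reshape(b, -1)
    thresh = torch.quantile(flat, mask_quantile, dim=1).view(b, 1, 1)
    mask = (prev_attn >= thresh).float()
    # Boost attention inside the mask, leave it unchanged outside, then
    # renormalize so each map still sums to one (training-free, inference only).
    weighted = cur_attn * (1.0 + (boost - 1.0) * mask)
    return weighted / weighted.sum(dim=(-2, -1), keepdim=True).clamp_min(1e-8)

if __name__ == "__main__":
    prev = torch.rand(2, 32, 32)   # placeholder attention maps
    cur = torch.rand(2, 32, 32)
    refined = refine_cross_attention(cur, prev)
    print(refined.shape, refined.sum(dim=(-2, -1)))  # each map sums to ~1

In a full pipeline, a refinement of this kind would be applied inside the cross-attention layers of the Transformer blocks at each denoising step; the paper evaluates its actual method on coarse-grained attribute binding, fine-grained attribute binding, and style binding benchmarks.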

@article{wang2025_2503.17675,
  title={Towards Transformer-Based Aligned Generation with Self-Coherence Guidance},
  author={Shulei Wang and Wang Lin and Hai Huang and Hanting Wang and Sihang Cai and WenKang Han and Tao Jin and Jingyuan Chen and Jiacheng Sun and Jieming Zhu and Zhou Zhao},
  journal={arXiv preprint arXiv:2503.17675},
  year={2025}
}