ISAC: Training-Free Instance-to-Semantic Attention Control for Improving Multi-Instance Generation

27 May 2025

Abstract

Text-to-image diffusion models excel at generating single-instance scenes but struggle with multi-instance scenarios, often merging or omitting objects. Previous training-free approaches rely solely on semantic-level guidance and do not address instance individuation. In contrast, our training-free method, Instance-to-Semantic Attention Control (ISAC), explicitly resolves incomplete instance formation and semantic entanglement through an instance-first modeling approach. This enables ISAC to leverage a hierarchical, tree-structured prompt mechanism that disentangles multiple object instances and aligns each with its corresponding semantic label. Without employing any external models, ISAC achieves up to 52% average multi-class accuracy and 83% average multi-instance accuracy by forming disentangled instances. The code will be made available upon publication.

@article{jo2025_2505.20935,
  title={ISAC: Training-Free Instance-to-Semantic Attention Control for Improving Multi-Instance Generation},
  author={Sanghyun Jo and Wooyeol Lee and Ziseok Lee and Kyungsu Kim},
  journal={arXiv preprint arXiv:2505.20935},
  year={2025}
}
