Diffusion models have transformed image generation, yet controlling their outputs for diverse applications, including content moderation and creative customization, remains challenging. Existing approaches usually require task-specific training and struggle to generalize across both concrete (e.g., objects) and abstract (e.g., styles) concepts. We propose CASteer (Cross-Attention Steering), a training-free framework for controllable image generation that uses steering vectors to dynamically influence a diffusion model's hidden representations. CASteer computes these vectors offline by averaging activations from concept-specific generated images, then applies them during inference via a dynamic heuristic that activates modifications only when necessary, removing concepts from affected images or adding them to unaffected ones. This approach enables precise control over a wide range of tasks, including removing harmful content, adding desired attributes, replacing objects, and altering styles, all without model retraining. CASteer handles both concrete and abstract concepts, outperforming state-of-the-art techniques across multiple diffusion models while preserving unrelated content and minimizing unintended effects.
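The mechanics described in the abstract can be sketched roughly as follows. This is a minimal PyTorch illustration, not the authors' implementation: the difference-of-means construction of the steering vector, the cosine-similarity trigger, and the parameters `alpha` and `threshold` are assumptions made for illustration, and the paper's exact offline averaging and dynamic heuristic may differ.

```python
# Illustrative sketch of steering-vector mechanics (assumed formulation,
# not the authors' code). Tensor shapes and the trigger rule are hypothetical.
import torch
import torch.nn.functional as F


def compute_steering_vector(acts_with_concept: torch.Tensor,
                            acts_without_concept: torch.Tensor) -> torch.Tensor:
    """Average cross-attention activations collected offline from images
    generated with and without the concept, and take their difference as
    the steering direction (assumed form)."""
    # acts_*: (num_images, num_tokens, hidden_dim)
    return acts_with_concept.mean(dim=(0, 1)) - acts_without_concept.mean(dim=(0, 1))


def apply_steering(activation: torch.Tensor,
                   steer: torch.Tensor,
                   alpha: float = 1.0,
                   threshold: float = 0.1,
                   remove: bool = True) -> torch.Tensor:
    """Conditionally add or subtract the steering vector at inference time.
    The cosine-similarity trigger below is a hypothetical stand-in for the
    paper's dynamic heuristic that activates modifications only when needed."""
    # activation: (batch, num_tokens, hidden_dim) cross-attention output
    sim = F.cosine_similarity(activation.mean(dim=1), steer[None, :], dim=-1)  # (batch,)
    direction = -steer if remove else steer
    # Modify only samples where the concept appears (removal) or is absent (addition).
    mask = (sim > threshold) if remove else (sim <= threshold)
    return activation + alpha * mask[:, None, None].float() * direction


if __name__ == "__main__":
    torch.manual_seed(0)
    hidden = 64
    steer = compute_steering_vector(torch.randn(8, 77, hidden) + 0.5,
                                    torch.randn(8, 77, hidden))
    steered = apply_steering(torch.randn(2, 77, hidden), steer, remove=True)
    print(steered.shape)  # (2, 77, 64)
```

In practice such a modification would be applied inside the cross-attention layers of the denoising network at each sampling step; the placeholder tensors above merely stand in for those activations.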
@article{gaintseva2025_2503.09630,
  title   = {CASteer: Steering Diffusion Models for Controllable Generation},
  author  = {Tatiana Gaintseva and Chengcheng Ma and Ziquan Liu and Martin Benning and Gregory Slabaugh and Jiankang Deng and Ismail Elezi},
  journal = {arXiv preprint arXiv:2503.09630},
  year    = {2025}
}