CompBench: Benchmarking Complex Instruction-guided Image Editing

While real-world applications increasingly demand intricate scene manipulation, existing instruction-guided image editing benchmarks often oversimplify task complexity and lack comprehensive, fine-grained instructions. To bridge this gap, we introduce CompBench, a large-scale benchmark specifically designed for complex instruction-guided image editing. CompBench features challenging editing scenarios that require fine-grained instruction following and spatial and contextual reasoning, thereby enabling comprehensive evaluation of image editing models' precise manipulation capabilities. To construct CompBench, we propose an MLLM-human collaborative framework with tailored task pipelines. Furthermore, we propose an instruction decoupling strategy that disentangles editing intents into four key dimensions: location, appearance, dynamics, and objects, ensuring closer alignment between instructions and complex editing requirements. Extensive evaluations reveal that CompBench exposes fundamental limitations of current image editing models and provides critical insights for the development of next-generation instruction-guided image editing systems. The dataset, code, and models are available at this https URL.
@article{jia2025_2505.12200,
  title={CompBench: Benchmarking Complex Instruction-guided Image Editing},
  author={Bohan Jia and Wenxuan Huang and Yuntian Tang and Junbo Qiao and Jincheng Liao and Shaosheng Cao and Fei Zhao and Zhaopeng Feng and Zhouhong Gu and Zhenfei Yin and Lei Bai and Wanli Ouyang and Lin Chen and Fei Zhao and Zihan Wang and Yuan Xie and Shaohui Lin},
  journal={arXiv preprint arXiv:2505.12200},
  year={2025}
}