KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models

22 May 2025

Main: 9 pages; Bibliography: 6 pages; Appendix: 24 pages; 37 figures; 3 tables

Abstract

Recent advances in multi-modal generative models have enabled significant progress in instruction-based image editing. However, while these models produce visually plausible outputs, their capacity for editing tasks that require knowledge-based reasoning remains under-explored. In this paper, we introduce KRIS-Bench (Knowledge-based Reasoning in Image-editing Systems Benchmark), a diagnostic benchmark designed to assess models through a cognitively informed lens. Drawing from educational theory, KRIS-Bench categorizes editing tasks across three foundational knowledge types: Factual, Conceptual, and Procedural. Based on this taxonomy, we design 22 representative tasks spanning 7 reasoning dimensions and release 1,267 high-quality annotated editing instances. To support fine-grained evaluation, we propose a comprehensive protocol that incorporates a novel Knowledge Plausibility metric, enhanced by knowledge hints and calibrated through human studies. Empirical results on 10 state-of-the-art models reveal significant gaps in reasoning performance, highlighting the need for knowledge-centric benchmarks to advance the development of intelligent image editing systems.


@article{wu2025_2505.16707,
  title={KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models},
  author={Yongliang Wu and Zonghui Li and Xinting Hu and Xinyu Ye and Xianfang Zeng and Gang Yu and Wenbo Zhu and Bernt Schiele and Ming-Hsuan Yang and Xu Yang},
  journal={arXiv preprint arXiv:2505.16707},
  year={2025}
}
