
KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models

Main: 9 pages; Bibliography: 6 pages; Appendix: 24 pages; 37 figures; 3 tables

Abstract

Recent advances in multi-modal generative models have enabled significant progress in instruction-based image editing. However, while these models produce visually plausible outputs, their capacity for knowledge-based reasoning in editing tasks remains under-explored. In this paper, we introduce KRIS-Bench (Knowledge-based Reasoning in Image-editing Systems Benchmark), a diagnostic benchmark designed to assess models through a cognitively informed lens. Drawing from educational theory, KRIS-Bench categorizes editing tasks across three foundational knowledge types: Factual, Conceptual, and Procedural. Based on this taxonomy, we design 22 representative tasks spanning 7 reasoning dimensions and release 1,267 high-quality annotated editing instances. To support fine-grained evaluation, we propose a comprehensive protocol that incorporates a novel Knowledge Plausibility metric, enhanced by knowledge hints and calibrated through human studies. Empirical results on 10 state-of-the-art models reveal significant gaps in reasoning performance, highlighting the need for knowledge-centric benchmarks to advance the development of intelligent image editing systems.

@article{wu2025_2505.16707,
  title={KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models},
  author={Yongliang Wu and Zonghui Li and Xinting Hu and Xinyu Ye and Xianfang Zeng and Gang Yu and Wenbo Zhu and Bernt Schiele and Ming-Hsuan Yang and Xu Yang},
  journal={arXiv preprint arXiv:2505.16707},
  year={2025}
}