ScEdit: Script-based Assessment of Knowledge Editing

29 May 2025
Xinye Li, Zunwen Zheng, Qian Zhang, Dekai Zhuang, Jiabao Kang, Liyan Xu, Qingbin Liu, Xi Chen, Zhiying Tu, Dianhui Chu, Dianbo Sui
    KELM
Main: 8 pages · Appendix: 9 pages · Bibliography: 4 pages · 11 figures · 8 tables
Abstract

Knowledge Editing (KE) has gained increasing attention, yet current KE tasks remain relatively simple. Under current evaluation frameworks, many editing methods achieve exceptionally high scores, sometimes nearing perfection. However, few studies integrate KE into real-world application scenarios (e.g., the recent interest in LLM-as-agent settings). To support our analysis, we introduce a novel script-based benchmark -- ScEdit (Script-based Knowledge Editing Benchmark) -- which encompasses both counterfactual and temporal edits. We integrate token-level and text-level evaluation methods, comprehensively analyzing existing KE techniques. The benchmark extends traditional fact-based ("What"-type question) evaluation to action-based ("How"-type question) evaluation. We observe that all KE methods exhibit a drop in performance on established metrics and face challenges on text-level metrics, indicating that the task remains challenging. Our benchmark is available at this https URL.
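To make the two evaluation granularities in the abstract concrete, below is a minimal sketch of how a token-level ("What"-type) probe and a text-level ("How"-type, script-style) probe might be implemented against a HuggingFace-style causal LM. The model choice, function names, and the substring-based text-level check are illustrative assumptions, not the paper's actual metrics or code.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model for illustration; ScEdit would evaluate an *edited* model.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def token_level_success(prompt: str, target: str) -> bool:
    """Token-level probe: under teacher forcing, does the model rank each
    token of the target continuation highest at its step?"""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    target_ids = tokenizer(target, add_special_tokens=False,
                           return_tensors="pt").input_ids
    ids = torch.cat([prompt_ids, target_ids], dim=1)
    with torch.no_grad():
        logits = model(ids).logits
    # The prediction for each target position comes from the preceding step.
    preds = logits[0, prompt_ids.shape[1] - 1 : -1].argmax(dim=-1)
    return bool((preds == target_ids[0]).all())

def text_level_success(script_prompt: str, edited_fact: str,
                       max_new_tokens: int = 128) -> bool:
    """Text-level probe: does a freely generated script (a 'How'-type answer)
    actually reflect the edited fact? A substring match is only a stand-in;
    a real text-level metric would need something more robust."""
    inputs = tokenizer(script_prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=max_new_tokens,
                             do_sample=False,
                             pad_token_id=tokenizer.eos_token_id)
    script = tokenizer.decode(out[0][inputs.input_ids.shape[1]:],
                              skip_special_tokens=True)
    return edited_fact.lower() in script.lower()

# Example with a counterfactual edit, probing both granularities:
print(token_level_success("The Eiffel Tower is located in", " Rome"))
print(text_level_success("Write a short plan for visiting the Eiffel Tower:", "Rome"))

The point of the contrast is the one the abstract makes: an edit can look perfect at the token level (the model completes the fact correctly) while the same knowledge fails to surface in a longer generated script, which is where the text-level metrics expose the remaining gap.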

@article{li2025_2505.23291,
  title={ScEdit: Script-based Assessment of Knowledge Editing},
  author={Xinye Li and Zunwen Zheng and Qian Zhang and Dekai Zhuang and Jiabao Kang and Liyan Xu and Qingbin Liu and Xi Chen and Zhiying Tu and Dianhui Chu and Dianbo Sui},
  journal={arXiv preprint arXiv:2505.23291},
  year={2025}
}