Answering multi-hop questions that involve new knowledge is a challenging task for evaluating knowledge editing (KE) methods for large language models (LLMs). The task is difficult because LLMs' hallucinations about new knowledge undermine the logical coherence of their multi-hop reasoning and lead to incorrect answers. To address this issue, we design decoding constraints to "regulate" LLMs' reasoning, enhancing logical coherence when incorporating new knowledge. We incorporate these constraints into a new KE framework: DEEPEDIT (Depth-first Search-based Constrained Decoding for Knowledge Editing), which enables LLMs to generate coherent reasoning chains with new knowledge through a depth-first search. At each step, the search selects the most important piece of knowledge that satisfies our constraints as the next reasoning step, efficiently increasing the reasoning depth. In addition to DEEPEDIT, we propose two new KE benchmarks: MQUAKE-2002 and MQUAKE-HARD, which provide more precise and challenging assessments of KE approaches. Qualitatively, DEEPEDIT enables LLMs to produce succinct and coherent reasoning chains involving new knowledge. Quantitatively, it yields significant improvements on multiple KE benchmarks.
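To make the idea concrete, below is a minimal, hypothetical sketch of a depth-first search over candidate facts with constraint filtering, in the spirit of the framework described above; the fact store, constraint check, and importance scoring are illustrative placeholders, not the paper's actual implementation.

```python
# Hypothetical sketch: DFS over (possibly edited) facts with decoding constraints.
# All function names and the toy fact store are assumptions for illustration only.
from typing import Callable

Fact = tuple[str, str, str]  # (subject, relation, object)

def deep_edit_search(
    start_entity: str,
    facts: list[Fact],
    satisfies_constraints: Callable[[list[Fact], Fact], bool],
    importance: Callable[[list[Fact], Fact], float],
    max_depth: int = 4,
) -> list[Fact]:
    """Depth-first search for a coherent reasoning chain over edited facts."""
    def dfs(entity: str, chain: list[Fact]) -> list[Fact] | None:
        if len(chain) == max_depth:
            return chain
        # Keep only candidate facts that pass the decoding constraints.
        candidates = [f for f in facts
                      if f[0] == entity and satisfies_constraints(chain, f)]
        # Expand the most "important" admissible fact first to grow depth efficiently.
        for fact in sorted(candidates, key=lambda f: importance(chain, f), reverse=True):
            result = dfs(fact[2], chain + [fact])
            if result is not None:
                return result
        return chain if chain else None

    return dfs(start_entity, []) or []

# Toy usage with one edited fact about the UK's prime minister.
facts = [("UK", "prime minister", "Rishi Sunak"),
         ("Rishi Sunak", "spouse", "Akshata Murty")]
chain = deep_edit_search(
    "UK",
    facts,
    satisfies_constraints=lambda chain, f: f not in chain,  # e.g., no repeated steps
    importance=lambda chain, f: 1.0,                         # uniform scoring placeholder
)
print(chain)
```

The sketch only captures the control flow: at each depth, inadmissible facts are pruned by the constraints and the highest-scoring remaining fact is taken as the next reasoning step; the paper's constraints and scoring operate on LLM decoding rather than a symbolic fact list.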