
Legal Experts Disagree With Rationale Extraction Techniques for Explaining ECtHR Case Outcome Classification

18 January 2026

Mahammad Namazov

Tomáš Koref

Ivan Habernal

FaML

AILaw

ELM


Main:8 Pages

11 Figures

Bibliography:4 Pages

11 Tables

Appendix:14 Pages

Abstract

Interpretability is critical for applications of large language models (LLMs) in the legal domain, where trust and transparency are essential. A central NLP task in this setting is legal outcome prediction, where models forecast whether a court will find a violation of a given right. We study this task on decisions from the European Court of Human Rights (ECtHR), introducing a new ECtHR dataset with carefully curated positive (violation) and negative (non-violation) cases. Existing work proposes both task-specific approaches and model-agnostic techniques to explain downstream performance, but it remains unclear which techniques best explain legal outcome prediction. To address this, we propose a comparative analysis framework for model-agnostic interpretability methods. We focus on two rationale extraction techniques that justify model outputs with concise, human-interpretable text fragments from the input. We evaluate faithfulness via normalized sufficiency and comprehensiveness metrics, and plausibility via legal expert judgments of the extracted rationales. We also assess the feasibility of LLM-as-a-Judge, using these expert evaluations as the reference. Our experiments on the new ECtHR dataset show that models' "reasons" for predicting violations differ substantially from those of legal experts, despite strong faithfulness scores. The source code of our experiments is publicly available at this https URL.
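
To make the faithfulness metrics named in the abstract concrete, here is a minimal sketch of one way normalized sufficiency and comprehensiveness can be computed for a single example. The abstract does not state the exact normalization the authors use, so the formulation below (rescaling against an empty-input baseline and clipping to [0, 1]) is an assumption, as are the names `normalized_faithfulness` and `toy_prob` and the toy keyword-based classifier standing in for a real model.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical interface: maps a token sequence to the probability
# the model assigns to its originally predicted class.
ProbFn = Callable[[Sequence[str]], float]


def normalized_faithfulness(
    prob: ProbFn, tokens: Sequence[str], mask: Sequence[bool]
) -> Tuple[float, float]:
    """Return (normalized sufficiency, normalized comprehensiveness).

    mask[i] is True when tokens[i] belongs to the extracted rationale.
    Assumed normalization: rescale against the empty-input baseline.
    """
    full = prob(tokens)
    empty = prob([])
    only_rationale = prob([t for t, keep in zip(tokens, mask) if keep])
    without_rationale = prob([t for t, keep in zip(tokens, mask) if not keep])

    # Confidence lost when the whole input is removed; used as the scale.
    null_gap = max(0.0, full - empty)
    if null_gap == 0.0:
        # Assumption: scores are undefined here, so report 0 for both.
        return 0.0, 0.0

    # Sufficiency: how much confidence survives with the rationale alone.
    suff = 1.0 - max(0.0, full - only_rationale)
    suff_null = 1.0 - null_gap
    norm_suff = (suff - suff_null) / null_gap

    # Comprehensiveness: how much confidence drops without the rationale.
    comp = max(0.0, full - without_rationale)
    norm_comp = comp / null_gap

    clip = lambda v: min(1.0, max(0.0, v))
    return clip(norm_suff), clip(norm_comp)


def toy_prob(tokens: Sequence[str]) -> float:
    """Toy stand-in for a classifier: confidence grows with keyword hits."""
    keywords = {"detained", "without", "review"}
    hits = sum(1 for t in tokens if t in keywords)
    return min(1.0, 0.25 + 0.25 * hits)


tokens = ["applicant", "detained", "without", "judicial", "review"]
mask = [False, True, True, False, False]
norm_suff, norm_comp = normalized_faithfulness(toy_prob, tokens, mask)
```

Under this formulation, higher values of both scores indicate a more faithful rationale. As the abstract notes, high faithfulness says nothing about plausibility: a rationale can score well here and still differ from what a legal expert would mark.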
