Language Bottleneck Models: A Framework for Interpretable Knowledge Tracing and Beyond

20 June 2025

Main: 9 Pages

9 Figures

Bibliography: 2 Pages

9 Tables

Appendix: 11 Pages

Abstract

Accurately assessing student knowledge is critical for effective education, yet traditional Knowledge Tracing (KT) methods rely on opaque latent embeddings, limiting interpretability. Even LLM-based approaches generate direct predictions or summaries that may hallucinate without any accuracy guarantees. We recast KT as an inverse problem: learning the minimum natural-language summary that makes past answers explainable and future answers predictable. Our Language Bottleneck Model (LBM) consists of an encoder LLM that writes an interpretable knowledge summary and a frozen decoder LLM that must reconstruct and predict student responses using only that summary text. By constraining all predictive information to pass through a short natural-language bottleneck, LBMs ensure that the summary contains accurate information while remaining human-interpretable. Experiments on synthetic arithmetic benchmarks and the large-scale Eedi dataset show that LBMs rival the accuracy of state-of-the-art KT and direct LLM methods while requiring orders-of-magnitude fewer student trajectories. We demonstrate that training the encoder with group-relative policy optimization, using downstream decoding accuracy as a reward signal, effectively improves summary quality.
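The bottleneck objective described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the encoder and decoder LLMs are replaced by hypothetical toy stand-ins (`toy_decoder`, hand-written candidate summaries), the word budget `max_words` and the zero reward for over-long summaries are assumptions, and the advantage normalization follows the common group-relative form (reward minus group mean, divided by group standard deviation), which may differ in detail from the paper's training setup.

```python
from statistics import mean, pstdev
from typing import Callable, List, Tuple

# One student interaction: (question text, whether the student answered correctly).
Interaction = Tuple[str, bool]
# Hypothetical decoder interface: (summary, question) -> predicted correctness.
Decoder = Callable[[str, str], bool]


def decoding_accuracy(summary: str, interactions: List[Interaction], decoder: Decoder) -> float:
    """Fraction of responses the decoder gets right when it sees only the summary."""
    if not interactions:
        return 0.0
    hits = sum(decoder(summary, question) == correct for question, correct in interactions)
    return hits / len(interactions)


def bottleneck_reward(
    summary: str,
    past: List[Interaction],
    future: List[Interaction],
    decoder: Decoder,
    max_words: int = 50,  # assumed budget; the abstract only says the summary is short
) -> float:
    """Reward a summary by how well it explains past answers and predicts future ones."""
    if len(summary.split()) > max_words:
        return 0.0  # assumption: summaries that exceed the bottleneck earn nothing
    return decoding_accuracy(summary, past + future, decoder)


def group_relative_advantages(rewards: List[float]) -> List[float]:
    """Normalize rewards within a group of summaries sampled for the same student."""
    mu, sigma = mean(rewards), pstdev(rewards)
    if sigma == 0:
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]


def toy_decoder(summary: str, question: str) -> bool:
    """Toy stand-in for the frozen decoder LLM: predicts 'correct' if the skill is named."""
    skill = question.split(":")[0]
    return skill in summary.split()


past = [("addition: 2+3", True), ("subtraction: 5-7", False)]
future = [("addition: 4+4", True)]
# Stand-ins for several summaries sampled from the encoder for one student.
candidates = ["knows addition", "knows subtraction", "knows addition subtraction"]

rewards = [bottleneck_reward(s, past, future, toy_decoder) for s in candidates]
advantages = group_relative_advantages(rewards)
```

In this toy setting the summary that names only the skill the student has actually mastered receives the highest reward and therefore the largest advantage, which is the signal a policy-gradient update would use to make such summaries more likely.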

@article{berthon2025_2506.16982,
  title={Language Bottleneck Models: A Framework for Interpretable Knowledge Tracing and Beyond},
  author={Antonin Berthon and Mihaela van der Schaar},
  journal={arXiv preprint arXiv:2506.16982},
  year={2025}
}
