DualSchool: How Reliable are LLMs for Optimization Education?

27 May 2025

Main: 8 Pages

7 Figures

5 Tables

Appendix: 10 Pages

Abstract

Consider the following task, taught in introductory optimization courses, which addresses challenges articulated by the community at the intersection of (generative) AI and OR: generate the dual of a linear program. LLMs, being trained at web scale, have the conversion procedure and many instances of Primal to Dual Conversion (P2DC) at their disposal. Students may thus reasonably expect LLMs to perform well on the P2DC task. To assess this expectation, this paper introduces DualSchool, a comprehensive framework for generating and verifying P2DC instances. The verification procedure of DualSchool uses the Canonical Graph Edit Distance, going well beyond existing evaluation methods for optimization models, which exhibit many false positives and false negatives when applied to P2DC. Experiments performed with DualSchool show that, although LLMs can recite the conversion procedure accurately, state-of-the-art open LLMs fail to consistently produce correct duals. This finding holds even for the smallest two-variable instances and for derivative tasks, such as correctness verification and error classification. The paper also discusses the implications for educators, students, and the development of large reasoning systems.
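To make the P2DC task concrete, the following is a minimal sketch of the textbook conversion for one restricted case: a primal of the form min c^T x subject to A x >= b, x >= 0, whose dual is max b^T y subject to A^T y <= c, y >= 0. It is not the DualSchool framework and not the paper's verification procedure; the `LinearProgram` class and the `dualize` function are hypothetical names introduced only for illustration, and general LPs (equality constraints, free variables, mixed senses) are outside what this sketch handles.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LinearProgram:
    """Hypothetical container for an LP in one of two symmetric forms.

    sense == "min":  min objective^T x  s.t.  matrix x >= rhs,  x >= 0
    sense == "max":  max objective^T y  s.t.  matrix y <= rhs,  y >= 0
    """
    sense: str
    objective: List[float]
    matrix: List[List[float]]
    rhs: List[float]


def dualize(primal: LinearProgram) -> LinearProgram:
    """Return the dual of a primal given in the symmetric "min" form.

    Assumes every constraint is ">=" and every variable is nonnegative.
    """
    if primal.sense != "min":
        raise ValueError("this sketch only handles the symmetric 'min' form")
    # Each primal constraint becomes a dual variable and each primal
    # variable becomes a dual constraint, so the matrix is transposed.
    transposed = [list(column) for column in zip(*primal.matrix)]
    return LinearProgram(
        sense="max",
        objective=list(primal.rhs),   # dual objective uses the primal rhs b
        matrix=transposed,            # dual constraint matrix is A^T
        rhs=list(primal.objective),   # dual rhs uses the primal costs c
    )


# Two-variable instance:
#   min 2 x1 + 3 x2   s.t.   x1 + 2 x2 >= 4,   3 x1 + x2 >= 6,   x >= 0
primal = LinearProgram("min", [2.0, 3.0], [[1.0, 2.0], [3.0, 1.0]], [4.0, 6.0])
dual = dualize(primal)
# Dual:
#   max 4 y1 + 6 y2   s.t.   y1 + 3 y2 <= 2,   2 y1 + y2 <= 3,   y >= 0
```

Even in this restricted form the conversion requires tracking the transpose, the swap of costs and right-hand sides, and the flip of the constraint direction, which are the kinds of details an automatically generated dual can get wrong.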


@article{klamkin2025_2505.21775,
  title={DualSchool: How Reliable are LLMs for Optimization Education?},
  author={Michael Klamkin and Arnaud Deza and Sikai Cheng and Haoruo Zhao and Pascal Van Hentenryck},
  journal={arXiv preprint arXiv:2505.21775},
  year={2025}
}
