
Assessing Code Understanding in LLMs

Abstract

We present an empirical evaluation of how well Large Language Models understand code that has undergone non-trivial, semantics-preserving program transformations such as copy propagation or constant folding. Our findings show that LLMs fail to judge semantic equivalence in approximately 41% of cases when no context is provided and in approximately 29% of cases when given a simple generic context. To improve accuracy, we advocate integrating LLMs with code-optimization tools to enhance training and facilitate more robust program understanding.
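The paper does not specify a particular source language or benchmark in this abstract; the snippet below is a minimal, illustrative sketch (our own, not the authors' material) of the kind of semantics-preserving transformation involved, pairing an original function with its constant-folded and copy-propagated counterpart.

```python
# Illustrative only: a toy instance of the transformations named in the
# abstract (constant folding, copy propagation). Function names and the
# test harness are hypothetical, not from the paper's evaluation suite.

def original(x):
    a = 2 * 3          # constant expression
    b = a              # copy of a
    return x + b + a   # uses both the copy and the original


def transformed(x):
    # After constant folding (2 * 3 -> 6) and copy propagation (b -> a -> 6),
    # the body collapses to a single expression with identical semantics.
    return x + 6 + 6


# Judging equivalence means recognizing that both bodies compute the same
# function for every input x; a quick check over sample inputs:
assert all(original(x) == transformed(x) for x in range(-100, 100))
```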

@article{laneve2025_2504.00065,
  title={Assessing Code Understanding in LLMs},
  author={Cosimo Laneve and Alvise Spanò and Dalila Ressi and Sabina Rossi and Michele Bugliesi},
  journal={arXiv preprint arXiv:2504.00065},
  year={2025}
}
