
How Does LLM Reasoning Work for Code? A Survey and a Call to Action

Main: 10 pages
Bibliography: 6 pages
Appendix: 3 pages
1 figure
7 tables
Abstract

The rise of large language models (LLMs) has led to dramatic improvements across a wide range of natural language tasks. These advancements have extended into the domain of code, facilitating complex tasks such as code generation, translation, summarization, and repair. However, their utility for real-world deployment has only recently been studied, particularly on software engineering (SWE) tasks such as GitHub issue resolution. In this study, we examine the code reasoning techniques that underlie the ability to perform such tasks, and the paradigms used to drive their performance. Our contributions are: (1) the first dedicated survey on code reasoning for code tasks, highlighting overarching strategies as well as hybrid and agentic approaches; (2) a taxonomy of the techniques used to drive code reasoning; (3) a comprehensive overview of performance on common benchmarks and a showcase of new, under-explored benchmarks with high potential in SWE; (4) an exploration of how core properties of code can be used to explain different reasoning techniques; and (5) gaps and potentially under-explored areas for future research.

@article{ceka2025_2506.13932,
  title={How Does LLM Reasoning Work for Code? A Survey and a Call to Action},
  author={Ira Ceka and Saurabh Pujar and Irene Manotas and Gail Kaiser and Baishakhi Ray and Shyam Ramji},
  journal={arXiv preprint arXiv:2506.13932},
  year={2025}
}