Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning

We propose CURE, a novel reinforcement learning framework with a dedicated reward design that co-evolves coding and unit test generation capabilities based on their interaction outcomes, without any ground-truth code as supervision. This approach enables flexible and scalable training and allows the unit tester to learn directly from the coder's mistakes. Our derived ReasonFlux-Coder-7B and 14B models, optimized from Qwen2.5-Instruct models, improve code generation accuracy by 5.3% and Best-of-N accuracy by 9.0%, outperforming similarly sized Qwen-Coder, DeepSeek-Coder, and Seed-Coder models. They extend naturally to downstream tasks such as test-time scaling and agentic coding, achieving an 8.1% improvement over the base model. For long-CoT models, our ReasonFlux-Coder-4B consistently outperforms Qwen3-4B while achieving 64.8% inference efficiency in unit test generation. Notably, we also find that our model can serve as an effective reward model for reinforcement learning on base models. Project: this https URL
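
The abstract leaves the reward design at a high level. As a rough, hypothetical sketch of what rewarding both agents purely from interaction outcomes could look like, the Python below scores sampled code candidates and generated unit tests from their pass/fail matrix alone, with no ground-truth code: each candidate's reward is the fraction of generated tests it passes, and each test's reward is a simple discrimination proxy that peaks when the test splits the candidate pool. Both scoring rules, and the best_of_n helper illustrating test-time scaling, are our assumptions for illustration, not CURE's actual reward formulation.

```python
import numpy as np


def interaction_rewards(pass_matrix: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Proxy rewards from coder/tester interaction outcomes only.

    pass_matrix[i, j] = 1.0 if code candidate i passes generated unit
    test j, else 0.0. No ground-truth solution is consulted anywhere.
    Both reward rules below are illustrative assumptions.
    """
    # Coder reward: fraction of the generated tests each candidate passes.
    coder_rewards = pass_matrix.mean(axis=1)

    # Tester reward (assumed discrimination proxy): a test is most useful
    # when it separates candidates, i.e. its pass rate is near 0.5; tests
    # that everyone passes or everyone fails carry no signal.
    pass_rate = pass_matrix.mean(axis=0)
    tester_rewards = 1.0 - np.abs(2.0 * pass_rate - 1.0)

    return coder_rewards, tester_rewards


def best_of_n(pass_matrix: np.ndarray) -> int:
    """Test-time scaling: pick the candidate passing the most generated tests."""
    return int(pass_matrix.sum(axis=1).argmax())


if __name__ == "__main__":
    # Toy example: 8 sampled code candidates, 16 generated unit tests.
    rng = np.random.default_rng(0)
    matrix = (rng.random((8, 16)) > 0.4).astype(float)
    coder_r, tester_r = interaction_rewards(matrix)
    print("coder rewards:", coder_r)
    print("tester rewards:", tester_r)
    print("best candidate index:", best_of_n(matrix))
```

In an RL loop, these per-sample scores would serve as the scalar rewards for policy updates of the coder and the unit tester respectively, so each model improves against the other's current outputs.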
@article{wang2025_2506.03136,
  title={Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning},
  author={Yinjie Wang and Ling Yang and Ye Tian and Ke Shen and Mengdi Wang},
  journal={arXiv preprint arXiv:2506.03136},
  year={2025}
}