Maintainable and general software allows developers to build robust applications efficiently, yet achieving these qualities often requires refactoring specialized solutions into reusable components. This challenge becomes particularly relevant as code agents become increasingly accurate at solving isolated programming problems. We investigate code agents' capacity to refactor code in ways that support growth and reusability. We present both a method and a benchmark for refactoring: Librarian, a sample-and-rerank method for generating reusable libraries, and Minicode, a benchmark where code agents must minimize and refactor multiple independent solutions into a joint library. Compared to state-of-the-art code agents, Librarian achieves strong results on both compression and correctness on Minicode, obtaining compression rates 1.6-2x better than those agents while also improving correctness. We open-source our code and benchmark.
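As a rough illustration of what "sample-and-rerank" can mean in this setting, the sketch below filters candidate refactorings by a correctness check and then picks the cheapest survivor. It is not the Librarian implementation: the names `rerank`, `passes_tests`, and `compression_ratio` are hypothetical, the candidates are hard-coded rather than sampled from a model, and character count stands in for whatever compression measure the paper actually uses.

```python
from typing import Callable, Iterable, Optional


def rerank(candidates: Iterable[str],
           passes_tests: Callable[[str], bool],
           cost: Callable[[str], float] = len) -> Optional[str]:
    """Return the lowest-cost candidate that passes the tests, or None.

    Assumption: cost defaults to character count as a simple proxy
    for program size.
    """
    valid = [c for c in candidates if passes_tests(c)]
    return min(valid, key=cost) if valid else None


def compression_ratio(original: str, refactored: str,
                      cost: Callable[[str], float] = len) -> float:
    """Ratio of original cost to refactored cost (above 1 means smaller)."""
    return cost(original) / cost(refactored)


def passes_tests(src: str) -> bool:
    """Hypothetical correctness check: run the source, then test `area`."""
    namespace: dict = {}
    try:
        exec(src, namespace)
        area = namespace["area"]
        return area(2, 3) == 6 and area(0, 5) == 0
    except Exception:
        return False


# Hard-coded stand-ins for samples that a code model would produce.
candidates = [
    # Correct but verbose.
    "def area(w, h):\n    result = 0\n    for _ in range(h):\n"
    "        result += w\n    return result\n",
    # Correct and compact.
    "def area(w, h):\n    return w * h\n",
    # Shortest, but wrong; the correctness filter must reject it.
    "def area(w, h): return w\n",
]

best = rerank(candidates, passes_tests)
ratio = compression_ratio(candidates[0], best)
```

Filtering before ranking matters here: the shortest candidate is incorrect, so ranking on size alone would select a broken program.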
@article{kovacic2025_2506.11058,
  title={Refactoring Codebases through Library Design},
  author={Ziga Kovacic and Celine Lee and Justin Chiu and Wenting Zhao and Kevin Ellis},
  journal={arXiv preprint arXiv:2506.11058},
  year={2025}
}