Unifying Evolutionary Prompt Search and Reinforcement Learning for LLM Self-Improvement

16 February 2026

Lunjun Zhang

Ryan Chen

Bradly C. Stadie

LLMAG

LRM

ArXiv (abs)PDF HTML Github

Main:15 Pages

22 Figures

Bibliography:5 Pages

1 Tables

Appendix:19 Pages

Abstract

Building agentic systems that can autonomously self-improve from experience is a longstanding goal of AI. Large language models (LLMs) today primarily self-improve via two mechanisms: self-reflection for context updates, and reinforcement learning (RL) for weight updates. In this work, we propose Evolutionary System Prompt Learning (E-SPL), a method for jointly improving model contexts and model weights. In each RL iteration, E-SPL samples trajectories under multiple system prompts in parallel. It applies RL updates to LLM weights conditioned on system prompts, and evolutionary updates to system prompts via mutation and crossover, two genetic operators based on LLM self-reflection. Each system prompt is assigned a TrueSkill rating for evolutionary selection, updated from relative performance within each RL iteration. E-SPL encourages a natural division between declarative knowledge encoded in prompts and procedural knowledge encoded in weights, resulting in improved performance across reasoning and agentic tasks. For instance, in an easy-to-hard (AIME $\rightarrow$ BeyondAIME) generalization setting, E-SPL improves RL success rate from 38.8% $\rightarrow$ 45.1% while also outperforming reflective prompt evolution (40.0%). Overall, our results demonstrate that RL and evolutionary prompt search are deeply synergistic, and unifying the two yields consistent gains in sample efficiency and generalization. Code: this https URL

View on arXiv

Comments on this paper