A Survey of Reinforcement Learning for Large Language Models under Data Scarcity: Challenges and Solutions

Zhiyin Yu
Yuchen Mou
Juncheng Yan
Junyu Luo
Chunchun Chen
Xing Wei
Yunhui Liu
Hongru Sun
Yuxing Zhang
Jun Xu
Yatao Bian
Ming Zhang
Wei Ye
Tieke He
Jie Yang
Guanjie Zheng
Zhonghai Wu
Bo Zhang
Lei Bai
Xiao Luo
Main: 10 pages · 8 figures · 1 table · Bibliography: 5 pages · Appendix: 9 pages
Abstract

Reinforcement learning (RL) has emerged as a powerful post-training paradigm for enhancing the reasoning capabilities of large language models (LLMs). However, RL for LLMs faces substantial data-scarcity challenges, including the limited availability of high-quality external supervision and the constrained volume of model-generated experience. These limitations make data-efficient RL a critical research direction. In this survey, we present the first systematic review of RL for LLMs under data scarcity. We propose a bottom-up hierarchical framework organized around three complementary perspectives: data-centric, training-centric, and framework-centric. We develop a taxonomy of existing methods, summarize representative approaches in each category, and analyze their strengths and limitations. Our taxonomy aims to provide a clear conceptual foundation for understanding the design space of data-efficient RL for LLMs and to guide researchers working in this emerging area. We hope this survey offers a comprehensive roadmap for future research and inspires new directions toward more efficient and scalable RL post-training for LLMs.
