A Survey of Reinforcement Learning for Large Language Models under Data Scarcity: Challenges and Solutions

Zhiyin Yu
Yuchen Mou
Juncheng Yan
Junyu Luo
Chunchun Chen
Xing Wei
Yunhui Liu
Hongru Sun
Yuxing Zhang
Jun Xu
Yatao Bian
Ming Zhang
Wei Ye
Tieke He
Jie Yang
Guanjie Zheng
Zhonghai Wu
Bo Zhang
Lei Bai
Xiao Luo
Main: 10 pages · 8 figures · 1 table · Bibliography: 5 pages · Appendix: 9 pages
Abstract

Reinforcement learning (RL) has emerged as a powerful post-training paradigm for enhancing the reasoning capabilities of large language models (LLMs). However, RL for LLMs faces substantial data-scarcity challenges, including the limited availability of high-quality external supervision and the constrained volume of model-generated experience. These limitations make data-efficient RL a critical research direction. In this survey, we present the first systematic review of RL for LLMs under data scarcity. We propose a bottom-up hierarchical framework organized around three complementary perspectives: data-centric, training-centric, and framework-centric. We develop a taxonomy of existing methods, summarize representative approaches in each category, and analyze their strengths and limitations. Our taxonomy aims to provide a clear conceptual foundation for understanding the design space of data-efficient RL for LLMs and to guide researchers working in this emerging area. We hope this survey offers a comprehensive roadmap for future research and inspires new directions toward more efficient and scalable RL post-training for LLMs.
