Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models

3 April 2025
Hung Le
Dai Do
Dung Nguyen
Svetha Venkatesh
    OffRL
    LRM
Abstract

Recent advances in fine-tuning large language models (LLMs) with reinforcement learning (RL) have shown promising improvements in complex reasoning tasks, particularly when paired with chain-of-thought (CoT) prompting. However, these successes have largely been demonstrated on large-scale models with billions of parameters, where a strong pretraining foundation ensures effective initial exploration. In contrast, RL remains challenging for tiny LLMs with 1 billion parameters or fewer, because they lack the pretraining strength needed to explore effectively and often settle into suboptimal reasoning patterns. This work introduces a novel intrinsic motivation approach that leverages episodic memory to address this challenge, improving tiny LLMs on CoT reasoning tasks. Inspired by human memory-driven learning, our method draws on successful reasoning patterns stored in memory while allowing controlled exploration to generate novel responses. Intrinsic rewards are computed efficiently using a kNN-based episodic memory, allowing the model to discover new reasoning strategies while quickly adapting to effective past solutions. Fine-tuning experiments on the GSM8K and AI-MO datasets demonstrate that our approach significantly improves smaller LLMs' sample efficiency and generalization capability, making RL-based reasoning improvements more accessible in low-resource settings.
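To make the kNN-based intrinsic reward described above concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than the paper's exact formulation: the EpisodicMemory class, the FIFO eviction policy, the hyperparameters k, capacity, and beta, and the choice of mean kNN distance as a novelty bonus are all placeholders. The only elements taken from the abstract are the ideas of storing successful reasoning traces in an episodic memory and computing an intrinsic reward from kNN lookups.

import numpy as np

class EpisodicMemory:
    """kNN episodic memory over embeddings of reasoning traces.

    Stores (normalized) embeddings of successful chains of thought and
    scores a new trace by its mean distance to the k nearest stored
    neighbours: a large distance signals a novel reasoning pattern.
    """

    def __init__(self, k: int = 8, capacity: int = 10_000):
        self.k = k
        self.capacity = capacity
        self.embeddings: list[np.ndarray] = []

    def add(self, emb: np.ndarray) -> None:
        # Simple FIFO eviction once the buffer is full (an assumed policy).
        if len(self.embeddings) >= self.capacity:
            self.embeddings.pop(0)
        self.embeddings.append(emb / (np.linalg.norm(emb) + 1e-8))

    def knn_distance(self, emb: np.ndarray) -> float:
        # Mean Euclidean distance to the k nearest stored embeddings.
        if not self.embeddings:
            return 1.0  # everything is maximally novel while memory is empty
        emb = emb / (np.linalg.norm(emb) + 1e-8)
        dists = np.linalg.norm(np.stack(self.embeddings) - emb, axis=1)
        k = min(self.k, len(dists))
        return float(np.sort(dists)[:k].mean())

def shaped_reward(task_reward: float, memory: EpisodicMemory,
                  trace_emb: np.ndarray, beta: float = 0.1) -> float:
    # Task reward (e.g., answer correctness on GSM8K) plus a kNN novelty
    # bonus; beta is an assumed scaling coefficient, not a paper value.
    return task_reward + beta * memory.knn_distance(trace_emb)

During RL fine-tuning, one would embed each sampled CoT trace (for instance with the policy model's hidden states or a separate sentence encoder, both assumptions here), call shaped_reward to augment the task reward, and add the trace to memory when it yields a correct answer. The memory then biases the policy toward effective past solutions while the novelty bonus keeps exploration alive, matching the exploration/exploitation balance the abstract describes.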

View on arXiv
@article{le2025_2504.02273,
  title={Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models},
  author={Hung Le and Dai Do and Dung Nguyen and Svetha Venkatesh},
  journal={arXiv preprint arXiv:2504.02273},
  year={2025}
}