Outcome-based Reinforcement Learning to Predict the Future

23 May 2025
Benjamin Turtel
Danny Franklin
Kris Skotheim
Luke Hewitt
Philipp Schoenegger
Main: 12 pages, 4 figures, 1 table; bibliography: 2 pages.
Abstract

Reinforcement learning with verifiable rewards (RLVR) has boosted math and coding in large language models, yet there has been little effort to extend RLVR into messier, real-world domains like forecasting. One sticking point is that outcome-based reinforcement learning for forecasting must learn from binary, delayed, and noisy rewards, a regime where standard fine-tuning is brittle. We show that outcome-only online RL on a 14B model can match frontier-scale accuracy and surpass it in calibration and hypothetical prediction market betting by adapting two leading algorithms, Group-Relative Policy Optimisation (GRPO) and ReMax, to the forecasting setting. Our adaptations remove per-question variance scaling in GRPO, apply baseline-subtracted advantages in ReMax, hydrate training with 100k temporally consistent synthetic questions, and introduce lightweight guard-rails that penalise gibberish, non-English responses and missing rationales, enabling a single stable pass over 110k events. Scaling ReMax to 110k questions and ensembling seven predictions yields a 14B model that matches frontier baseline o1 on accuracy on our holdout set (Brier = 0.193, p = 0.23) while beating it in calibration (ECE = 0.042, p < 0.001). A simple trading rule turns this calibration edge into $127 of hypothetical profit versus $92 for o1 (p = 0.037). This demonstrates that refined RLVR methods can convert small-scale LLMs into potentially economically valuable forecasting tools, with implications for scaling this to larger models.
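
As a rough illustration of the advantage adaptations the abstract describes, the sketch below (Python/NumPy) shows mean-centred group advantages without per-question variance scaling, alongside a ReMax-style baseline-subtracted advantage. It assumes a scalar reward per rollout (e.g. a negative Brier score against the resolved outcome); the function names and reward convention are ours, not taken from the paper.

import numpy as np

def grpo_advantages(group_rewards):
    # Group-relative advantages for one question's sampled rollouts.
    # Standard GRPO divides by the group's reward std; per the abstract,
    # that per-question variance scaling is removed here, leaving plain
    # mean-centring within the group.
    group_rewards = np.asarray(group_rewards, dtype=float)
    return group_rewards - group_rewards.mean()

def remax_advantage(sampled_reward, greedy_reward):
    # ReMax-style advantage: the reward of a sampled response minus the
    # reward of the greedy response to the same question, which acts as
    # a variance-reducing baseline.
    return sampled_reward - greedy_reward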

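The guard-rails are described only at a high level; a minimal sketch of what such reward-shaping penalties could look like follows, using simple character- and length-based heuristics. The checks, thresholds, and penalty values are illustrative placeholders, not the paper's.

def guardrail_penalty(response, min_rationale_words=20):
    penalty = 0.0
    # Gibberish / non-English heuristic: require that most characters are
    # ASCII letters, digits, whitespace, or common punctuation.
    ok = sum(c.isascii() and (c.isalnum() or c.isspace() or c in ".,;:%()-'")
             for c in response)
    if ok / max(len(response), 1) < 0.6:
        penalty += 1.0
    # Missing-rationale heuristic: require a minimum amount of prose
    # accompanying the final forecast.
    if len(response.split()) < min_rationale_words:
        penalty += 1.0
    return penalty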
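
The two evaluation metrics cited in the abstract, Brier score and expected calibration error (ECE), have standard definitions, sketched below. The equal-width 10-bin ECE is a common convention and an assumption here; the paper's binning is not stated in the abstract.

import numpy as np

def brier_score(probs, outcomes):
    # Mean squared error between forecast probabilities and 0/1 outcomes.
    probs, outcomes = np.asarray(probs, float), np.asarray(outcomes, float)
    return float(np.mean((probs - outcomes) ** 2))

def expected_calibration_error(probs, outcomes, n_bins=10):
    # Frequency-weighted mean gap, per probability bin, between the
    # average forecast and the empirical outcome rate.
    probs, outcomes = np.asarray(probs, float), np.asarray(outcomes, float)
    idx = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            ece += mask.mean() * abs(probs[mask].mean() - outcomes[mask].mean())
    return float(ece)
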
@article{turtel2025_2505.17989,
  title={Outcome-based Reinforcement Learning to Predict the Future},
  author={Benjamin Turtel and Danny Franklin and Kris Skotheim and Luke Hewitt and Philipp Schoenegger},
  journal={arXiv preprint arXiv:2505.17989},
  year={2025}
}