

RLSF: Reinforcement Learning via Symbolic Feedback

26 May 2024
Piyush Jha
Prithwish Jana
Pranavkrishna Suresh
Arnav Arora
Vijay Ganesh
Community: LRM
Main: 6 pages · 5 figures · 4 tables · Bibliography: 2 pages · Appendix: 3 pages
Abstract

Reinforcement Learning with Human Feedback (RLHF) is considered a standard approach to fine-tuning Large Language Models (LLMs). However, such methods often face limitations such as unsound black-box reward models, difficulties in collecting human preference data, and reliance on sparse scalar rewards. As a result, they often fall short on tasks that require complex, domain-specific understanding.
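
To make the sparse-scalar-reward limitation above concrete, here is a minimal, illustrative Python sketch. It is not taken from the paper, and all function names are placeholders: it simply contrasts a single scalar reward assigned to a whole completion, as in typical RLHF pipelines, with per-token feedback derived from a deterministic symbolic check, the kind of finer-grained signal that "symbolic feedback" in the title alludes to.

# Minimal sketch (not from the paper): contrasts the sparse scalar reward
# typical of RLHF-style fine-tuning with finer-grained, per-token feedback
# that a symbolic checker (e.g., a parser or compiler) could provide.
# All names here are illustrative placeholders.

from typing import List


def scalar_reward(completion: str) -> float:
    """RLHF-style reward: one opaque number for the whole completion."""
    # Stand-in for a learned black-box reward model.
    return 1.0 if "return" in completion else 0.0


def symbolic_feedback(tokens: List[str]) -> List[float]:
    """Illustrative symbolic feedback: a per-token signal from a
    deterministic check, so credit attaches to specific spans."""
    # Toy rule standing in for a real symbolic tool (parser, compiler, solver):
    # reward tokens that look like valid identifiers or allowed operators.
    allowed = {"=", "+", "return", "def", ":"}
    return [1.0 if tok.isidentifier() or tok in allowed else -1.0
            for tok in tokens]


if __name__ == "__main__":
    completion = "def add(a, b): return a + b"
    tokens = completion.replace("(", " ( ").replace(")", " ) ").split()

    # Sparse scalar reward: the policy sees a single number, with no
    # indication of which part of the output was right or wrong.
    print("scalar reward:", scalar_reward(completion))

    # Fine-grained symbolic feedback: one signal per token, enabling
    # denser credit assignment during policy updates.
    print("per-token feedback:", symbolic_feedback(tokens))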

View on arXiv: https://arxiv.org/abs/2405.16661
@article{jha2025_2405.16661,
  title={RLSF: Fine-tuning LLMs via Symbolic Feedback},
  author={Piyush Jha and Prithwish Jana and Pranavkrishna Suresh and Arnav Arora and Vijay Ganesh},
  journal={arXiv preprint arXiv:2405.16661},
  year={2025}
}