ResearchTrend.AI
Exploring RL-based LLM Training for Formal Language Tasks with Programmed Rewards

22 October 2024
Alexander Padula
Dennis J. N. J. Soemers
    OffRL

Papers citing "Exploring RL-based LLM Training for Formal Language Tasks with Programmed Rewards"

8 / 8 papers shown
WizardCoder: Empowering Code Large Language Models with Evol-Instruct
Ziyang Luo
Can Xu
Pu Zhao
Qingfeng Sun
Xiubo Geng
Wenxiang Hu
Chongyang Tao
Jing Ma
Qingwei Lin
Daxin Jiang
ELM
SyDa
ALM
79
678
0
14 Jun 2023
Understanding plasticity in neural networks
Clare Lyle
Zeyu Zheng
Evgenii Nikishin
Bernardo Avila-Pires
Razvan Pascanu
Will Dabney
AI4CE
97
101
0
02 Mar 2023
Execution-based Code Generation using Deep Reinforcement Learning
Parshin Shojaee
Aneesh Jain
Sindhu Tipirneni
Chandan K. Reddy
74
57
0
31 Jan 2023
Program Synthesis with Large Language Models
Jacob Austin
Augustus Odena
Maxwell Nye
Maarten Bosma
Henryk Michalewski
...
Ellen Jiang
Carrie J. Cai
Michael Terry
Quoc V. Le
Charles Sutton
ELM
AIMat
ReCod
ALM
195
1,948
0
16 Aug 2021
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh
Lysandre Debut
Julien Chaumond
Thomas Wolf
230
7,498
0
02 Oct 2019
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
460
1,727
0
18 Sep 2019
Ludii -- The Ludemic General Game System
Éric Piette
Dennis J. N. J. Soemers
Matthew Stephenson
C. F. Sironi
M. Winands
C. Browne
LLMAG
74
71
0
13 May 2019
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
478
19,019
0
20 Jul 2017