ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2412.10400
  4. Cited By
Reinforcement Learning Enhanced LLMs: A Survey
v1v2v3 (latest)

Reinforcement Learning Enhanced LLMs: A Survey

5 December 2024
Shuhe Wang
Shengyu Zhang
Jing Zhang
Runyi Hu
Xiaoya Li
Tianwei Zhang
Jiwei Li
Leilei Gan
G. Wang
Eduard H. Hovy
    OffRL
ArXiv (abs)PDFHTML

Papers citing "Reinforcement Learning Enhanced LLMs: A Survey"

12 / 12 papers shown
Title
From Emergence to Control: Probing and Modulating Self-Reflection in Language Models
From Emergence to Control: Probing and Modulating Self-Reflection in Language Models
Xudong Zhu
Jiachen Jiang
Mohammad Mahdi Khalili
Zhihui Zhu
ReLMLM&RoLRM
36
0
0
13 Jun 2025
GFRIEND: Generative Few-shot Reward Inference through EfficieNt DPO
GFRIEND: Generative Few-shot Reward Inference through EfficieNt DPO
Yiyang Zhao
Huiyu Bai
Xuejiao Zhao
OffRL
24
0
0
10 Jun 2025
Efficient Online RFT with Plug-and-Play LLM Judges: Unlocking State-of-the-Art Performance
Efficient Online RFT with Plug-and-Play LLM Judges: Unlocking State-of-the-Art Performance
Rudransh Agnihotri
Ananya Pandey
OffRLALM
59
0
0
06 Jun 2025
RACE-Align: Retrieval-Augmented and Chain-of-Thought Enhanced Preference Alignment for Large Language Models
RACE-Align: Retrieval-Augmented and Chain-of-Thought Enhanced Preference Alignment for Large Language Models
Qihang Yan
Xinyu Zhang
Luming Guo
Qi Zhang
Feifan Liu
AI4TSLRM
40
0
0
03 Jun 2025
Proxy-Free GFlowNet
Proxy-Free GFlowNet
Ruishuo Chen
Xun Wang
Rui Hu
Zhuoran Li
Longbo Huang
68
0
0
26 May 2025
Three Minds, One Legend: Jailbreak Large Reasoning Model with Adaptive Stacked Ciphers
Three Minds, One Legend: Jailbreak Large Reasoning Model with Adaptive Stacked Ciphers
Viet-Anh Nguyen
Shiqian Zhao
Gia Dao
Runyi Hu
Yi Xie
Luu Anh Tuan
AAMLLRM
95
3
0
22 May 2025
Reasoning Beyond Limits: Advances and Open Problems for LLMs
Reasoning Beyond Limits: Advances and Open Problems for LLMs
M. Ferrag
Norbert Tihanyi
Merouane Debbah
ELMOffRLLRMAI4CE
427
4
0
26 Mar 2025
Primus: A Pioneering Collection of Open-Source Datasets for Cybersecurity LLM Training
Primus: A Pioneering Collection of Open-Source Datasets for Cybersecurity LLM Training
Yao-Ching Yu
Tsun-Han Chiang
Cheng-Wei Tsai
Chien-Ming Huang
Wen-Kwang Tsao
111
7
0
16 Feb 2025
Uncertainty-aware Reward Model: Teaching Reward Models to Know What is Unknown
Uncertainty-aware Reward Model: Teaching Reward Models to Know What is Unknown
Xingzhou Lou
Dong Yan
Wei Shen
Yuzi Yan
Jian Xie
Junge Zhang
201
28
0
01 Oct 2024
Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks
Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks
Amir Saeidi
Shivanshu Verma
Chitta Baral
Chitta Baral
ALM
110
26
0
23 Apr 2024
Self-Rewarding Language Models
Self-Rewarding Language Models
Weizhe Yuan
Richard Yuanzhe Pang
Kyunghyun Cho
Xian Li
Sainbayar Sukhbaatar
Jing Xu
Jason Weston
ReLMSyDaALMLRM
399
338
0
18 Jan 2024
WizardCoder: Empowering Code Large Language Models with Evol-Instruct
WizardCoder: Empowering Code Large Language Models with Evol-Instruct
Ziyang Luo
Can Xu
Pu Zhao
Qingfeng Sun
Xiubo Geng
Wenxiang Hu
Chongyang Tao
Jing Ma
Qingwei Lin
Daxin Jiang
ELMSyDaALM
181
697
0
14 Jun 2023
1