2412.10400
v3 (latest)
Reinforcement Learning Enhanced LLMs: A Survey
5 December 2024
Shuhe Wang
Shengyu Zhang
Jing Zhang
Runyi Hu
Xiaoya Li
Tianwei Zhang
Jiwei Li
Leilei Gan
G. Wang
Eduard H. Hovy
OffRL
Papers citing
"Reinforcement Learning Enhanced LLMs: A Survey"
12 / 12 papers shown
Title
From Emergence to Control: Probing and Modulating Self-Reflection in Language Models
Xudong Zhu
Jiachen Jiang
Mohammad Mahdi Khalili
Zhihui Zhu
ReLM
LM&Ro
LRM
36
0
0
13 Jun 2025
GFRIEND: Generative Few-shot Reward Inference through EfficieNt DPO
Yiyang Zhao
Huiyu Bai
Xuejiao Zhao
OffRL
24
0
0
10 Jun 2025
Efficient Online RFT with Plug-and-Play LLM Judges: Unlocking State-of-the-Art Performance
Rudransh Agnihotri
Ananya Pandey
OffRL
ALM
59
0
0
06 Jun 2025
RACE-Align: Retrieval-Augmented and Chain-of-Thought Enhanced Preference Alignment for Large Language Models
Qihang Yan
Xinyu Zhang
Luming Guo
Qi Zhang
Feifan Liu
AI4TS
LRM
40
0
0
03 Jun 2025
Proxy-Free GFlowNet
Ruishuo Chen
Xun Wang
Rui Hu
Zhuoran Li
Longbo Huang
68
0
0
26 May 2025
Three Minds, One Legend: Jailbreak Large Reasoning Model with Adaptive Stacked Ciphers
Viet-Anh Nguyen
Shiqian Zhao
Gia Dao
Runyi Hu
Yi Xie
Luu Anh Tuan
AAML
LRM
95
3
0
22 May 2025
Reasoning Beyond Limits: Advances and Open Problems for LLMs
M. Ferrag
Norbert Tihanyi
Merouane Debbah
ELM
OffRL
LRM
AI4CE
427
4
0
26 Mar 2025
Primus: A Pioneering Collection of Open-Source Datasets for Cybersecurity LLM Training
Yao-Ching Yu
Tsun-Han Chiang
Cheng-Wei Tsai
Chien-Ming Huang
Wen-Kwang Tsao
111
7
0
16 Feb 2025
Uncertainty-aware Reward Model: Teaching Reward Models to Know What is Unknown
Xingzhou Lou
Dong Yan
Wei Shen
Yuzi Yan
Jian Xie
Junge Zhang
201
28
0
01 Oct 2024
Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks
Amir Saeidi
Shivanshu Verma
Chitta Baral
ALM
110
26
0
23 Apr 2024
Self-Rewarding Language Models
Weizhe Yuan
Richard Yuanzhe Pang
Kyunghyun Cho
Xian Li
Sainbayar Sukhbaatar
Jing Xu
Jason Weston
ReLM
SyDa
ALM
LRM
399
338
0
18 Jan 2024
WizardCoder: Empowering Code Large Language Models with Evol-Instruct
Ziyang Luo
Can Xu
Pu Zhao
Qingfeng Sun
Xiubo Geng
Wenxiang Hu
Chongyang Tao
Jing Ma
Qingwei Lin
Daxin Jiang
ELM
SyDa
ALM
181
697
0
14 Jun 2023