The Hallucination Dilemma: Factuality-Aware Reinforcement Learning for Large Reasoning Models
30 May 2025
Junyi Li
Hwee Tou Ng
OffRL
HILM
LRM
Papers citing
"The Hallucination Dilemma: Factuality-Aware Reinforcement Learning for Large Reasoning Models"
11 / 11 papers shown
Title
Kongzi: A Historical Large Language Model with Fact Enhancement
Jiashu Yang
Ningning Wang
Yian Zhao
Chaoran Feng
Junjia Du
Hao Pang
Zhirui Fang
Xuxin Cheng
HILM
ALM
LRM
65
1
0
13 Apr 2025
An Empirical Study on Eliciting and Improving R1-like Reasoning Models
Zhipeng Chen
Yingqian Min
Beichen Zhang
Jie Chen
Jinhao Jiang
...
Xu Miao
Yaojie Lu
Lei Fang
Zhongyuan Wang
Ji-Rong Wen
ReLM
OffRL
LRM
96
32
0
06 Mar 2025
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
Tian Xie
Zitian Gao
Qingnan Ren
Haoming Luo
Yuqian Hong
Bryan Dai
Joey Zhou
Kai Qiu
Zhirong Wu
Chong Luo
ReLM
OffRL
LRM
117
55
0
21 Feb 2025
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks
Alejandro Cuadron
Dacheng Li
Wenjie Ma
Xingyao Wang
Yichuan Wang
...
Aditya Desai
Ion Stoica
Ana Klimovic
Graham Neubig
Joseph E. Gonzalez
LRM
AI4CE
195
41
0
12 Feb 2025
LIMO: Less is More for Reasoning
Yixin Ye
Zhen Huang
Yang Xiao
Ethan Chern
Shijie Xia
Pengfei Liu
LRM
82
132
0
05 Feb 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
163
1,503
0
22 Jan 2025
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Kimi Team
Angang Du
Bofei Gao
Bowei Xing
Changjiu Jiang
...
Zihao Huang
Ziyao Xu
Zhiyong Yang
Zonghan Yang
Zongyu Lin
OffRL
ALM
AI4TS
VLM
LRM
171
250
0
22 Jan 2025
A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation
Neeraj Varshney
Wenlin Yao
Hongming Zhang
Jianshu Chen
Dong Yu
HILM
73
167
0
08 Jul 2023
Exploration in Deep Reinforcement Learning: A Survey
Pawel Ladosz
Lilian Weng
Minwoo Kim
H. Oh
OffRL
45
334
0
02 May 2022
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
146
5,328
0
07 Jul 2021
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
208
18,685
0
20 Jul 2017