2505.17621
Navigate the Unknown: Enhancing LLM Reasoning with Intrinsic Motivation Guided Exploration
23 May 2025
Jingtong Gao
Ling Pan
Yejing Wang
Rui Zhong
Chi Lu
Qingpeng Cai
Peng Jiang
Xiangyu Zhao
LRM
Papers citing
"Navigate the Unknown: Enhancing LLM Reasoning with Intrinsic Motivation Guided Exploration"
30 / 30 papers shown
Title
Improving RL Exploration for LLM Reasoning through Retrospective Replay
Shihan Dou
Muling Wu
Jingwen Xu
Rui Zheng
Tao Gui
Qi Zhang
Xuanjing Huang
OffRL
LRM
45
1
0
19 Apr 2025
Understanding R1-Zero-Like Training: A Critical Perspective
Zichen Liu
Changyu Chen
Wenjun Li
Penghui Qi
Tianyu Pang
Chao Du
Wee Sun Lee
Min Lin
OffRL
LRM
92
108
0
26 Mar 2025
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Qiying Yu
Zheng Zhang
Ruofei Zhu
Yufeng Yuan
Xiaochen Zuo
...
Ya Zhang
Lin Yan
Mu Qiao
Yonghui Wu
Mingxuan Wang
OffRL
LRM
100
131
0
18 Mar 2025
Steering Large Language Model Activations in Sparse Spaces
Reza Bayat
Ali Rahimi-Kalahroudi
Mohammad Pezeshki
Sarath Chandar
Pascal Vincent
LLMSV
48
4
0
28 Feb 2025
On Designing Effective RL Reward at Training Time for LLM Reasoning
Jiaxuan Gao
Shusheng Xu
Wenjie Ye
Weilin Liu
Chuyi He
Wei Fu
Zhiyu Mei
Guangju Wang
Yi Wu
OffRL
LRM
62
21
0
19 Oct 2024
Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text Generation
Yuxuan Zhou
Margret Keuper
Mario Fritz
56
6
0
24 Aug 2024
Reasoning with Large Language Models, a Survey
Aske Plaat
Annie Wong
Suzan Verberne
Joost Broekens
Niki van Stein
Thomas Back
OffRL
LRM
AI4CE
ReLM
26
61
0
16 Jul 2024
Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning
Yuxi Xie
Anirudh Goyal
Wenyue Zheng
Min-Yen Kan
Timothy Lillicrap
Kenji Kawaguchi
Michael Shieh
ReLM
LRM
70
104
0
01 May 2024
Stream of Search (SoS): Learning to Search in Language
Kanishk Gandhi
Denise Lee
Gabriel Grand
Muxin Liu
Winson Cheng
Archit Sharma
Noah D. Goodman
RALM
AIFin
LRM
61
54
0
01 Apr 2024
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Zhihong Shao
Peiyi Wang
Qihao Zhu
Runxin Xu
Jun-Mei Song
...
Haowei Zhang
Mingchuan Zhang
Yiming Li
Yu-Huan Wu
Daya Guo
ReLM
LRM
75
953
0
05 Feb 2024
A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications
Pranab Sahoo
Ayush Kumar Singh
Sriparna Saha
Vinija Jain
S. Mondal
Aman Chadha
88
296
0
05 Feb 2024
Beyond Sparse Rewards: Enhancing Reinforcement Learning with Language Model Critique in Text Generation
Meng Cao
Lei Shu
Lei Yu
Yun Zhu
Nevan Wichers
Yinxiao Liu
Lei Meng
OffRL
ALM
38
5
0
14 Jan 2024
Unleashing the potential of prompt engineering in Large Language Models: a comprehensive review
Banghao Chen
Zhaofeng Zhang
Nicolas Langrené
Shengxin Zhu
LLMAG
59
89
0
23 Oct 2023
Towards Better Chain-of-Thought Prompting Strategies: A Survey
Zihan Yu
Liang He
Zhen Wu
Xinyu Dai
Jiajun Chen
LRM
143
51
0
08 Oct 2023
Navigate through Enigmatic Labyrinth A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future
Zheng Chu
Jingchang Chen
Qianglong Chen
Weijiang Yu
Tao He
Haotian Wang
Weihua Peng
Ming-Yuan Liu
Bing Qin
Ting Liu
LRM
AI4CE
69
166
0
27 Sep 2023
Let's Verify Step by Step
Hunter Lightman
V. Kosaraju
Yura Burda
Harrison Edwards
Bowen Baker
Teddy Lee
Jan Leike
John Schulman
Ilya Sutskever
K. Cobbe
ALM
OffRL
LRM
98
1,044
0
31 May 2023
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate
Tian Liang
Zhiwei He
Wenxiang Jiao
Xing Wang
Rui Wang
Yujiu Yang
Zhaopeng Tu
Shuming Shi
LLMAG
LRM
53
438
0
30 May 2023
Self-Evaluation Guided Beam Search for Reasoning
Yuxi Xie
Kenji Kawaguchi
Yiran Zhao
Xu Zhao
MingSung Kan
Junxian He
Qizhe Xie
LRM
183
145
0
01 May 2023
Self-Refine: Iterative Refinement with Self-Feedback
Aman Madaan
Niket Tandon
Prakhar Gupta
Skyler Hallinan
Luyu Gao
...
Bodhisattwa Prasad Majumder
Katherine Hermann
Sean Welleck
Amir Yazdanbakhsh
Peter Clark
ReLM
LRM
DiffM
101
1,577
0
30 Mar 2023
Solving math word problems with process- and outcome-based feedback
J. Uesato
Nate Kushman
Ramana Kumar
Francis Song
Noah Y. Siegel
L. Wang
Antonia Creswell
G. Irving
I. Higgins
FaML
ReLM
AIMat
LRM
66
316
0
25 Nov 2022
Exploration in Deep Reinforcement Learning: A Survey
Pawel Ladosz
Lilian Weng
Minwoo Kim
H. Oh
OffRL
45
334
0
02 May 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
463
3,486
0
21 Mar 2022
Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey
Bonan Min
Hayley L Ross
Elior Sulem
Amir Pouran Ben Veyseh
Thien Huu Nguyen
Oscar Sainz
Eneko Agirre
Ilana Heinz
Dan Roth
LM&MA
VLM
AI4CE
95
1,055
0
01 Nov 2021
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM
OffRL
LRM
183
4,175
0
27 Oct 2021
Exploration by Random Network Distillation
Yuri Burda
Harrison Edwards
Amos Storkey
Oleg Klimov
88
1,310
0
30 Oct 2018
A Brief Survey of Deep Reinforcement Learning
Kai Arulkumaran
M. Deisenroth
Miles Brundage
Anil Anthony Bharath
OffRL
96
2,792
0
19 Aug 2017
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
208
18,685
0
20 Jul 2017
Curiosity-driven Exploration by Self-supervised Prediction
Deepak Pathak
Pulkit Agrawal
Alexei A. Efros
Trevor Darrell
LRM
SSL
96
2,423
0
15 May 2017
Count-Based Exploration with Neural Density Models
Georg Ostrovski
Marc G. Bellemare
Aaron van den Oord
Rémi Munos
74
616
0
03 Mar 2017
High-Dimensional Continuous Control Using Generalized Advantage Estimation
John Schulman
Philipp Moritz
Sergey Levine
Michael I. Jordan
Pieter Abbeel
OffRL
38
3,368
0
08 Jun 2015