2506.14758
Cited By
Reasoning with Exploration: An Entropy Perspective
17 June 2025
Daixuan Cheng
Shaohan Huang
Xuekai Zhu
Bo Dai
Wayne Xin Zhao
Zhenliang Zhang
Furu Wei
LRM
Papers citing "Reasoning with Exploration: An Entropy Perspective"
8 / 8 papers shown
Title
Spurious Rewards: Rethinking Training Signals in RLVR
Rulin Shao
Shuyue Stella Li
Rui Xin
Scott Geng
Yiping Wang
...
Ranjay Krishna
Yulia Tsvetkov
Hannaneh Hajishirzi
Pang Wei Koh
Luke Zettlemoyer
OffRL
ReLM
LRM
98
8
0
12 Jun 2025
Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning
Chen Qian
Dongrui Liu
Haochen Wen
Zhen Bai
Yong Liu
Jing Shao
LRM
52
1
0
03 Jun 2025
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Zhihong Shao
Peiyi Wang
Qihao Zhu
Runxin Xu
Junxiao Song
...
Haowei Zhang
Mingchuan Zhang
Y.K. Li
Y. Wu
Daya Guo
ReLM
LRM
138
1,238
0
05 Feb 2024
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
880
13,148
0
04 Mar 2022
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
233
5,635
0
07 Jul 2021
RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments
Roberta Raileanu
Tim Rocktäschel
71
174
0
27 Feb 2020
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Tuomas Haarnoja
Aurick Zhou
Pieter Abbeel
Sergey Levine
311
8,396
0
04 Jan 2018
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
517
19,237
0
20 Jul 2017