2007.12509
Cited By
Monte-Carlo Tree Search as Regularized Policy Optimization
24 July 2020
Jean-Bastien Grill
Florent Altché
Yunhao Tang
Thomas Hubert
Michal Valko
Ioannis Antonoglou
Rémi Munos
Papers citing
"Monte-Carlo Tree Search as Regularized Policy Optimization"
13 / 13 papers shown
Title
HyperTree Planning: Enhancing LLM Reasoning via Hierarchical Thinking
Runquan Gui
Zihan Wang
Jun Wang
Chi Ma
Huiling Zhen
M. Yuan
Jianye Hao
Defu Lian
Enhong Chen
Feng Wu
LRM
159
0
0
05 May 2025
Trust-Region Twisted Policy Improvement
Joery A. de Vries
Jinke He
Yaniv Oren
M. Spaan
OffRL
LRM
32
0
0
08 Apr 2025
Policy Guided Tree Search for Enhanced LLM Reasoning
Yang Li
LRM
53
0
0
04 Feb 2025
Provably Efficient Long-Horizon Exploration in Monte Carlo Tree Search through State Occupancy Regularization
Liam Schramm
Abdeslam Boularias
25
1
0
07 Jul 2024
UniZero: Generalized and Efficient Planning with Scalable Latent World Models
Yuan Pu
Yazhe Niu
Jiyuan Ren
Zhenjie Yang
Hongsheng Li
Yu Liu
OffRL
49
1
0
15 Jun 2024
Value Improved Actor Critic Algorithms
Yaniv Oren
Moritz A. Zanger
Pascal R. van der Vaart
M. Spaan
Wendelin Bohmer
OffRL
33
0
0
03 Jun 2024
Autonomous Port Navigation With Ranging Sensors Using Model-Based Reinforcement Learning
Siemen Herremans
Ali Anwar
Arne Troch
Ian Ravijts
Maarten Vangeneugden
Siegfried Mercelis
P. Hellinckx
25
1
0
17 Nov 2023
Don't throw away your value model! Generating more preferable text with Value-Guided Monte-Carlo Tree Search decoding
Jiacheng Liu
Andrew Cohen
Ramakanth Pasunuru
Yejin Choi
Hannaneh Hajishirzi
Asli Celikyilmaz
21
24
0
26 Sep 2023
Policy-Based Self-Competition for Planning Problems
Jonathan Pirnay
Q. Göttl
Jakob Burger
D. G. Grimm
36
3
0
07 Jun 2023
Learning to design without prior data: Discovering generalizable design strategies using deep learning and tree search
Ayush Raina
Jonathan Cagan
Christopher McComb
AI4CE
25
9
0
28 Nov 2022
HyperTree Proof Search for Neural Theorem Proving
Guillaume Lample
Marie-Anne Lachaux
Thibaut Lavril
Xavier Martinet
Amaury Hayat
Gabriel Ebner
Aurelien Rodriguez
Timothée Lacroix
AIMat
28
134
0
23 May 2022
Towards an Understanding of Default Policies in Multitask Policy Optimization
Theodore H. Moskovitz
Michael Arbel
Jack Parker-Holder
Aldo Pacchiano
25
9
0
04 Nov 2021
Learning compositional programs with arguments and sampling
Giovanni De Toni
L. Erculiani
Andrea Passerini
35
3
0
01 Sep 2021