Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1712.01815
Cited By
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
5 December 2017
David Silver
Thomas Hubert
Julian Schrittwieser
Ioannis Antonoglou
Matthew Lai
A. Guez
Marc Lanctot
Laurent Sifre
D. Kumaran
T. Graepel
Timothy Lillicrap
Karen Simonyan
Demis Hassabis
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm"
50 / 266 papers shown
Title
Cost-Augmented Monte Carlo Tree Search for LLM-Assisted Planning
Zihao Zhang
Fei Liu
14
0
0
20 May 2025
Explaining Strategic Decisions in Multi-Agent Reinforcement Learning for Aerial Combat Tactics
Ardian Selmonaj
Alessandro Antonucci
Adrian Schneider
Michael Rüegsegger
Matthias Sommer
27
0
0
16 May 2025
Measuring General Intelligence with Generated Games
Vivek Verma
David Huang
William Chen
Dan Klein
Nicholas Tomlin
ReLM
ELM
LM&MA
LRM
58
1
0
12 May 2025
ToolACE-DEV: Self-Improving Tool Learning via Decomposition and EVolution
Xiaolin Huang
Weiwen Liu
Xingshan Zeng
Yanhua Huang
Xinlong Hao
...
Yirong Zeng
Chuhan Wu
Yishuo Wang
Ruiming Tang
Defu Lian
KELM
38
0
0
12 May 2025
Reinforcement Learning for Game-Theoretic Resource Allocation on Graphs
Zijian An
Lifeng Zhou
36
0
0
08 May 2025
HyperTree Planning: Enhancing LLM Reasoning via Hierarchical Thinking
Runquan Gui
Ziyi Wang
Jun Wang
Chi Ma
Huiling Zhen
M. Yuan
Jianye Hao
Defu Lian
Enhong Chen
Feng Wu
LRM
168
0
0
05 May 2025
Motion Generation for Food Topping Challenge 2024: Serving Salmon Roe Bowl and Picking Fried Chicken
Koki Inami
Masashi Konosu
Koki Yamane
Nozomu Masuya
Yunhan Li
Yu-Han Shu
Hiroshi Sato
Shinnosuke Homma
S. Sakaino
49
0
0
28 Apr 2025
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
Jiaqi Chen
Bang Zhang
Ruotian Ma
Peisong Wang
Xiaodan Liang
Zhaopeng Tu
Xuran Li
Kwan-Yee K. Wong
LLMAG
ReLM
LRM
91
0
0
27 Apr 2025
Cracking the Code of Action: a Generative Approach to Affordances for Reinforcement Learning
Lynn Cherif
Flemming Kondrup
David Venuto
Ankit Anand
Doina Precup
Khimya Khetarpal
LM&Ro
54
0
0
24 Apr 2025
Improving Human-AI Coordination through Adversarial Training and Generative Models
Paresh Chaudhary
Yancheng Liang
Daphne Chen
S. Du
Natasha Jaques
73
0
0
21 Apr 2025
An Efficient Approach for Cooperative Multi-Agent Learning Problems
Ángel Aso-Mollar
Eva Onaindia
26
0
0
07 Apr 2025
Entropy-guided sequence weighting for efficient exploration in RL-based LLM fine-tuning
Abdullah Vanlioglu
51
0
0
28 Mar 2025
Reinforcement Learning-based Heuristics to Guide Domain-Independent Dynamic Programming
Minori Narita
Ryo Kuroiwa
J. Christopher Beck
57
0
0
20 Mar 2025
ToMCAT: Theory-of-Mind for Cooperative Agents in Teams via Multiagent Diffusion Policies
Pedro Sequeira
Vidyasagar Sadhu
Melinda Gervasio
DiffM
89
0
0
25 Feb 2025
Two-Player Zero-Sum Differential Games with One-Sided Information
Mukesh Ghimire
Z. Xu
Yi Ren
SyDa
99
0
0
17 Feb 2025
Learning a Diffusion Model Policy from Rewards via Q-Score Matching
Michael Psenka
Alejandro Escontrela
Pieter Abbeel
Yi Ma
DiffM
93
24
0
17 Feb 2025
LLMs Can Teach Themselves to Better Predict the Future
Benjamin Turtel
Danny Franklin
Philipp Schoenegger
LRM
66
0
0
07 Feb 2025
Beyond Interpolation: Extrapolative Reasoning with Reinforcement Learning and Graph Neural Networks
Niccolò Grillo
Andrea Toccaceli
Joël Mathys
Benjamin Estermann
Stefania Fresca
Roger Wattenhofer
AI4CE
LRM
106
0
0
06 Feb 2025
Policy Guided Tree Search for Enhanced LLM Reasoning
Yang Li
LRM
56
0
0
04 Feb 2025
COS(M+O)S: Curiosity and RL-Enhanced MCTS for Exploring Story Space via Language Models
Tobias Materzok
LRM
72
0
0
28 Jan 2025
Optimizing Automatic Differentiation with Deep Reinforcement Learning
Jamie Lohoff
Emre Neftci
61
1
0
28 Jan 2025
CodeMonkeys: Scaling Test-Time Compute for Software Engineering
Ryan Ehrlich
Bradley Brown
Jordan Juravsky
Ronald Clark
Christopher Ré
Azalia Mirhoseini
57
8
0
24 Jan 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
95
1,109
0
22 Jan 2025
HEPPO: Hardware-Efficient Proximal Policy Optimization -- A Universal Pipelined Architecture for Generalized Advantage Estimation
Hazem Taha
Ameer M. S. Abdelhadi
40
1
0
22 Jan 2025
Revisiting Rogers' Paradox in the Context of Human-AI Interaction
Katherine M. Collins
Umang Bhatt
Ilia Sucholutsky
61
1
0
16 Jan 2025
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Xinyu Guan
Lefei Zhang
Yifei Liu
Ning Shang
Youran Sun
Yi Zhu
Fan Yang
Mao Yang
LRM
SyDa
ReLM
73
85
0
08 Jan 2025
ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze
Chunyu Xuan
Yazhe Niu
Yuan Pu
Shuai Hu
Yu Liu
Jing Yang
73
0
0
03 Jan 2025
The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better
Scott Geng
Cheng-Yu Hsieh
Vivek Ramanujan
Matthew Wallingford
Chun-Liang Li
Pang Wei Koh
Ranjay Krishna
DiffM
68
7
0
03 Jan 2025
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Bradley Brown
Jordan Juravsky
Ryan Ehrlich
Ronald Clark
Quoc V. Le
Christopher Ré
Azalia Mirhoseini
ALM
LRM
95
236
0
03 Jan 2025
Heterogeneous Multi-agent Zero-Shot Coordination by Coevolution
Ke Xue
Yutong Wang
Cong Guan
Lei Yuan
Haobo Fu
Qiang Fu
Chao Qian
Yang Yu
42
17
0
03 Jan 2025
Predicting Chess Puzzle Difficulty with Transformers
Szymon Miłosz
Paweł Kapusta
33
0
0
31 Dec 2024
Learning to Navigate in Mazes with Novel Layouts using Abstract Top-down Maps
Linfeng Zhao
Lawson L. S. Wong
84
1
0
16 Dec 2024
Bayes Adaptive Monte Carlo Tree Search for Offline Model-based Reinforcement Learning
Jiayu Chen
Wentse Chen
Jeff Schneider
OffRL
40
2
0
15 Oct 2024
Gap-Dependent Bounds for Q-Learning using Reference-Advantage Decomposition
Zhong Zheng
Haochen Zhang
Lingzhou Xue
OffRL
78
2
0
10 Oct 2024
MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders
Cheng-rong Li
May Fung
Qingyun Wang
Chi Han
Manling Li
Jindong Wang
Heng Ji
AI4MH
236
0
0
09 Oct 2024
Scalable Signal Temporal Logic Guided Reinforcement Learning via Value Function Space Optimization
Yiting He
Peiran Liu
Yiding Ji
OffRL
38
0
0
04 Aug 2024
Reinforcement Learning for Sustainable Energy: A Survey
Koen Ponse
Felix Kleuker
Márton Fejér
Álvaro Serra-Gómez
Aske Plaat
Thomas M. Moerland
OffRL
AI4CE
45
1
0
26 Jul 2024
Show, Don't Tell: Evaluating Large Language Models Beyond Textual Understanding with ChildPlay
Gonçalo Hora de Carvalho
Oscar Knap
R. Pollice
ReLM
ELM
LRM
39
1
0
12 Jul 2024
A Review of the Applications of Deep Learning-Based Emergent Communication
Brendon Boldt
David R. Mortensen
VLM
52
6
0
03 Jul 2024
UniZero: Generalized and Efficient Planning with Scalable Latent World Models
Yuan Pu
Yazhe Niu
Jiyuan Ren
Zhenjie Yang
Hongsheng Li
Yu Liu
OffRL
54
1
0
15 Jun 2024
Open-Endedness is Essential for Artificial Superhuman Intelligence
Edward Hughes
Michael Dennis
Jack Parker-Holder
Feryal M. P. Behbahani
Aditi Mavalankar
Yuge Shi
Tom Schaul
Tim Rocktaschel
LRM
45
22
0
06 Jun 2024
Counterfactual Explanations for Multivariate Time-Series without Training Datasets
Xiangyu Sun
Raquel Aoki
Kevin H. Wilson
27
1
0
28 May 2024
Aspect-based Sentiment Evaluation of Chess Moves (ASSESS): an NLP-based Method for Evaluating Chess Strategies from Textbooks
Haifa Alrdahi
Riza Batista-Navarro
47
0
0
10 May 2024
Super-Exponential Regret for UCT, AlphaGo and Variants
Laurent Orseau
Rémi Munos
ELM
14
1
0
07 May 2024
Playing Board Games with the Predict Results of Beam Search Algorithm
Sergey Pastukhov
23
0
0
23 Apr 2024
A Survey on Self-Evolution of Large Language Models
Zhengwei Tao
Ting-En Lin
Xiancai Chen
Hangyu Li
Yuchuan Wu
Yongbin Li
Zhi Jin
Fei Huang
Dacheng Tao
Jingren Zhou
LRM
LM&Ro
62
23
0
22 Apr 2024
Mitigating Cascading Effects in Large Adversarial Graph Environments
James Cunningham
Conrad S. Tucker
AI4CE
AAML
26
0
0
12 Apr 2024
Cooperative Evolutionary Pressure and Diminishing Returns Might Explain the Fermi Paradox: On What Super-AIs Are Like
Daniel Vallstrom
31
0
0
01 Apr 2024
Understanding Iterative Combinatorial Auction Designs via Multi-Agent Reinforcement Learning
G. dÉon
N. Newman
Kevin Leyton-Brown
32
0
0
29 Feb 2024
Puzzle Solving using Reasoning of Large Language Models: A Survey
Panagiotis Giadikiaroglou
Maria Lymperaiou
Giorgos Filandrianos
Giorgos Stamou
ELM
ReLM
LRM
29
27
0
17 Feb 2024
1
2
3
4
5
6
Next