ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1712.01815
  4. Cited By
Mastering Chess and Shogi by Self-Play with a General Reinforcement
  Learning Algorithm

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

5 December 2017
David Silver
Thomas Hubert
Julian Schrittwieser
Ioannis Antonoglou
Matthew Lai
A. Guez
Marc Lanctot
Laurent Sifre
D. Kumaran
T. Graepel
Timothy Lillicrap
Karen Simonyan
Demis Hassabis
ArXivPDFHTML

Papers citing "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm"

50 / 266 papers shown
Title
Cost-Augmented Monte Carlo Tree Search for LLM-Assisted Planning
Cost-Augmented Monte Carlo Tree Search for LLM-Assisted Planning
Zihao Zhang
Fei Liu
14
0
0
20 May 2025
Explaining Strategic Decisions in Multi-Agent Reinforcement Learning for Aerial Combat Tactics
Explaining Strategic Decisions in Multi-Agent Reinforcement Learning for Aerial Combat Tactics
Ardian Selmonaj
Alessandro Antonucci
Adrian Schneider
Michael Rüegsegger
Matthias Sommer
27
0
0
16 May 2025
Measuring General Intelligence with Generated Games
Measuring General Intelligence with Generated Games
Vivek Verma
David Huang
William Chen
Dan Klein
Nicholas Tomlin
ReLM
ELM
LM&MA
LRM
58
1
0
12 May 2025
ToolACE-DEV: Self-Improving Tool Learning via Decomposition and EVolution
ToolACE-DEV: Self-Improving Tool Learning via Decomposition and EVolution
Xiaolin Huang
Weiwen Liu
Xingshan Zeng
Yanhua Huang
Xinlong Hao
...
Yirong Zeng
Chuhan Wu
Yishuo Wang
Ruiming Tang
Defu Lian
KELM
38
0
0
12 May 2025
Reinforcement Learning for Game-Theoretic Resource Allocation on Graphs
Reinforcement Learning for Game-Theoretic Resource Allocation on Graphs
Zijian An
Lifeng Zhou
36
0
0
08 May 2025
HyperTree Planning: Enhancing LLM Reasoning via Hierarchical Thinking
HyperTree Planning: Enhancing LLM Reasoning via Hierarchical Thinking
Runquan Gui
Ziyi Wang
Jun Wang
Chi Ma
Huiling Zhen
M. Yuan
Jianye Hao
Defu Lian
Enhong Chen
Feng Wu
LRM
168
0
0
05 May 2025
Motion Generation for Food Topping Challenge 2024: Serving Salmon Roe Bowl and Picking Fried Chicken
Motion Generation for Food Topping Challenge 2024: Serving Salmon Roe Bowl and Picking Fried Chicken
Koki Inami
Masashi Konosu
Koki Yamane
Nozomu Masuya
Yunhan Li
Yu-Han Shu
Hiroshi Sato
Shinnosuke Homma
S. Sakaino
49
0
0
28 Apr 2025
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
Jiaqi Chen
Bang Zhang
Ruotian Ma
Peisong Wang
Xiaodan Liang
Zhaopeng Tu
Xuran Li
Kwan-Yee K. Wong
LLMAG
ReLM
LRM
91
0
0
27 Apr 2025
Cracking the Code of Action: a Generative Approach to Affordances for Reinforcement Learning
Cracking the Code of Action: a Generative Approach to Affordances for Reinforcement Learning
Lynn Cherif
Flemming Kondrup
David Venuto
Ankit Anand
Doina Precup
Khimya Khetarpal
LM&Ro
54
0
0
24 Apr 2025
Improving Human-AI Coordination through Adversarial Training and Generative Models
Improving Human-AI Coordination through Adversarial Training and Generative Models
Paresh Chaudhary
Yancheng Liang
Daphne Chen
S. Du
Natasha Jaques
73
0
0
21 Apr 2025
An Efficient Approach for Cooperative Multi-Agent Learning Problems
An Efficient Approach for Cooperative Multi-Agent Learning Problems
Ángel Aso-Mollar
Eva Onaindia
26
0
0
07 Apr 2025
Entropy-guided sequence weighting for efficient exploration in RL-based LLM fine-tuning
Entropy-guided sequence weighting for efficient exploration in RL-based LLM fine-tuning
Abdullah Vanlioglu
51
0
0
28 Mar 2025
Reinforcement Learning-based Heuristics to Guide Domain-Independent Dynamic Programming
Reinforcement Learning-based Heuristics to Guide Domain-Independent Dynamic Programming
Minori Narita
Ryo Kuroiwa
J. Christopher Beck
57
0
0
20 Mar 2025
ToMCAT: Theory-of-Mind for Cooperative Agents in Teams via Multiagent Diffusion Policies
ToMCAT: Theory-of-Mind for Cooperative Agents in Teams via Multiagent Diffusion Policies
Pedro Sequeira
Vidyasagar Sadhu
Melinda Gervasio
DiffM
89
0
0
25 Feb 2025
Two-Player Zero-Sum Differential Games with One-Sided Information
Two-Player Zero-Sum Differential Games with One-Sided Information
Mukesh Ghimire
Z. Xu
Yi Ren
SyDa
99
0
0
17 Feb 2025
Learning a Diffusion Model Policy from Rewards via Q-Score Matching
Learning a Diffusion Model Policy from Rewards via Q-Score Matching
Michael Psenka
Alejandro Escontrela
Pieter Abbeel
Yi Ma
DiffM
93
24
0
17 Feb 2025
LLMs Can Teach Themselves to Better Predict the Future
LLMs Can Teach Themselves to Better Predict the Future
Benjamin Turtel
Danny Franklin
Philipp Schoenegger
LRM
66
0
0
07 Feb 2025
Beyond Interpolation: Extrapolative Reasoning with Reinforcement Learning and Graph Neural Networks
Beyond Interpolation: Extrapolative Reasoning with Reinforcement Learning and Graph Neural Networks
Niccolò Grillo
Andrea Toccaceli
Joël Mathys
Benjamin Estermann
Stefania Fresca
Roger Wattenhofer
AI4CE
LRM
106
0
0
06 Feb 2025
Policy Guided Tree Search for Enhanced LLM Reasoning
Policy Guided Tree Search for Enhanced LLM Reasoning
Yang Li
LRM
56
0
0
04 Feb 2025
COS(M+O)S: Curiosity and RL-Enhanced MCTS for Exploring Story Space via Language Models
COS(M+O)S: Curiosity and RL-Enhanced MCTS for Exploring Story Space via Language Models
Tobias Materzok
LRM
72
0
0
28 Jan 2025
Optimizing Automatic Differentiation with Deep Reinforcement Learning
Optimizing Automatic Differentiation with Deep Reinforcement Learning
Jamie Lohoff
Emre Neftci
61
1
0
28 Jan 2025
CodeMonkeys: Scaling Test-Time Compute for Software Engineering
CodeMonkeys: Scaling Test-Time Compute for Software Engineering
Ryan Ehrlich
Bradley Brown
Jordan Juravsky
Ronald Clark
Christopher Ré
Azalia Mirhoseini
57
8
0
24 Jan 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
95
1,109
0
22 Jan 2025
HEPPO: Hardware-Efficient Proximal Policy Optimization -- A Universal Pipelined Architecture for Generalized Advantage Estimation
HEPPO: Hardware-Efficient Proximal Policy Optimization -- A Universal Pipelined Architecture for Generalized Advantage Estimation
Hazem Taha
Ameer M. S. Abdelhadi
40
1
0
22 Jan 2025
Revisiting Rogers' Paradox in the Context of Human-AI Interaction
Revisiting Rogers' Paradox in the Context of Human-AI Interaction
Katherine M. Collins
Umang Bhatt
Ilia Sucholutsky
61
1
0
16 Jan 2025
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Xinyu Guan
Lefei Zhang
Yifei Liu
Ning Shang
Youran Sun
Yi Zhu
Fan Yang
Mao Yang
LRM
SyDa
ReLM
73
85
0
08 Jan 2025
ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze
ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze
Chunyu Xuan
Yazhe Niu
Yuan Pu
Shuai Hu
Yu Liu
Jing Yang
73
0
0
03 Jan 2025
The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better
The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better
Scott Geng
Cheng-Yu Hsieh
Vivek Ramanujan
Matthew Wallingford
Chun-Liang Li
Pang Wei Koh
Ranjay Krishna
DiffM
68
7
0
03 Jan 2025
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Bradley Brown
Jordan Juravsky
Ryan Ehrlich
Ronald Clark
Quoc V. Le
Christopher Ré
Azalia Mirhoseini
ALM
LRM
95
236
0
03 Jan 2025
Heterogeneous Multi-agent Zero-Shot Coordination by Coevolution
Heterogeneous Multi-agent Zero-Shot Coordination by Coevolution
Ke Xue
Yutong Wang
Cong Guan
Lei Yuan
Haobo Fu
Qiang Fu
Chao Qian
Yang Yu
42
17
0
03 Jan 2025
Predicting Chess Puzzle Difficulty with Transformers
Predicting Chess Puzzle Difficulty with Transformers
Szymon Miłosz
Paweł Kapusta
33
0
0
31 Dec 2024
Learning to Navigate in Mazes with Novel Layouts using Abstract Top-down
  Maps
Learning to Navigate in Mazes with Novel Layouts using Abstract Top-down Maps
Linfeng Zhao
Lawson L. S. Wong
84
1
0
16 Dec 2024
Bayes Adaptive Monte Carlo Tree Search for Offline Model-based Reinforcement Learning
Bayes Adaptive Monte Carlo Tree Search for Offline Model-based Reinforcement Learning
Jiayu Chen
Wentse Chen
Jeff Schneider
OffRL
40
2
0
15 Oct 2024
Gap-Dependent Bounds for Q-Learning using Reference-Advantage Decomposition
Gap-Dependent Bounds for Q-Learning using Reference-Advantage Decomposition
Zhong Zheng
Haochen Zhang
Lingzhou Xue
OffRL
78
2
0
10 Oct 2024
MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders
MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders
Cheng-rong Li
May Fung
Qingyun Wang
Chi Han
Manling Li
Jindong Wang
Heng Ji
AI4MH
236
0
0
09 Oct 2024
Scalable Signal Temporal Logic Guided Reinforcement Learning via Value
  Function Space Optimization
Scalable Signal Temporal Logic Guided Reinforcement Learning via Value Function Space Optimization
Yiting He
Peiran Liu
Yiding Ji
OffRL
38
0
0
04 Aug 2024
Reinforcement Learning for Sustainable Energy: A Survey
Reinforcement Learning for Sustainable Energy: A Survey
Koen Ponse
Felix Kleuker
Márton Fejér
Álvaro Serra-Gómez
Aske Plaat
Thomas M. Moerland
OffRL
AI4CE
45
1
0
26 Jul 2024
Show, Don't Tell: Evaluating Large Language Models Beyond Textual Understanding with ChildPlay
Show, Don't Tell: Evaluating Large Language Models Beyond Textual Understanding with ChildPlay
Gonçalo Hora de Carvalho
Oscar Knap
R. Pollice
ReLM
ELM
LRM
39
1
0
12 Jul 2024
A Review of the Applications of Deep Learning-Based Emergent
  Communication
A Review of the Applications of Deep Learning-Based Emergent Communication
Brendon Boldt
David R. Mortensen
VLM
52
6
0
03 Jul 2024
UniZero: Generalized and Efficient Planning with Scalable Latent World Models
UniZero: Generalized and Efficient Planning with Scalable Latent World Models
Yuan Pu
Yazhe Niu
Jiyuan Ren
Zhenjie Yang
Hongsheng Li
Yu Liu
OffRL
54
1
0
15 Jun 2024
Open-Endedness is Essential for Artificial Superhuman Intelligence
Open-Endedness is Essential for Artificial Superhuman Intelligence
Edward Hughes
Michael Dennis
Jack Parker-Holder
Feryal M. P. Behbahani
Aditi Mavalankar
Yuge Shi
Tom Schaul
Tim Rocktaschel
LRM
45
22
0
06 Jun 2024
Counterfactual Explanations for Multivariate Time-Series without
  Training Datasets
Counterfactual Explanations for Multivariate Time-Series without Training Datasets
Xiangyu Sun
Raquel Aoki
Kevin H. Wilson
27
1
0
28 May 2024
Aspect-based Sentiment Evaluation of Chess Moves (ASSESS): an NLP-based
  Method for Evaluating Chess Strategies from Textbooks
Aspect-based Sentiment Evaluation of Chess Moves (ASSESS): an NLP-based Method for Evaluating Chess Strategies from Textbooks
Haifa Alrdahi
Riza Batista-Navarro
47
0
0
10 May 2024
Super-Exponential Regret for UCT, AlphaGo and Variants
Super-Exponential Regret for UCT, AlphaGo and Variants
Laurent Orseau
Rémi Munos
ELM
14
1
0
07 May 2024
Playing Board Games with the Predict Results of Beam Search Algorithm
Playing Board Games with the Predict Results of Beam Search Algorithm
Sergey Pastukhov
23
0
0
23 Apr 2024
A Survey on Self-Evolution of Large Language Models
A Survey on Self-Evolution of Large Language Models
Zhengwei Tao
Ting-En Lin
Xiancai Chen
Hangyu Li
Yuchuan Wu
Yongbin Li
Zhi Jin
Fei Huang
Dacheng Tao
Jingren Zhou
LRM
LM&Ro
62
23
0
22 Apr 2024
Mitigating Cascading Effects in Large Adversarial Graph Environments
Mitigating Cascading Effects in Large Adversarial Graph Environments
James Cunningham
Conrad S. Tucker
AI4CE
AAML
26
0
0
12 Apr 2024
Cooperative Evolutionary Pressure and Diminishing Returns Might Explain
  the Fermi Paradox: On What Super-AIs Are Like
Cooperative Evolutionary Pressure and Diminishing Returns Might Explain the Fermi Paradox: On What Super-AIs Are Like
Daniel Vallstrom
31
0
0
01 Apr 2024
Understanding Iterative Combinatorial Auction Designs via Multi-Agent
  Reinforcement Learning
Understanding Iterative Combinatorial Auction Designs via Multi-Agent Reinforcement Learning
G. dÉon
N. Newman
Kevin Leyton-Brown
32
0
0
29 Feb 2024
Puzzle Solving using Reasoning of Large Language Models: A Survey
Puzzle Solving using Reasoning of Large Language Models: A Survey
Panagiotis Giadikiaroglou
Maria Lymperaiou
Giorgos Filandrianos
Giorgos Stamou
ELM
ReLM
LRM
29
27
0
17 Feb 2024
123456
Next