1712.01815
Cited By
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
5 December 2017
David Silver
Thomas Hubert
Julian Schrittwieser
Ioannis Antonoglou
Matthew Lai
A. Guez
Marc Lanctot
Laurent Sifre
D. Kumaran
T. Graepel
Timothy Lillicrap
Karen Simonyan
Demis Hassabis
Papers citing
"Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm"
50 / 266 papers shown
Title
MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
Linxi Fan
Guanzhi Wang
Yunfan Jiang
Ajay Mandlekar
Yuncong Yang
Haoyi Zhu
Andrew Tang
De-An Huang
Yuke Zhu
Anima Anandkumar
LM&Ro
69
352
0
17 Jun 2022
Rapid Learning of Spatial Representations for Goal-Directed Navigation Based on a Novel Model of Hippocampal Place Fields
Adedapo Alabi
D. Vanderelst
A. Minai
17
2
0
05 Jun 2022
Fast and Precise: Adjusting Planning Horizon with Adaptive Subgoal Search
Michał Zawalski
Michał Tyrolski
K. Czechowski
Tomasz Odrzygóźdź
Damian Stachura
Piotr Piękos
Yuhuai Wu
Łukasz Kuciński
Piotr Miłoś
LRM
21
9
0
01 Jun 2022
HyperTree Proof Search for Neural Theorem Proving
Guillaume Lample
Marie-Anne Lachaux
Thibaut Lavril
Xavier Martinet
Amaury Hayat
Gabriel Ebner
Aurelien Rodriguez
Timothée Lacroix
AIMat
41
138
0
23 May 2022
Chain of Thought Imitation with Procedure Cloning
Mengjiao Yang
Dale Schuurmans
Pieter Abbeel
Ofir Nachum
OffRL
35
30
0
22 May 2022
Adversarial Training for High-Stakes Reliability
Daniel M. Ziegler
Seraphina Nix
Lawrence Chan
Tim Bauman
Peter Schmidt-Nielsen
...
Noa Nabeshima
Benjamin Weinstein-Raun
D. Haas
Buck Shlegeris
Nate Thomas
AAML
38
59
0
03 May 2022
Graph Neural Network based Agent in Google Research Football
Yizhan Niu
Jinglong Liu
Yuhao Shi
Jiren Zhu
GNN
27
2
0
23 Apr 2022
Adversarial Learning to Reason in an Arbitrary Logic
Stanislaw J. Purgal
C. Kaliszyk
27
1
0
06 Apr 2022
PerfectDou: Dominating DouDizhu with Perfect Information Distillation
Yang Guan
Minghuan Liu
Weijun Hong
Weinan Zhang
Fei Fang
Guangjun Zeng
Yue Lin
33
26
0
30 Mar 2022
Remember and Forget Experience Replay for Multi-Agent Reinforcement Learning
Pascal Weber
Daniel Wälchli
Mustafa Zeqiri
Petros Koumoutsakos
CLL
OffRL
21
7
0
24 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
402
12,150
0
04 Mar 2022
Using Deep Reinforcement Learning with Automatic Curriculum Learning for Mapless Navigation in Intralogistics
Honghu Xue
Benedikt Hein
M. Bakr
Georg Schildbach
Bengt Abel
Elmar Rueckert
16
15
0
23 Feb 2022
Open-Ended Reinforcement Learning with Neural Reward Functions
Robert Meier
Asier Mujika
37
7
0
16 Feb 2022
Compute Trends Across Three Eras of Machine Learning
J. Sevilla
Lennart Heim
A. Ho
T. Besiroglu
Marius Hobbhahn
Pablo Villalobos
39
272
0
11 Feb 2022
Uncovering Instabilities in Variational-Quantum Deep Q-Networks
Maja Franz
Lucas Wolf
Maniraman Periyasamy
Christian Ufrecht
Daniel D. Scherer
Axel Plinge
Christopher Mutschler
Wolfgang Mauerer
36
29
0
10 Feb 2022
Formal Mathematics Statement Curriculum Learning
Stanislas Polu
Jesse Michael Han
Kunhao Zheng
Mantas Baksys
Igor Babuschkin
Ilya Sutskever
AIMat
91
118
0
03 Feb 2022
Modified DDPG car-following model with a real-world human driving experience with CARLA simulator
Dian-Tao Li
Ostap Okhrin
43
37
0
29 Dec 2021
Safe Reinforcement Learning with Chance-constrained Model Predictive Control
Samuel Pfrommer
Tanmay Gautam
Alec Zhou
Somayeh Sojoudi
21
24
0
27 Dec 2021
Maximum Entropy Population-Based Training for Zero-Shot Human-AI Coordination
Rui Zhao
Jinming Song
Yufeng Yuan
Haifeng Hu
Yang Gao
Yi Wu
Zhongqian Sun
Yang Wei
32
63
0
22 Dec 2021
Learning to track environment state via predictive autoencoding
Marian Andrecki
N. K. Taylor
10
0
0
14 Dec 2021
Recent Advances in Reinforcement Learning in Finance
B. Hambly
Renyuan Xu
Huining Yang
OffRL
29
168
0
08 Dec 2021
Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-RL
Charles Packer
Pieter Abbeel
Joseph E. Gonzalez
OffRL
29
18
0
02 Dec 2021
CubeTR: Learning to Solve The Rubiks Cube Using Transformers
Mustafa Chasmai
ViT
37
1
0
11 Nov 2021
AlphaD3M: Machine Learning Pipeline Synthesis
Iddo Drori
Yamuna Krishnamurthy
Rémi Rampin
Raoni Lourenço
Jorge Piazentin Ono
Kyunghyun Cho
Claudio Silva
J. Freire
33
85
0
03 Nov 2021
Adaptive Discretization in Online Reinforcement Learning
Sean R. Sinclair
Siddhartha Banerjee
Chao Yu
OffRL
45
15
0
29 Oct 2021
Hindsight Goal Ranking on Replay Buffer for Sparse Reward Environment
Tung M. Luu
Chang D. Yoo
23
8
0
28 Oct 2021
Measuring the Non-Transitivity in Chess
R. Sanjaya
Jun Wang
Yaodong Yang
21
22
0
22 Oct 2021
CORA: Benchmarks, Baselines, and Metrics as a Platform for Continual Reinforcement Learning Agents
Sam Powers
Eliot Xing
Eric Kolve
Roozbeh Mottaghi
Abhinav Gupta
OffRL
36
38
0
19 Oct 2021
In a Nutshell, the Human Asked for This: Latent Goals for Following Temporal Specifications
Borja G. Leon
Murray Shanahan
Francesco Belardinelli
AI4CE
28
15
0
18 Oct 2021
Learning Pessimism for Robust and Efficient Off-Policy Reinforcement Learning
Edoardo Cetin
Oya Celiktutan
OffRL
47
17
0
07 Oct 2021
Deep Synoptic Monte Carlo Planning in Reconnaissance Blind Chess
Gregory Clark
38
9
0
05 Oct 2021
A Novel Automated Curriculum Strategy to Solve Hard Sokoban Planning Instances
Dieqiao Feng
Carla P. Gomes
B. Selman
ODL
25
18
0
03 Oct 2021
Reinforcement Learning with Information-Theoretic Actuation
Elliot Catt
Marcus Hutter
J. Veness
45
0
0
30 Sep 2021
Deep Reinforcement Learning with Adjustments
H. Khorasgani
Haiyan Wang
Chetan Gupta
Susumu Serita
23
2
0
28 Sep 2021
The Role of Lookahead and Approximate Policy Evaluation in Reinforcement Learning with Linear Value Function Approximation
Anna Winnicki
Joseph Lubars
Michael Livesay
R. Srikant
31
3
0
28 Sep 2021
Learning General Optimal Policies with Graph Neural Networks: Expressive Power, Transparency, and Limits
Simon Ståhlberg
Blai Bonet
Hector Geffner
41
48
0
21 Sep 2021
Target Languages (vs. Inductive Biases) for Learning to Act and Plan
Hector Geffner
42
6
0
15 Sep 2021
On Solving a Stochastic Shortest-Path Markov Decision Process as Probabilistic Inference
Mohamed Baioumy
Bruno Lacerda
Paul Duckworth
Nick Hawes
35
3
0
13 Sep 2021
Explaining Bayesian Neural Networks
Kirill Bykov
Marina M.-C. Höhne
Adelaida Creosteanu
Klaus-Robert Müller
Frederick Klauschen
Shinichi Nakajima
Marius Kloft
BDL
AAML
36
25
0
23 Aug 2021
Lessons from AlphaZero for Optimal, Model Predictive, and Adaptive Control
Dimitri Bertsekas
AI4CE
56
55
0
20 Aug 2021
Train on Small, Play the Large: Scaling Up Board Games with AlphaZero and GNN
Shai Ben-Assayag
Ran El-Yaniv
GNN
36
9
0
18 Jul 2021
Improve Agents without Retraining: Parallel Tree Search with Off-Policy Correction
Assaf Hallak
Gal Dalal
Steven Dalton
I. Frosio
Shie Mannor
Gal Chechik
OffRL
OnRL
35
9
0
04 Jul 2021
Augmented Shortcuts for Vision Transformers
Yehui Tang
Kai Han
Chang Xu
An Xiao
Yiping Deng
Chao Xu
Yunhe Wang
ViT
19
39
0
30 Jun 2021
Continuous Control with Deep Reinforcement Learning for Autonomous Vessels
Nader Zare
Bruno Brandoli
Mahtab Sarvmaili
Amílcar Soares
Stan Matwin
19
8
0
27 Jun 2021
Policy Smoothing for Provably Robust Reinforcement Learning
Aounon Kumar
Alexander Levine
S. Feizi
AAML
20
56
0
21 Jun 2021
Graceful Degradation and Related Fields
J. Dymond
33
4
0
21 Jun 2021
Communicating Natural Programs to Humans and Machines
Samuel Acquaviva
Yewen Pu
Marta Kryven
Theo Sechopoulos
Catherine Wong
Gabrielle Ecanow
Maxwell Nye
Michael Henry Tessler
J. Tenenbaum
38
40
0
15 Jun 2021
Can You Learn an Algorithm? Generalizing from Easy to Hard Problems with Recurrent Networks
Avi Schwarzschild
Eitan Borgnia
Arjun Gupta
Furong Huang
U. Vishkin
Micah Goldblum
Tom Goldstein
24
74
0
08 Jun 2021
Emergent Prosociality in Multi-Agent Games Through Gifting
Woodrow Z. Wang
M. Beliaev
Erdem Biyik
Daniel A. Lazar
Ramtin Pedarsani
Dorsa Sadigh
AI4CE
22
25
0
13 May 2021
Hierarchical RNNs-Based Transformers MADDPG for Mixed Cooperative-Competitive Environments
Xiaolong Wei
Lifang Yang
Xianglin Huang
Gang Cao
Zhulin Tao
Zhengyang Du
Jing An
34
6
0
11 May 2021