On-line Policy Improvement using Monte-Carlo Search

Neural Information Processing Systems (NeurIPS), 1996

9 January 2025

Papers citing "On-line Policy Improvement using Monte-Carlo Search"

50 / 53 papers shown

Title
Adaptive Network Security Policies via Belief Aggregation and Rollout Kim Hammar Yuchao Li Tansu Alpcan Emil C. Lupu Dimitri P. Bertsekas 161 4 0 21 Jul 2025
A Survey on Self-play Methods in Reinforcement Learning Chao Yu Zelai Xu Chengdong Ma Chao Yu Weijuan Tu ... Deheng Ye Wenbo Ding Yu Wang SyDa SSL OnRL 526 21 0 02 Aug 2024
Model Predictive Control and Reinforcement Learning: A Unified Framework Based on Dynamic Programming Dimitri Bertsekas 265 13 0 02 Jun 2024
An Approximate Dynamic Programming Framework for Occlusion-Robust Multi-Object Tracking Pratyusha Musunuru Yuchao Li Jamison Weber Dimitri P. Bertsekas 199 0 0 24 May 2024
Graph Reinforcement Learning for Combinatorial Optimization: A Survey and Unifying Perspective Victor-Alexandru Darvariu Stephen Hailes Mirco Musolesi AI4CE 246 15 0 09 Apr 2024
Tree Search in DAG Space with Model-based Reinforcement Learning for Causal Discovery Victor-Alexandru Darvariu Stephen Hailes Mirco Musolesi CML 234 3 0 20 Oct 2023
Iterative Option Discovery for Planning, by Planning Kenny Young Richard S. Sutton 299 2 0 02 Oct 2023
Magnetic Field-Based Reward Shaping for Goal-Conditioned Reinforcement Learning IEEE/CAA Journal of Automatica Sinica (IEEE/CAA JAS), 2023 Hongyu Ding Yuan-Yan Tang Qing Wu Bo Wang Chunlin Chen Zhi Wang 285 7 0 16 Jul 2023
The Update-Equivalence Framework for Decision-Time Planning International Conference on Learning Representations (ICLR), 2023 Samuel Sokota Gabriele Farina David J. Wu Hengyuan Hu Kevin A. Wang J. Zico Kolter Noam Brown 243 5 0 25 Apr 2023
A New Policy Iteration Algorithm For Reinforcement Learning in Zero-Sum Markov Games Anna Winnicki R. Srikant 302 2 0 17 Mar 2023
Multiagent Rollout with Reshuffling for Warehouse Robots Path Planning IFAC-PapersOnLine (IFAC-PapersOnLine), 2022 William Emanuelsson Alejandro Penacho Riveiros Yuchao Li Karl H. Johansson Jonas Mårtensson 191 2 0 15 Nov 2022
Nested Search versus Limited Discrepancy Search Tristan Cazenave 165 0 0 01 Oct 2022
Regret Analysis for Hierarchical Experts Bandit Problem Qihan Guo Siwei Wang Jun Zhu 214 1 0 11 Aug 2022
A Survey on Model-based Reinforcement Learning Science China Information Sciences (Sci. China Inf. Sci.), 2022 Fan Luo Tian Xu Hang Lai Xiong-Hui Chen Weinan Zhang Yang Yu OffRL LRM 264 142 0 19 Jun 2022
Learning from Drivers to Tackle the Amazon Last Mile Routing Research Challenge Chen Wu Yin Song Verdi March Eden Duthie 316 9 0 09 May 2022
Symphony: Learning Realistic and Diverse Agents for Autonomous Driving Simulation IEEE International Conference on Robotics and Automation (ICRA), 2022 Maximilian Igl Daewoo Kim Alex Kuefler Paul Mougin Punit Shah K. Shiarlis Drago Anguelov Mark Palatucci Brandyn White Shimon Whiteson 197 75 0 06 May 2022
A Dynamic Programming Algorithm for Finding an Optimal Sequence of Informative Measurements P. Loxley Ka Wai Cheung 339 4 0 24 Sep 2021
Lessons from AlphaZero for Optimal, Model Predictive, and Adaptive Control Dimitri Bertsekas AI4CE 224 60 0 20 Aug 2021
Model-Based Opponent Modeling Xiaopeng Yu Jiechuan Jiang Wanpeng Zhang Haobin Jiang Zongqing Lu OffRL 230 37 0 04 Aug 2021
Train on Small, Play the Large: Scaling Up Board Games with AlphaZero and GNN Shai Ben-Assayag Ran El-Yaniv GNN 168 9 0 18 Jul 2021
Leveraging Tripartite Interaction Information from Live Stream E-Commerce for Improving Product Recommendation Knowledge Discovery and Data Mining (KDD), 2021 Sanshi Lei Yu Zhuoxuan Jiang Dongdong Chen Shanshan Feng Dongsheng Li Qi Liu Jinfeng Yi 148 29 0 07 Jun 2021
Annotating Motion Primitives for Simplifying Action Search in Reinforcement Learning IEEE Transactions on Emerging Topics in Computational Intelligence (IEEE TETCI), 2021 I. Sledge Darshan W. Bryner José C. Príncipe 294 1 0 24 Feb 2021
Monte Carlo Rollout Policy for Recommendation Systems with Dynamic User Behavior International Conference on Communication Systems and Networks (COMSNETS), 2021 R. Meshram Kesav Kaza OffRL 205 3 0 08 Feb 2021
Deep Controlled Learning for Inventory Control European Journal of Operational Research (EJOR), 2020 Tarkan Temizoz Christina Imdahl R. Dijkman Douniel Lamghari-Idrissi W. Jaarsveld 390 16 0 30 Nov 2020
On the role of planning in model-based deep reinforcement learning Jessica B. Hamrick A. Friesen Feryal M. P. Behbahani A. Guez Fabio Viola Sims Witherspoon Thomas W. Anthony Lars Buesing Petar Velickovic T. Weber OffRL 305 71 0 08 Nov 2020
Lifelong Incremental Reinforcement Learning with Online Bayesian Inference IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS), 2020 Zhi Wang Chunlin Chen D. Dong CLL OffRL 200 61 0 28 Jul 2020
Simulation Based Algorithms for Markov Decision Processes and Multi-Action Restless Bandits R. Meshram Kesav Kaza 246 10 0 25 Jul 2020
Model-based Reinforcement Learning: A Survey Thomas M. Moerland Joost Broekens Aske Plaat Catholijn M. Jonker OffRL 392 63 0 30 Jun 2020
A Unifying Framework for Reinforcement Learning and Planning Thomas M. Moerland Joost Broekens Aske Plaat Catholijn M. Jonker OffRL 373 10 0 26 Jun 2020
Continuous Control for Searching and Planning with a Learned Model Xuxi Yang Werner Duvaud Peng Wei 186 5 0 12 Jun 2020
Review, Analysis and Design of a Comprehensive Deep Reinforcement Learning Framework Ngoc Duy Nguyen Thanh Thi Nguyen Hai V. Nguyen Doug Creighton S. Nahavandi 283 3 0 27 Feb 2020
Constrained Multiagent Rollout and Multidimensional Assignment with the Auction Algorithm Dimitri Bertsekas 197 12 0 18 Feb 2020
Reinforcement Learning for POMDP: Partitioned Rollout and Policy Iteration with Application to Autonomous Sequential Repair Problems IEEE Robotics and Automation Letters (RA-L), 2020 Sushmita Bhattacharya Sahil Badyal Thomas Wheeler Stephanie Gil Dimitri Bertsekas 152 37 0 11 Feb 2020
The Choice Function Framework for Online Policy Improvement AAAI Conference on Artificial Intelligence (AAAI), 2019 Murugeswari Issakkimuthu Alan Fern Prasad Tadepalli OffRL 145 1 0 01 Oct 2019
Policy Gradient Search: Online Planning and Expert Iteration without Search Trees Thomas W. Anthony Robert Nishihara Philipp Moritz Tim Salimans John Schulman 163 30 0 07 Apr 2019
Learn a Prior for RHEA for Better Online Planning Xinyao Tong W. Liu Bin Li OffRL 220 0 0 14 Feb 2019
Learning 6-DoF Grasping and Pick-Place Using Attention Focus Marcus Gualtieri Robert Platt 224 61 0 15 Jun 2018
Multiple-Step Greedy Policies in Online and Approximate Reinforcement Learning Yonathan Efroni Gal Dalal B. Scherrer Shie Mannor OffRL 227 14 0 21 May 2018
Feature-Based Aggregation and Deep Reinforcement Learning: A Survey and Some New Implementations Dimitri Bertsekas OffRL 226 136 0 12 Apr 2018
Beyond the One Step Greedy Approach in Reinforcement Learning Yonathan Efroni Gal Dalal B. Scherrer Shie Mannor OffRL 241 53 0 10 Feb 2018
Learning the Reward Function for a Misspecified Model Erik Talvitie 296 11 0 29 Jan 2018
A Survey on Compiler Autotuning using Machine Learning Amir H. Ashouri W. Killian John Cavazos G. Palermo Cristina Silvano 361 227 0 13 Jan 2018
Imagination-Augmented Agents for Deep Reinforcement Learning T. Weber S. Racanière David P. Reichert Lars Buesing A. Guez ... Razvan Pascanu Peter W. Battaglia Demis Hassabis David Silver Daan Wierstra LM&Ro 202 582 0 19 Jul 2017
Multi-Labelled Value Networks for Computer Go IEEE Transactions on Games (TG), 2017 Tai-Lin Wu I-Chen Wu Guan-Wun Chen Ting Han Wei Tung-Yi Lai Hung-Chun Wu Li-Cheng Lan 159 24 0 30 May 2017
Self-Correcting Models for Model-Based Reinforcement Learning AAAI Conference on Artificial Intelligence (AAAI), 2016 Erik Talvitie LRM 255 97 0 19 Dec 2016
Approximate Policy Iteration for Budgeted Semantic Video Segmentation Behrooz Mahasseni S. Todorovic Alan Fern 132 4 0 26 Jul 2016
Using Monte Carlo Search With Data Aggregation to Improve Robot Soccer Policies Robot Soccer World Cup (RoboCup), 2016 Francesco Riccio Roberto Capobianco Daniele Nardi 148 4 0 01 Jun 2016
Classification-based Approximate Policy Iteration: Experiments and Extended Discussions Amir-massoud Farahmand Doina Precup André Barreto Mohammad Ghavamzadeh OffRL 158 7 0 02 Jul 2014
Analysis of Watson's Strategies for Playing Jeopardy! Journal of Artificial Intelligence Research (JAIR), 2013 Gerald Tesauro David Gondek J. Lenchner James Fan J. Prager 178 34 0 04 Feb 2014
Learning to Win by Reading Manuals in a Monte-Carlo Framework Annual Meeting of the Association for Computational Linguistics (ACL), 2011 S. Branavan David Silver Regina Barzilay 159 193 0 18 Jan 2014