Learning in structured MDPs with convex cost functions: Improved regret bounds for inventory management

10 May 2019

Papers citing "Learning in structured MDPs with convex cost functions: Improved regret bounds for inventory management"

Title
Optimistic Q-learning for average reward and episodic reinforcement learning; Priyank Agrawal, Shipra Agrawal; 63 4 0; 18 Jul 2024
Restarted Bayesian Online Change-point Detection for Non-Stationary Markov Decision Processes; Réda Alami, Mohammed Mahfoud, Eric Moulines; 26 2 0; 01 Apr 2023
Hindsight Learning for MDPs with Exogenous Inputs; Sean R. Sinclair, Felipe Vieira Frujeri, Ching-An Cheng, Luke Marshall, Hugo Barbalho, Jingling Li, Jennifer Neville, Ishai Menache, Adith Swaminathan; 18 23 0; 13 Jul 2022
Learning to Order for Inventory Systems with Lost Sales and Uncertain Supplies; Boxiao Chen, Jiashuo Jiang, Jiawei Zhang, Zhengyuan Zhou; 28 10 0; 10 Jul 2022
Learning a Discrete Set of Optimal Allocation Rules in a Queueing System with Unknown Service Rate; Saghar Adler, Mehrdad Moharrami, V. Subramanian; 46 1 0; 04 Feb 2022
Learning and Information in Stochastic Networks and Queues; N. Walton, Kuang Xu; 32 20 0; 18 May 2021
Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism; Wang Chi Cheung, D. Simchi-Levi, Ruihao Zhu; OffRL; 22 93 0; 24 Jun 2020