Deep Upper Confidence Bound Algorithm for Contextual Bandit Ranking of Information Selection

8 October 2021

Papers citing "Deep Upper Confidence Bound Algorithm for Contextual Bandit Ranking of Information Selection"


Improving Reward-Conditioned Policies for Multi-Armed Bandits using Normalized Weight Functions. Kai Xu, Farid Tajaddodianfar, Ben Allison. 16 Jun 2024.
Convergence Guarantees for Deep Epsilon Greedy Policy Learning. Michael Rawson, R. Balan. 02 Dec 2021.