Semi-supervised Batch Learning From Logged Data

15 September 2022

Gholamali Aminian, Armin Behnamnia, Hamid R. Rabiee, Omar Rivasplata, Miguel R. D. Rodrigues

Papers citing "Semi-supervised Batch Learning From Logged Data"

Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality. Ying Jin, Zhimei Ren, Zhuoran Yang, Zhaoran Wang. 19 Dec 2022 [OffRL]
PAC-Bayesian Offline Contextual Bandits With Guarantees. Otmane Sakhi, Pierre Alquier, Nicolas Chopin. 24 Oct 2022 [OffRL]
A Survey on Deep Semi-supervised Learning. Xiangli Yang, Zixing Song, Irwin King, Zenglin Xu. 28 Feb 2021
Semi-supervised reward learning for offline reinforcement learning. Ksenia Konyushkova, Konrad Zolna, Y. Aytar, Alexander Novikov, Scott E. Reed, Serkan Cabi, Nando de Freitas. 12 Dec 2020 [SSL, OffRL]
Conservative Q-Learning for Offline Reinforcement Learning. Aviral Kumar, Aurick Zhou, George Tucker, Sergey Levine. 08 Jun 2020 [OffRL, OnRL]
Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog. Natasha Jaques, Asma Ghandeharioun, J. Shen, Craig Ferguson, Àgata Lapedriza, Noah J. Jones, S. Gu, Rosalind W. Picard. 30 Jun 2019 [OffRL]
Off-Policy Evaluation and Learning from Logged Bandit Feedback: Error Reduction via Surrogate Policy. Yuan Xie, Boyi Liu, Qiang Liu, Zhaoran Wang, Yuanshuo Zhou, Jian-wei Peng. 01 Aug 2018 [OffRL]
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. Han Xiao, Kashif Rasul, Roland Vollgraf. 25 Aug 2017
Constrained Policy Optimization. Joshua Achiam, David Held, Aviv Tamar, Pieter Abbeel. 30 May 2017
Bayesian Inference of Individualized Treatment Effects using Multi-task Gaussian Processes. Ahmed Alaa, M. Schaar. 10 Apr 2017 [CML]
f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization. Sebastian Nowozin, Botond Cseke, Ryota Tomioka. 02 Jun 2016 [GAN]
Learning Representations for Counterfactual Inference. Fredrik D. Johansson, Uri Shalit, David Sontag. 12 May 2016 [CML, OOD, BDL]
Explore no more: Improved high-probability regret bounds for non-stochastic bandits. Gergely Neu. 10 Jun 2015
Doubly Robust Policy Evaluation and Optimization. Miroslav Dudík, D. Erhan, John Langford, Lihong Li. 10 Mar 2015 [OffRL]
Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms. Lihong Li, Wei Chu, John Langford, Xuanhui Wang. 31 Mar 2010 [OffRL]