Neural Logistic Bandits

We study the problem of neural logistic bandits, where the main task is to learn an unknown reward function within a logistic link function using a neural network. Existing approaches either exhibit unfavorable dependencies on κ, where 1/κ represents the minimum variance of the reward distributions, or suffer from a direct dependence on the feature dimension d, which can be huge in neural network-based settings. In this work, we introduce a novel Bernstein-type inequality for self-normalized vector-valued martingales that is designed to bypass a direct dependence on the ambient dimension. This lets us deduce a regret upper bound that grows with the effective dimension d̃, not the feature dimension, while keeping a minimal dependence on κ. Based on the concentration inequality, we propose two algorithms, NeuralLog-UCB-1 and NeuralLog-UCB-2, whose regret upper bounds scale with the effective dimension d̃ and improve on existing results in their dependence on κ. Lastly, we report numerical results on both synthetic and real datasets to validate our theoretical findings.
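To make the setting concrete, the following is a minimal sketch, in Python, of a logistic-bandit loop with a neural-style feature map and a UCB selection rule. It is illustrative only: the class name, the fixed random-feature map standing in for the paper's network-gradient features, the exploration weight, and the single-gradient-step update are all assumptions, not the authors' NeuralLog-UCB algorithms.

```python
import numpy as np


def sigmoid(z):
    """Logistic link: maps a real-valued score to a reward probability."""
    return 1.0 / (1.0 + np.exp(-z))


class NeuralLogisticUCB:
    """Illustrative neural-UCB-style rule for logistic rewards (not the paper's method)."""

    def __init__(self, dim_in, dim_feat, alpha=1.0, lam=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random one-layer feature map, a stand-in for network-gradient features.
        self.W = rng.standard_normal((dim_feat, dim_in)) / np.sqrt(dim_in)
        self.theta = np.zeros(dim_feat)      # parameter estimate
        self.A = lam * np.eye(dim_feat)      # regularized design matrix
        self.alpha = alpha                   # exploration weight (assumed constant)

    def features(self, x):
        return np.tanh(self.W @ x)

    def ucb(self, x):
        """Estimated reward plus an exploration bonus from the design matrix."""
        phi = self.features(x)
        width = np.sqrt(phi @ np.linalg.solve(self.A, phi))
        return sigmoid(phi @ self.theta) + self.alpha * width

    def select(self, arms):
        """Pick the arm with the largest upper confidence bound."""
        return max(range(len(arms)), key=lambda i: self.ucb(arms[i]))

    def update(self, x, reward, lr=0.1):
        """Record the pulled arm and take one gradient step on the logistic likelihood."""
        phi = self.features(x)
        self.A += np.outer(phi, phi)
        self.theta += lr * (reward - sigmoid(phi @ self.theta)) * phi
```

The design matrix A shrinks the exploration bonus of frequently pulled directions, which is the standard UCB mechanism; the paper's contribution lies in the confidence width, whose Bernstein-type construction depends on the effective dimension rather than the ambient feature dimension.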
@article{bae2025_2505.02069,
  title   = {Neural Logistic Bandits},
  author  = {Seoungbin Bae and Dabeen Lee},
  journal = {arXiv preprint arXiv:2505.02069},
  year    = {2025}
}