Neural Logistic Bandits

Abstract

We study the problem of neural logistic bandits, where the main task is to learn an unknown reward function within a logistic link function using a neural network. Existing approaches either exhibit unfavorable dependencies on $\kappa$, where $1/\kappa$ represents the minimum variance of the reward distributions, or suffer from a direct dependence on the feature dimension $d$, which can be huge in neural network-based settings. In this work, we introduce a novel Bernstein-type inequality for self-normalized vector-valued martingales that is designed to bypass a direct dependence on the ambient dimension. This allows us to derive a regret upper bound that grows with the effective dimension $\widetilde{d}$ rather than the feature dimension, while keeping a minimal dependence on $\kappa$. Based on this concentration inequality, we propose two algorithms, NeuralLog-UCB-1 and NeuralLog-UCB-2, that guarantee regret upper bounds of order $\widetilde{O}(\widetilde{d}\sqrt{\kappa T})$ and $\widetilde{O}(\widetilde{d}\sqrt{T/\kappa})$, respectively, improving on the existing results. Finally, we report numerical results on both synthetic and real datasets to validate our theoretical findings.
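To make the setting concrete, the sketch below illustrates a generic neural logistic UCB step of the kind the abstract alludes to: a small network predicts the logit of the reward probability, an exploration bonus is computed from the network's gradient features and a regularized design matrix, and the arm maximizing the optimistic score is pulled. This is a minimal conceptual sketch, not the paper's NeuralLog-UCB-1 or NeuralLog-UCB-2; the network architecture, the bonus weight `beta`, the learning rate, and the update rule are all hypothetical placeholders.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class NeuralLogisticUCBSketch:
    """Illustrative UCB step for a logistic bandit with a one-hidden-layer
    network; a conceptual sketch, not the paper's algorithms."""

    def __init__(self, d, width=32, beta=1.0, lam=1.0, lr=0.05, seed=None):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=1.0 / np.sqrt(d), size=(width, d))
        self.w2 = rng.normal(scale=1.0 / np.sqrt(width), size=width)
        self.beta = beta  # exploration weight (hypothetical tuning knob)
        self.lr = lr
        p = self.W1.size + self.w2.size
        self.Z = lam * np.eye(p)  # regularized design matrix over gradient features

    def _forward(self, x):
        h = np.tanh(self.W1 @ x)
        return self.w2 @ h, h

    def _grad_features(self, x):
        # Gradient of the network output w.r.t. all parameters ("NTK" features).
        out, h = self._forward(x)
        g_W1 = np.outer(self.w2 * (1.0 - h ** 2), x)
        return np.concatenate([g_W1.ravel(), h]), out

    def select(self, arms):
        # Optimistic score: sigmoid of the predicted logit plus a bonus
        # proportional to the gradient-feature norm under the inverse design matrix.
        Z_inv = np.linalg.inv(self.Z)
        scores = []
        for x in arms:
            g, out = self._grad_features(x)
            scores.append(sigmoid(out) + self.beta * np.sqrt(g @ Z_inv @ g))
        return int(np.argmax(scores))

    def update(self, x, reward):
        # Rank-one design-matrix update and one SGD step on the logistic loss.
        g, out = self._grad_features(x)
        self.Z += np.outer(g, g)
        err = sigmoid(out) - reward  # gradient of the log-loss w.r.t. the output
        h = np.tanh(self.W1 @ x)
        grad_w2 = err * h
        grad_W1 = err * np.outer(self.w2 * (1.0 - h ** 2), x)
        self.w2 -= self.lr * grad_w2
        self.W1 -= self.lr * grad_W1
```

For example, with `d`-dimensional arm features one would call `select` on the current arm set each round and `update` with the observed binary reward; the $\kappa$- and $\widetilde{d}$-dependent confidence widths analyzed in the paper would enter through the choice of `beta`.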

@article{bae2025_2505.02069,
  title={Neural Logistic Bandits},
  author={Seoungbin Bae and Dabeen Lee},
  journal={arXiv preprint arXiv:2505.02069},
  year={2025}
}