We evaluate the ability of the current generation of large language models (LLMs) to help a decision-making agent facing an exploration-exploitation tradeoff. We use LLMs to explore and to exploit in isolation across a variety of (contextual) bandit tasks. We find that while current LLMs often struggle to exploit, in-context mitigations can substantially improve performance on small-scale tasks; even then, however, LLMs perform worse than a simple linear regression. On the other hand, we find that LLMs do help with exploring large action spaces that carry inherent semantics, by suggesting suitable candidates to explore.
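To make the exploration-exploitation tradeoff concrete, here is a minimal epsilon-greedy agent on a Bernoulli bandit. This is a generic textbook sketch of the kind of bandit task the abstract refers to (the arm means, epsilon, and horizon below are illustrative assumptions), not the paper's experimental setup: the "explore" branch samples a random arm, while the "exploit" branch picks the arm with the best empirical mean, which is the step the abstract reports LLMs often struggle with.

```python
# Illustrative epsilon-greedy agent on a 3-armed Bernoulli bandit.
# Arm means, epsilon, and horizon are hypothetical, chosen for the demo.
import random

def epsilon_greedy(means, horizon=10_000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    counts = [0] * len(means)   # pulls per arm
    totals = [0.0] * len(means) # cumulative reward per arm
    reward = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            # Explore: pick a uniformly random arm.
            arm = rng.randrange(len(means))
        else:
            # Exploit: pick the arm with the best empirical mean;
            # unpulled arms get +inf, so each arm is tried at least once.
            arm = max(
                range(len(means)),
                key=lambda a: totals[a] / counts[a] if counts[a] else float("inf"),
            )
        r = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += r
        reward += r
    return reward / horizon

print(epsilon_greedy([0.2, 0.5, 0.8]))
```

With a small epsilon the average reward approaches the best arm's mean (0.8 here), minus the cost of the forced random exploration.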
@article{harris2025_2502.00225,
  title={Should You Use Your Large Language Model to Explore or Exploit?},
  author={Keegan Harris and Aleksandrs Slivkins},
  journal={arXiv preprint arXiv:2502.00225},
  year={2025}
}