We evaluate the ability of the current generation of large language models (LLMs) to help a decision-making agent facing an exploration-exploitation tradeoff. We use LLMs to explore and to exploit in isolation across a variety of (contextual) bandit tasks. We find that while current LLMs often struggle to exploit, in-context mitigations can substantially improve performance on small-scale tasks; even then, however, LLMs perform worse than a simple linear regression. On the other hand, we find that LLMs do help with exploring large action spaces that carry inherent semantics, by suggesting suitable candidates to explore.
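To make the exploration-exploitation tradeoff concrete, here is a minimal epsilon-greedy agent on a Bernoulli bandit. This is a generic textbook sketch of the kind of bandit task the abstract refers to (the arm means, epsilon, and horizon below are illustrative assumptions), not the paper's experimental setup: the "explore" branch samples a random arm, while the "exploit" branch picks the arm with the best empirical mean, which is the step the abstract reports LLMs often struggle with.

```python
# Illustrative epsilon-greedy agent on a 3-armed Bernoulli bandit.
# Arm means, epsilon, and horizon are hypothetical, chosen for the demo.
import random

def epsilon_greedy(means, horizon=10_000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    counts = [0] * len(means)   # pulls per arm
    totals = [0.0] * len(means) # cumulative reward per arm
    reward = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            # Explore: pick a uniformly random arm.
            arm = rng.randrange(len(means))
        else:
            # Exploit: pick the arm with the best empirical mean;
            # unpulled arms get +inf, so each arm is tried at least once.
            arm = max(
                range(len(means)),
                key=lambda a: totals[a] / counts[a] if counts[a] else float("inf"),
            )
        r = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += r
        reward += r
    return reward / horizon

print(epsilon_greedy([0.2, 0.5, 0.8]))
```

With a small epsilon the average reward approaches the best arm's mean (0.8 here), minus the cost of the forced random exploration.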
@article{harris2025_2502.00225,
  title={Should You Use Your Large Language Model to Explore or Exploit?},
  author={Keegan Harris and Aleksandrs Slivkins},
  journal={arXiv preprint arXiv:2502.00225},
  year={2025}
}