Comparing Exploration-Exploitation Strategies of LLMs and Humans: Insights from Standard Multi-armed Bandit Tasks

15 May 2025

Vahid Sarhangian

Papers citing "Comparing Exploration-Exploitation Strategies of LLMs and Humans: Insights from Standard Multi-armed Bandit Tasks"

Title | Authors | Tags | Metrics | Date
A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems | Zixuan Ke, Fangkai Jiao, Yifei Ming, Xuan-Phi Nguyen, Austin Xu, ..., Chengwei Qin, Peifeng Wang, Siyang Song, Caiming Xiong, Shafiq Joty | LRM | 88 15 0 | 12 Apr 2025
Should You Use Your Large Language Model to Explore or Exploit? | Keegan Harris, Aleksandrs Slivkins | | 45 2 0 | 31 Jan 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, ..., Shiyu Wang, S. Yu, Shunfeng Zhou, Shuting Pan, S.S. Li | ReLM, VLM, OffRL, AI4TS, LRM | 318 1,611 0 | 22 Jan 2025
Generalization to New Sequential Decision Making Tasks with In-Context Learning | Sharath Chandra Raparthy, Eric Hambro, Robert Kirk, Mikael Henaff, Roberta Raileanu | OffRL | 155 22 0 | 06 Dec 2023
Reasoning with Language Model is Planning with World Model | Shibo Hao, Yi Gu, Haodi Ma, Joshua Jiahua Hong, Zhen Wang, D. Wang, Zhiting Hu | ReLM, LRM, LLMAG | 123 571 0 | 24 May 2023
Automatic Chain of Thought Prompting in Large Language Models | Zhuosheng Zhang, Aston Zhang, Mu Li, Alexander J. Smola | ReLM, LRM | 141 618 0 | 07 Oct 2022
Out of One, Many: Using Language Models to Simulate Human Samples | Lisa P. Argyle, Ethan C. Busby, Nancy Fulda, Joshua R Gubler, Christopher Rytting, David Wingate | SyDa | 77 588 0 | 14 Sep 2022
Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies | Gati Aher, Rosa I. Arriaga, Adam Tauman Kalai | | 100 390 0 | 18 Aug 2022
Using cognitive psychology to understand GPT-3 | Marcel Binz, Eric Schulz | ELM, LLMAG | 320 474 0 | 21 Jun 2022
Pre-Trained Language Models for Interactive Decision-Making | Shuang Li, Xavier Puig, Chris Paxton, Yilun Du, Clinton Jia Wang, ..., Anima Anandkumar, Jacob Andreas, Igor Mordatch, Antonio Torralba, Yuke Zhu | LM&Ro | 93 257 0 | 03 Feb 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou | LM&Ro, LRM, AI4CE, ReLM | 740 9,267 0 | 28 Jan 2022
Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC | Aki Vehtari, Andrew Gelman, Jonah Gabry | | 106 4,044 0 | 16 Jul 2015