Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.12348
Cited By
v1
v2 (latest)
GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations
19 February 2024
Jinhao Duan
Renming Zhang
James Diffenderfer
B. Kailkhura
Lichao Sun
Elias Stengel-Eskin
Mohit Bansal
Tianlong Chen
Kaidi Xu
ELM
LRM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations"
23 / 23 papers shown
Title
TextArena
Leon Guertler
Bobby Cheng
Simon Yu
Bo Liu
Leshem Choshen
Cheston Tan
LLMAG
94
2
0
15 Apr 2025
ZeroSumEval: An Extensible Framework For Scaling LLM Evaluation with Inter-Model Competition
H. A. Alyahya
Haidar Khan
Yazeed Alnumay
M Saiful Bari
B. Yener
LRM
118
2
0
10 Mar 2025
Reasoning and the Trusting Behavior of DeepSeek and GPT: An Experiment Revealing Hidden Fault Lines in Large Language Models
Rubing Li
João Sedoc
Arun Sundararajan
LRM
88
1
0
20 Feb 2025
TMGBench: A Systematic Game Benchmark for Evaluating Strategic Reasoning Abilities of LLMs
Haoran Wang
Xiachong Feng
Lei Li
Zhan Qin
Dianbo Sui
Dianbo Sui
Lingpeng Kong
LRM
ELM
83
6
0
14 Oct 2024
GLEE: A Unified Framework and Benchmark for Language-based Economic Environments
Eilam Shapira
Omer Madmon
Itamar Reinman
S. Amouyal
Roi Reichart
Moshe Tennenholtz
80
5
0
07 Oct 2024
Moral Alignment for LLM Agents
Elizaveta Tennant
Stephen Hailes
Mirco Musolesi
87
5
0
02 Oct 2024
How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments
Jen-tse Huang
E. Li
Man Ho Lam
Tian Liang
Wenxuan Wang
Youliang Yuan
Wenxiang Jiao
Xing Wang
Zhaopeng Tu
Michael R. Lyu
ELM
LLMAG
135
37
0
18 Mar 2024
LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments
Junzhe Chen
Xuming Hu
Shuodi Liu
Shiyu Huang
Weijuan Tu
Zhaofeng He
Lijie Wen
ELM
LLMAG
80
11
0
26 Feb 2024
Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game
Zelai Xu
Chao Yu
Fei Fang
Yu Wang
Yi Wu
LLMAG
108
91
0
29 Oct 2023
Welfare Diplomacy: Benchmarking Language Model Cooperation
Gabriel Mukobi
Hannah Erlebach
Niklas Lauffer
Lewis Hammond
Alan Chan
Jesse Clifton
LM&Ro
71
27
0
13 Oct 2023
Exploring Large Language Models for Communication Games: An Empirical Study on Werewolf
Yuzhuang Xu
Shuo Wang
Peng Li
Ziyue Wang
Xiaolong Wang
Weidong Liu
Yang Liu
LLMAG
43
204
0
09 Sep 2023
Graph of Thoughts: Solving Elaborate Problems with Large Language Models
Maciej Besta
Nils Blach
Aleš Kubíček
Robert Gerstenberger
Michal Podstawski
...
Joanna Gajda
Tomasz Lehmann
H. Niewiadomski
Piotr Nyczyk
Torsten Hoefler
LRM
AI4CE
LM&Ro
136
671
0
18 Aug 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
356
4,388
0
09 Jun 2023
Playing repeated games with Large Language Models
Elif Akata
Lion Schulz
Julian Coda-Forno
Seong Joon Oh
Matthias Bethge
Eric Schulz
541
134
0
26 May 2023
Clembench: Using Game Play to Evaluate Chat-Optimized Language Models as Conversational Agents
Kranti Chalamalasetti
Jana Gotze
Sherzod Hakimov
Brielen Madureira
P. Sadler
David Schlangen
ELM
ALM
LLMAG
71
35
0
22 May 2023
CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society
Ge Li
Hasan Hammoud
Hani Itani
Dmitrii Khizbullin
Guohao Li
SyDa
ALM
125
498
0
31 Mar 2023
Language Models of Code are Few-Shot Commonsense Learners
Aman Madaan
Shuyan Zhou
Uri Alon
Yiming Yang
Graham Neubig
ReLM
LRM
109
221
0
13 Oct 2022
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
Rajkumar Ramamurthy
Prithviraj Ammanabrolu
Kianté Brantley
Jack Hessel
R. Sifa
Christian Bauckhage
Hannaneh Hajishirzi
Yejin Choi
OffRL
93
247
0
03 Oct 2022
WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
Shunyu Yao
Howard Chen
John Yang
Karthik Narasimhan
LLMAG
LM&Ro
142
500
0
04 Jul 2022
ScienceWorld: Is your Agent Smarter than a 5th Grader?
Ruoyao Wang
Peter Alexander Jansen
Marc-Alexandre Côté
Prithviraj Ammanabrolu
LLMAG
ReLM
LRM
100
123
0
14 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
817
9,576
0
28 Jan 2022
ALFWorld: Aligning Text and Embodied Environments for Interactive Learning
Mohit Shridhar
Xingdi Yuan
Marc-Alexandre Côté
Yonatan Bisk
Adam Trischler
Matthew J. Hausknecht
LM&Ro
LLMAG
87
433
0
08 Oct 2020
OpenSpiel: A Framework for Reinforcement Learning in Games
Marc Lanctot
Edward Lockhart
Jean-Baptiste Lespiau
V. Zambaldi
Satyaki Upadhyay
...
Julian Schrittwieser
Thomas W. Anthony
Edward Hughes
Ivo Danihelka
Jonah Ryan-Davis
OffRL
99
252
0
26 Aug 2019
1