Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2504.11442
Cited By
v1
v2 (latest)
TextArena
15 April 2025
Leon Guertler
Bobby Cheng
Simon Yu
Bo Liu
Leshem Choshen
Cheston Tan
LLMAG
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"TextArena"
11 / 11 papers shown
Title
Unveiling the Learning Mind of Language Models: A Cognitive Framework and Empirical Study
Zhengyu Hu
Jianxun Lian
Zheyuan Xiao
Seraphina Zhang
Tianfu Wang
Nicholas Jing Yuan
Xing Xie
Hui Xiong
ELM
LRM
31
0
0
16 Jun 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
398
2,034
0
22 Jan 2025
Game-theoretic LLM: Agent Workflow for Negotiation Games
Wenyue Hua
Ollie Liu
Lingyao Li
Alfonso Amayuelas
Julie Chen
...
Lizhou Fan
Fei Sun
William Yang Wang
Xinze Wang
Yongfeng Zhang
102
22
0
08 Nov 2024
GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents
Anthony Costarelli
Mat Allen
Roman Hauksson
Grace Sodunke
Suhas Hariharan
Carlson Cheng
Wenjie Li
Joshua Clymer
Arjun Yadav
ELM
ReLM
LLMAG
LRM
101
27
0
07 Jun 2024
GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations
Jinhao Duan
Renming Zhang
James Diffenderfer
B. Kailkhura
Lichao Sun
Elias Stengel-Eskin
Mohit Bansal
Tianlong Chen
Kaidi Xu
ELM
LRM
110
67
0
19 Feb 2024
Evaluating Language Model Agency through Negotiations
Tim R. Davidson
V. Veselovsky
Martin Josifoski
Maxime Peyrard
Antoine Bosselut
Michal Kosinski
Robert West
LLMAG
89
29
0
09 Jan 2024
LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models
Marwa Abdulhai
Isadora White
Charles Burton Snell
Charles Sun
Joey Hong
Yuexiang Zhai
Kelvin Xu
Sergey Levine
LLMAG
OffRL
LRM
87
42
0
30 Nov 2023
Clembench: Using Game Play to Evaluate Chat-Optimized Language Models as Conversational Agents
Kranti Chalamalasetti
Jana Gotze
Sherzod Hakimov
Brielen Madureira
P. Sadler
David Schlangen
ELM
ALM
LLMAG
107
36
0
22 May 2023
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
302
5,702
0
07 Jul 2021
Dynabench: Rethinking Benchmarking in NLP
Douwe Kiela
Max Bartolo
Yixin Nie
Divyansh Kaushik
Atticus Geiger
...
Pontus Stenetorp
Robin Jia
Joey Tianyi Zhou
Christopher Potts
Adina Williams
218
411
0
07 Apr 2021
Measuring Massive Multitask Language Understanding
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
512
4,587
0
07 Sep 2020
1