arXiv:2502.18836
REALM-Bench: A Real-World Planning Benchmark for LLMs and Multi-Agent Systems
26 February 2025
Longling Geng
Edward Y. Chang
LLMAG
Papers citing "REALM-Bench: A Real-World Planning Benchmark for LLMs and Multi-Agent Systems"
8 / 8 papers shown
LLM-Powered AI Agent Systems and Their Applications in Industry
Guannan Liang
Qianqian Tong
LLMAG
LM&Ro
61
3
0
22 May 2025
PLANET: A Collection of Benchmarks for Evaluating LLMs' Planning Capabilities
Haoming Li
Zhaoliang Chen
Jonathan Zhang
Fei Liu
LLMAG
126
2
0
21 Apr 2025
MACI: Multi-Agent Collaborative Intelligence for Adaptive Reasoning and Temporal Planning
Edward Y. Chang
LLMAG
107
1
0
28 Jan 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
380
1,967
0
22 Jan 2025
TaskBench: Benchmarking Large Language Models for Task Automation
Yongliang Shen
Kaitao Song
Xu Tan
Wenqi Zhang
Kan Ren
Siyu Yuan
Weiming Lu
Dongsheng Li
Yueting Zhuang
103
65
0
30 Nov 2023
TimeBench: A Comprehensive Evaluation of Temporal Reasoning Abilities in Large Language Models
Zheng Chu
Jingchang Chen
Qianglong Chen
Weijiang Yu
Haotian Wang
Ming Liu
Bing Qin
LRM
ELM
104
15
0
29 Nov 2023
CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society
Ge Li
Hasan Hammoud
Hani Itani
Dmitrii Khizbullin
Guohao Li
SyDa
ALM
130
513
0
31 Mar 2023
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
859
42,379
0
28 May 2020