arXiv:2206.10498
Cited By
PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change
21 June 2022
Karthik Valmeekam
Matthew Marquez
Alberto Olmo
S. Sreedharan
Subbarao Kambhampati
ReLM
LRM
Papers citing "PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change"
40 / 40 papers shown
Insertion Language Models: Sequence Generation with Arbitrary-Position Insertions
Dhruvesh Patel
Aishwarya Sahoo
Avinash Amballa
Tahira Naseem
Tim G. J. Rudner
Andrew McCallum
KELM
47
0
0
09 May 2025
QualBench: Benchmarking Chinese LLMs with Localized Professional Qualifications for Vertical Domain Evaluation
Mengze Hong
Wailing Ng
Di Jiang
Chen Zhang
ELM
53
0
0
08 May 2025
HyperTree Planning: Enhancing LLM Reasoning via Hierarchical Thinking
Runquan Gui
Z. Wang
J. Wang
Chi Ma
Huiling Zhen
M. Yuan
Jianye Hao
Defu Lian
Enhong Chen
Feng Wu
LRM
124
0
0
05 May 2025
SymPlanner: Deliberate Planning in Language Models with Symbolic Representation
Siheng Xiong
Jieyu Zhou
Zhangding Liu
Yusen Su
LLMAG
LM&Ro
129
0
0
02 May 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
X. Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Yu Jiang
ALM
ELM
84
1
0
26 Apr 2025
ASHiTA: Automatic Scene-grounded HIerarchical Task Analysis
Yun Chang
Leonor Fermoselle
Duy Ta
Bernadette Bucher
Luca Carlone
Jiuguang Wang
30
0
0
09 Apr 2025
Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?
Kai Yan
Yufei Xu
Zhengyin Du
Xuesong Yao
Z. Wang
Xiaowen Guo
Jiecao Chen
ReLM
ELM
LRM
95
3
0
01 Apr 2025
EMMOE: A Comprehensive Benchmark for Embodied Mobile Manipulation in Open Environments
Dongping Li
Tielong Cai
Tianci Tang
Wenhao Chai
Katherine Rose Driggs-Campbell
Gaoang Wang
LM&Ro
56
0
0
11 Mar 2025
RePrompt: Planning by Automatic Prompt Engineering for Large Language Models Agents
Weizhe Chen
Sven Koenig
B. Dilkina
LLMAG
102
8
0
17 Feb 2025
NOTA: Multimodal Music Notation Understanding for Visual Large Language Model
Mingni Tang
Jiajia Li
Lu Yang
Zhiqiang Zhang
Jinghao Tian
Z. Li
L. Zhang
P. Wang
51
0
0
17 Feb 2025
Flaming-hot Initiation with Regular Execution Sampling for Large Language Models
Weizhe Chen
Zhicheng Zhang
Guanlin Liu
Renjie Zheng
Wenlei Shi
Chen Dun
Zheng Wu
Xing Jin
Lin Yan
ALM
LRM
51
1
0
17 Feb 2025
Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning
Jiacheng Ye
Jiahui Gao
Shansan Gong
Lin Zheng
Xin Jiang
Z. Li
Lingpeng Kong
DiffM
LRM
46
15
0
18 Oct 2024
Planning Anything with Rigor: General-Purpose Zero-Shot Planning with LLM-based Formalized Programming
Yilun Hao
Yang Zhang
Chuchu Fan
LLMAG
39
10
0
15 Oct 2024
Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning
Xiyao Wang
Linfeng Song
Ye Tian
Dian Yu
Baolin Peng
Haitao Mi
Furong Huang
Dong Yu
LRM
47
9
0
09 Oct 2024
Closed-Loop Long-Horizon Robotic Planning via Equilibrium Sequence Modeling
Jinghan Li
Zhicheng Sun
Fei Li
90
1
0
02 Oct 2024
LaMMA-P: Generalizable Multi-Agent Long-Horizon Task Allocation and Planning with LM-Driven PDDL Planner
Xiaopan Zhang
Hao Qin
Fuquan Wang
Yue Dong
Jiachen Li
LM&Ro
55
6
0
30 Sep 2024
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
Zayne Sprague
Fangcong Yin
Juan Diego Rodriguez
Dongwei Jiang
Manya Wadhwa
Prasann Singhal
Xinyu Zhao
Xi Ye
Kyle Mahowald
Greg Durrett
ReLM
LRM
114
82
0
18 Sep 2024
AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation
Mengkang Hu
Yixiao Wang
Can Xu
Lingfeng Sun
Chensheng Peng
T. Hannagan
Nicola Poerio
Saravan Rajmohan
LM&Ro
LLMAG
62
15
0
01 Aug 2024
Neuro-symbolic Training for Reasoning over Spatial Language
Tanawan Premsri
Parisa Kordjamshidi
NAI
LRM
40
6
0
19 Jun 2024
Unlocking Large Language Model's Planning Capabilities with Maximum Diversity Fine-tuning
Wenjun Li
Changyu Chen
Pradeep Varakantham
47
2
0
15 Jun 2024
Benchmarking Generative Models on Computational Thinking Tests in Elementary Visual Programming
Victor-Alexandru Pădurean
Adish Singla
ELM
46
3
0
14 Jun 2024
PDDLEGO: Iterative Planning in Textual Environments
Li Zhang
Peter Alexander Jansen
Tianyi Zhang
Peter Clark
Chris Callison-Burch
Niket Tandon
LM&Ro
24
4
0
30 May 2024
Adaptive In-conversation Team Building for Language Model Agents
Linxin Song
Jiale Liu
Jieyu Zhang
Shaokun Zhang
Ao Luo
Shijian Wang
Qingyun Wu
Chi Wang
LLMAG
63
10
0
29 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
71
41
0
23 May 2024
Chain of Thoughtlessness? An Analysis of CoT in Planning
Kaya Stechly
Karthik Valmeekam
Subbarao Kambhampati
LRM
LM&Ro
65
38
0
08 May 2024
Testing and Understanding Erroneous Planning in LLM Agents through Synthesized User Inputs
Zhenlan Ji
Daoyuan Wu
Pingchuan Ma
Zongjie Li
Shuai Wang
LLMAG
40
3
0
27 Apr 2024
Anticipate & Collab: Data-driven Task Anticipation and Knowledge-driven Planning for Human-robot Collaboration
Shivam Singh
Karthik Swaminathan
Raghav Arora
Ramandeep Singh
Ahana Datta
Dipanjan Das
Snehasis Banerjee
Mohan Sridharan
Madhava Krishna
32
0
0
04 Apr 2024
The pitfalls of next-token prediction
Gregor Bachmann
Vaishnavh Nagarajan
35
59
0
11 Mar 2024
Language Models can be Logical Solvers
Jiazhan Feng
Ruochen Xu
Junheng Hao
Hiteshi Sharma
Yelong Shen
Dongyan Zhao
Weizhu Chen
ReLM
LRM
ELM
39
22
0
10 Nov 2023
Explainable Claim Verification via Knowledge-Grounded Reasoning with Large Language Models
Haoran Wang
Kai Shu
LRM
35
22
0
08 Oct 2023
Faith and Fate: Limits of Transformers on Compositionality
Nouha Dziri
Ximing Lu
Melanie Sclar
Xiang Lorraine Li
Liwei Jiang
...
Sean Welleck
Xiang Ren
Allyson Ettinger
Zaïd Harchaoui
Yejin Choi
ReLM
LRM
30
328
0
29 May 2023
Distilling Script Knowledge from Large Language Models for Constrained Language Planning
Siyu Yuan
Jiangjie Chen
Ziquan Fu
Xuyang Ge
Soham Shah
C. R. Jankowski
Yanghua Xiao
Deqing Yang
38
46
0
09 May 2023
Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making using Language Guided World Modelling
Kolby Nottingham
Prithviraj Ammanabrolu
Alane Suhr
Yejin Choi
Hannaneh Hajishirzi
Sameer Singh
Roy Fox
LLMAG
LM&Ro
37
76
0
28 Jan 2023
The Path to Autonomous Learners
Hanna Abi Akl
14
1
0
04 Nov 2022
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
233
2,479
0
06 Oct 2022
Language models show human-like content effects on reasoning tasks
Ishita Dasgupta
Andrew Kyle Lampinen
Stephanie C. Y. Chan
Hannah R. Sheahan
Antonia Creswell
D. Kumaran
James L. McClelland
Felix Hill
ReLM
LRM
22
180
0
14 Jul 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
307
4,084
0
24 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
308
11,915
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
332
8,457
0
28 Jan 2022
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
Mor Geva
Daniel Khashabi
Elad Segal
Tushar Khot
Dan Roth
Jonathan Berant
RALM
245
672
0
06 Jan 2021