MARPLE: A Benchmark for Long-Horizon Inference

2 October 2024

Ruohan Zhang

Jiajun Wu

Papers citing "MARPLE: A Benchmark for Long-Horizon Inference"

Representations of Fact, Fiction and Forecast in Large Language Models: Epistemics and Attitudes. Meng Li, Michael Vrazitulis, David Schlangen. 02 Jun 2025.
Breakpoint: Scalable evaluation of system-level reasoning in LLM code agents. Kaivalya Hariharan, Uzay Girit, Atticus Wang, Jacob Andreas. LLMAG, LRM. 30 May 2025.
CLEVRER-Humans: Describing Physical and Causal Events the Human Way. Jiayuan Mao, Xuelin Yang, Xikun Zhang, Noah D. Goodman, Jiajun Wu. NAI. 05 Oct 2023.