Reasoning aligns language models to human cognition

9 February 2026

Gonçalo Guiomar

Elia Torre

Pehuen Moure

Victoria Shavina

Mario Giulianelli

Shih-Chii Liu

Valerio Mante

ReLM

LRM


Main: 8 Pages

15 Figures

Bibliography: 6 Pages

2 Tables

Appendix: 25 Pages

Abstract

Do language models make decisions under uncertainty like humans do, and what role does chain-of-thought (CoT) reasoning play in the underlying decision process? We introduce an active probabilistic reasoning task that cleanly separates sampling (actively acquiring evidence) from inference (integrating evidence toward a decision). Benchmarking humans and a broad set of contemporary large language models against near-optimal reference policies reveals a consistent pattern: extended reasoning is the key determinant of strong performance, driving large gains in inference and producing belief trajectories that become strikingly human-like, while yielding only modest improvements in active sampling. To explain these differences, we fit a mechanistic model that captures systematic deviations from optimal behavior via four interpretable latent variables: memory, strategy, choice bias, and occlusion awareness. This model places humans and models in a shared low-dimensional cognitive space, reproduces behavioral signatures across agents, and shows how chain-of-thought shifts language models toward human-like regimes of evidence accumulation and belief-to-choice mapping, tightening alignment in inference while leaving a persistent gap in information acquisition.
