
Think²: Grounded Metacognitive Reasoning in Large Language Models

Abraham Paul Elenjical
Vivek Hruday Kavuri
Vasudeva Varma
Main: 8 pages · 4 figures · 4 tables · Bibliography: 1 page · Appendix: 6 pages
Abstract

Large Language Models (LLMs) demonstrate strong reasoning performance, yet their ability to reliably monitor, diagnose, and correct their own errors remains limited. We introduce a psychologically grounded metacognitive framework that operationalizes Ann Brown's regulatory cycle (Planning, Monitoring, and Evaluation) as a structured prompting architecture, and we study its integration with a lightweight dual-process MetaController for adaptive effort allocation. Across diverse reasoning and diagnostic benchmarks (GSM8K, CRUXEval, MBPP, AIME, CorrectBench, and TruthfulQA) using Llama-3 and Qwen-3 (8B) models, explicit regulatory structuring substantially improves error diagnosis and yields a threefold increase in successful self-correction. In blinded human evaluations over 580 query pairs, our framework's responses were preferred 84% of the time in aggregate on trustworthiness and metacognitive self-awareness over standard and Chain-of-Thought baselines. Grounding LLM reasoning in established cognitive theory offers a principled path toward more transparent and diagnostically robust AI systems.
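The abstract names the framework's two components but not their implementation. As an illustrative sketch only, the Python below shows one way the Planning/Monitoring/Evaluation prompting cycle and a dual-process MetaController could fit together; all function names, prompt wording, and the EASY/HARD routing rule are assumptions of this sketch, not the paper's actual method, and "llm" stands for any text-in, text-out model call.

from typing import Callable

# Any text-in, text-out model interface (API client, local model, etc.).
LLM = Callable[[str], str]

def regulatory_cycle(llm: LLM, query: str, max_rounds: int = 2) -> str:
    """Brown's regulatory cycle as staged prompts (hypothetical wording)."""
    # Planning: elicit an explicit step-by-step plan before answering.
    plan = llm(f"Plan: outline the steps needed to answer:\n{query}")
    answer = llm(
        f"Execute this plan step by step and answer the query.\n"
        f"Query: {query}\nPlan: {plan}"
    )
    for _ in range(max_rounds):
        # Monitoring: ask the model to diagnose errors in its own attempt.
        diagnosis = llm(
            f"Monitor: check this answer for errors. Reply 'OK' if it is "
            f"sound, otherwise describe the flaw.\n"
            f"Query: {query}\nAnswer: {answer}"
        )
        if diagnosis.strip().upper().startswith("OK"):
            break
        # Evaluation: revise the answer using the self-diagnosis.
        answer = llm(
            f"Evaluate and correct the answer using this diagnosis.\n"
            f"Query: {query}\nAnswer: {answer}\nDiagnosis: {diagnosis}"
        )
    return answer

def meta_controller(llm: LLM, query: str) -> str:
    """Dual-process routing: cheap direct answer for easy queries,
    full regulatory cycle when the query is judged hard."""
    verdict = llm(f"Rate this query's difficulty as EASY or HARD:\n{query}")
    if "HARD" in verdict.upper():
        return regulatory_cycle(llm, query)  # System-2 style: full cycle
    return llm(f"Answer directly:\n{query}")  # System-1 style: single pass

The adaptive-effort idea is that the controller pays for the multi-pass regulatory cycle only on queries it flags as hard; how the paper actually scores difficulty is not specified in the abstract.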
