
PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning

Jingcheng Hu
Yinmin Zhang
Shijie Shang
Xiaobo Yang
Yue Peng
Zhewei Huang
Hebin Zhou
Xin Wu
Jie Cheng
Fanqi Wan
Xiangwen Kong
Chengyuan Yao
Kaiwen Yan
Ailin Huang
Hongyu Zhou
Qi Han
Zheng Ge
Daxin Jiang
Xiangyu Zhang
Heung-Yeung Shum
Main: 13 pages · Appendix: 4 pages · Bibliography: 5 pages · 5 figures · 8 tables
Abstract

We introduce Parallel Coordinated Reasoning (PaCoRe), a training-and-inference framework designed to overcome a central limitation of contemporary language models: their inability to scale test-time compute (TTC) far beyond sequential reasoning under a fixed context window. PaCoRe departs from the traditional sequential paradigm by driving TTC through massive parallel exploration coordinated via a message-passing architecture across multiple rounds. Each round launches many parallel reasoning trajectories, compacts their findings into context-bounded messages, and synthesizes these messages to guide the next round and ultimately produce the final answer. Trained end-to-end with large-scale, outcome-based reinforcement learning, the model masters the synthesis abilities required by PaCoRe and scales to multi-million-token effective TTC without exceeding context limits. The approach yields strong improvements across diverse domains and notably pushes reasoning beyond frontier systems in mathematics: an 8B model reaches 94.5% on HMMT 2025, surpassing GPT-5's 93.2%, by scaling effective TTC to roughly two million tokens. We open-source model checkpoints, training data, and the full inference pipeline to accelerate follow-up work.
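To make the round structure described above concrete, the sketch below outlines one plausible shape of the inference loop: parallel trajectories per round, compaction into bounded messages, and a final synthesis step. It is a minimal illustration based only on the abstract, not the released pipeline; the prompts, the `model` callable, and the `num_rounds`/`num_parallel` parameters are all assumptions.

```python
from typing import Callable, List

def pacore_inference(
    model: Callable[[str], str],  # hypothetical: maps a prompt to a completion
    question: str,
    num_rounds: int = 3,          # number of coordination rounds (assumed)
    num_parallel: int = 8,        # parallel trajectories per round (assumed)
) -> str:
    """Illustrative loop: each round spawns parallel reasoning trajectories,
    compacts their findings into context-bounded messages, and synthesizes
    those messages to seed the next round or produce the final answer."""
    messages: List[str] = []  # compact summaries carried across rounds

    for _ in range(num_rounds):
        # 1. Launch many parallel trajectories conditioned on the question
        #    and the messages synthesized so far (prompt wording assumed).
        context = "\n".join(messages)
        trajectories = [
            model(
                f"Question: {question}\n"
                f"Prior findings:\n{context}\n"
                "Reason step by step."
            )
            for _ in range(num_parallel)
        ]

        # 2. Compact each trajectory into a short, context-bounded message
        #    so the aggregate never exceeds the context window.
        messages = [
            model(f"Summarize the key findings in at most 200 tokens:\n{t}")
            for t in trajectories
        ]

    # 3. Final synthesis: combine the last round's messages into an answer.
    return model(
        f"Question: {question}\n"
        "Findings from parallel reasoning:\n" + "\n".join(messages) + "\n"
        "Give the final answer."
    )
```

Because each trajectory only contributes a bounded message to later rounds, the effective TTC (total tokens generated across all trajectories and rounds) can grow far beyond the context window of any single sequence, which is the scaling property the paper targets.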
