JT-DA: Enhancing Data Analysis with Tool-Integrated Table Reasoning Large Language Models

7 December 2025

Ce Chi, Xing Wang, Zhendong Wang, Xiaofan Liu, Ce Li, Zhiyan Song, Chen Zhao, Kexin Yang, Boshen Shi, Jingjing Yang, Chao Deng, Junlan Feng

Main: 23 pages; Bibliography: 4 pages; Appendix: 1 page; 9 figures; 17 tables

Abstract

In this work, we present JT-DA-8B (JiuTian Data Analyst 8B), a specialized large language model designed for complex table reasoning tasks across diverse real-world scenarios. To address the lack of high-quality supervision in tabular reasoning scenarios, we construct a comprehensive and diverse training corpus covering 34 well-defined table reasoning tasks by aggregating 29 public table QA datasets and 3 million tables. We propose an automatic pipeline that generates realistic multi-step analytical tasks involving reasoning patterns. The model is built on the open-source JT-Coder-8B model, an 8B-parameter decoder-only foundation model trained from scratch. In the training stage, we leverage LLM-based scoring and workflow-aligned filtering to distill high-quality, table-centric data. Both supervised fine-tuning (SFT) and reinforcement learning (RL) are adopted to optimize our model. We then propose a four-stage table reasoning workflow, comprising table preprocessing, table sensing, tool-integrated reasoning, and prompt engineering, to improve model interpretability and execution accuracy. Experimental results show that JT-DA-8B achieves strong performance on various table reasoning tasks, demonstrating the effectiveness of data-centric generation and workflow-driven optimization.
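To make the four-stage workflow more concrete, the following is a minimal Python sketch of how such stages could fit together. It is not the authors' implementation: the abstract names the stages but does not say how they are wired, so the function names (`preprocess_table`, `sense_table`, `build_prompt`, `run_tool`, `answer_question`), the order in which the prompt is built and the tool is run, and the stubbed `fake_model` are all assumptions made for illustration.

```python
import csv
import io


def preprocess_table(raw_csv):
    """Stage 1 (table preprocessing): parse CSV text and trim whitespace."""
    reader = csv.DictReader(io.StringIO(raw_csv.strip()))
    return [
        {k.strip(): (v or "").strip() for k, v in row.items() if k is not None}
        for row in reader
    ]


def sense_table(rows):
    """Stage 2 (table sensing): summarize the row count and column types."""
    def infer(values):
        try:
            for value in values:
                float(value)
            return "number"
        except ValueError:
            return "text"

    names = list(rows[0]) if rows else []
    columns = {name: infer([row[name] for row in rows]) for name in names}
    return {"n_rows": len(rows), "columns": columns}


def build_prompt(question, profile):
    """Stage 4 (prompt engineering): combine the question with the table profile."""
    schema = ", ".join(f"{name} ({kind})" for name, kind in profile["columns"].items())
    return (
        f"Table with {profile['n_rows']} rows and columns: {schema}.\n"
        f"Question: {question}\n"
        "Write Python that reads `rows` and stores the answer in `result`."
    )


def run_tool(code, rows):
    """Stage 3 (tool-integrated reasoning): execute model-written code on the table.

    Assumption: a real system would run this in a sandbox, not with bare exec.
    """
    namespace = {"rows": rows}
    exec(code, namespace)
    return namespace["result"]


def fake_model(prompt):
    """Hypothetical stand-in for the language model; returns fixed code."""
    return "result = sum(float(r['sales']) for r in rows)"


def answer_question(raw_csv, question, model=fake_model):
    """Wire the stages together (the wiring itself is an assumption)."""
    rows = preprocess_table(raw_csv)
    profile = sense_table(rows)
    prompt = build_prompt(question, profile)
    return run_tool(model(prompt), rows)


if __name__ == "__main__":
    table = "region, sales\nnorth, 10\nsouth, 32\n"
    print(answer_question(table, "What are the total sales?"))
```

In this sketch the model never sees the full table, only the compact profile produced by the sensing step, and the numeric answer comes from executed code rather than from generated text. That separation is one plausible reading of how tool integration could improve execution accuracy.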