Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models?

8 March 2025
Kun Xiang
Zhili Liu
Zihao Jiang
Yunshuang Nie
Kaixin Cai
Yiyang Yin
Runhui Huang
Haoxiang Fan
Hanhui Li
Weiran Huang
Yihan Zeng
Yu-Jie Yuan
Jianhua Han
Lanqing Hong
Hang Xu
Xiaodan Liang
Abstract

In this paper, we address the challenging task of multimodal mathematical reasoning by incorporating the ability of "slow thinking" into multimodal large language models (MLLMs). Our core idea is that different levels of reasoning ability can be combined dynamically to tackle questions of varying complexity. To this end, we propose the paradigm of Self-structured Chain of Thought (SCoT), which is composed of minimal semantic atomic steps. Unlike existing methods that rely on structured templates or free-form paradigms, our method can not only generate cognitive CoT structures for various complex tasks but also mitigate the phenomenon of overthinking. To introduce structured reasoning capabilities into visual understanding models, we further design a novel AtomThink framework with four key modules: (i) a data engine to generate high-quality multimodal reasoning paths; (ii) a supervised fine-tuning process with serialized inference data; (iii) a policy-guided multi-turn inference method; and (iv) an atomic capability metric to evaluate the single-step utilization rate. We conduct extensive experiments to show that the proposed AtomThink significantly improves the performance of baseline MLLMs, achieving more than 10% average accuracy gains on MathVista and MathVerse. Compared to state-of-the-art structured CoT approaches, our method not only achieves higher accuracy but also improves data utilization by 5 times and boosts inference efficiency by 85.3%. Our code is now publicly available at this https URL.
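The abstract's third module, policy-guided multi-turn inference, can be pictured as a loop that proposes one minimal atomic step per turn, scores candidates with a policy model, and stops when a final answer is emitted. The sketch below is only an illustration of that loop, not the authors' released implementation: `generate_step` and `score_step` are hypothetical stubs standing in for the MLLM sampler and the policy/reward model.

```python
# Hypothetical sketch of policy-guided multi-turn inference over atomic steps.
# `generate_step` and `score_step` are stand-in stubs, NOT the paper's code:
# a real system would sample candidate steps from an MLLM and score them with
# a learned policy model.
from dataclasses import dataclass, field


@dataclass
class ReasoningState:
    question: str
    steps: list = field(default_factory=list)


def generate_step(state: ReasoningState, candidate_id: int) -> str:
    # Stub sampler: emit two intermediate atomic steps, then a final answer.
    n = len(state.steps)
    if n < 2:
        return f"step-{n}-cand-{candidate_id}"
    return f"FINAL: answer-{candidate_id}"


def score_step(step: str) -> int:
    # Stub policy model: deterministically prefers lower candidate ids.
    return -int(step.rsplit("-", 1)[1])


def atomic_inference(question: str, num_candidates: int = 3, max_turns: int = 8) -> list:
    """Greedily extend the reasoning chain one atomic step per turn,
    keeping the best-scored candidate, until a final answer appears."""
    state = ReasoningState(question)
    for _ in range(max_turns):
        candidates = [generate_step(state, i) for i in range(num_candidates)]
        best = max(candidates, key=score_step)
        state.steps.append(best)
        if best.startswith("FINAL:"):
            break
    return state.steps


print(atomic_inference("example question"))
```

Generating one atomic step at a time is what lets the chain's structure adapt to the question: easy questions terminate after few turns, which is how a design like this could mitigate the overthinking the abstract mentions.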

@article{xiang2025_2503.06252,
  title={Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models?},
  author={Kun Xiang and Zhili Liu and Zihao Jiang and Yunshuang Nie and Kaixin Cai and Yiyang Yin and Runhui Huang and Haoxiang Fan and Hanhui Li and Weiran Huang and Yihan Zeng and Yu-Jie Yuan and Jianhua Han and Lanqing Hong and Hang Xu and Xiaodan Liang},
  journal={arXiv preprint arXiv:2503.06252},
  year={2025}
}