
Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation

25 October 2025
Ling Team
Ang Li
B. Liu
Binbin Hu
Bing Li
B. Zeng
Borui Ye
Caizhi Tang
Changxin Tian
Chao Huang
Chao Zhang
Chen Qian
Chenchen Ju
Chenchen Li
Chengfu Tang
Chili Fu
Chunshao Ren
Chunwei Wu
C. Zhang
Cunyin Peng
D. Xu
D. Wang
Dalong Zhang
Dingnan Jin
D. Zhu
D. Hu
F. Zhao
Feifan Wu
Feng Zhu
G. Wang
Haitao Zhang
Hailin Zhao
Hanxiao Zhang
H. Wang
Hao Qian
Haoyi Yu
Heng Zhang
H. Zhang
Hongzhi Luan
Huirong Dong
Huizhong Li
Jia-Nan Li
Jia Liu
J. Zhu
Jian Sha
Jianping Wei
Jiaolong Yang
Jieyue Ma
J. Wu
J. Huang
Jingyun Tian
J. Zhang
J. Sun
Juanhui Tu
Jun Liu
Jun Xu
Jun Zhou
Junjie Ou
Junpeng Fang
Kaihong Zhang
Kaiqin Hu
Ke Shi
Kun Tang
Kunlong Chen
Lanyin Mei
Lei Liang
Lei Xu
L. Zhang
Lin Ju
Lin Yuan
Ling Zhong
Lintao Ma
Lu Liu
Lu Yu
L. Cai
Meiqi Zhu
Mengying Li
M. Ben-Chen
Minghao Xue
Minghong Cai
Mingming Yin
Peijie Jiang
P. Zhao
Pingping Liu
Qian Zhao
Qing Cui
Qingxiang Huang
Q. Yang
Quankun Yu
Shaowei Wei
Shijie Lian
S. Zheng
Shun Song
Shungen Zhang
Shuo Zhang
Siyuan Li
Song Liu
Ting Guo
Tong Zhao
Wanli Gu
Topics: MoE · ReLM · ALM · LRM · AI4CE · ELM
arXiv: 2510.22115 (abs) · PDF · HTML · GitHub (175★)
Main: 45 pages · Bibliography: 8 pages · Appendix: 5 pages · 26 figures · 10 tables
Abstract

We introduce Ling 2.0, a series of reasoning-oriented language foundation models built upon the principle that every activation boosts reasoning capability. Designed to scale from tens of billions to one trillion parameters under a unified Mixture-of-Experts (MoE) paradigm, Ling 2.0 emphasizes high sparsity, cross-scale consistency, and efficiency guided by empirical scaling laws. The series includes three non-thinking (instruct) models - Ling-mini-2.0, Ling-flash-2.0, and Ling-1T - ranging from 16B to 1T total parameters and achieving up to 7-fold active-compute efficiency compared with dense counterparts. Ling 2.0 integrates coordinated innovations across model architecture, pre-training, post-training, and infrastructure: a high-sparsity MoE with MTP for efficient reasoning, reasoning-oriented data and mid-training CoT activation, reinforcement-based fine-tuning (DFT, Evo-CoT), and full-scale FP8 training with fine-grained heterogeneous pipelines. At the trillion scale, Ling-1T establishes a new Pareto frontier of reasoning accuracy versus computational efficiency, demonstrating that sparse activation, when properly aligned with reasoning objectives, enables scalable and efficient intelligence. Collectively, Ling 2.0 provides a coherent, open, and efficient foundation for advancing future reasoning and thinking models, including the Ring series built upon the same base.
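The active-compute efficiency claimed above comes from sparse expert activation: each token is routed to only a few experts, so the compute per token is a small fraction of the total parameter count. The sketch below is a minimal, illustrative top-k routed MoE layer in PyTorch, not Ling 2.0's actual architecture; the expert count, top-k value, and feed-forward sizes are placeholder assumptions for demonstration only.

```python
# Illustrative sketch of top-k sparse MoE routing (not Ling 2.0's implementation).
# num_experts, top_k, and layer sizes are placeholders, not the paper's configuration.
import torch
import torch.nn as nn


class SparseMoELayer(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int = 64, top_k: int = 4):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). The router scores every expert, but only top_k run per token.
        gate_probs = self.router(x).softmax(dim=-1)             # (tokens, num_experts)
        weights, expert_idx = gate_probs.topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)   # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = expert_idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out


# Per token, roughly top_k / num_experts of the expert parameters are active,
# which is the source of the "active-compute efficiency" contrast with dense models.
if __name__ == "__main__":
    layer = SparseMoELayer(d_model=512, d_ff=2048, num_experts=64, top_k=4)
    tokens = torch.randn(8, 512)
    print(layer(tokens).shape)  # torch.Size([8, 512])
```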
