dots.llm1 Technical Report

6 June 2025
Bi Huo, Bin Tu, Cheng Qin, Da Zheng, Debing Zhang, Dongjie Zhang, En Li, Fu Guo, Jian Yao, Jie Lou, Junfeng Tian, Li Hu, Ran Zhu, Shengdong Chen, Shuo Liu, Su Guang, Te Wo, Weijun Zhang, Xiaoming Shi, Xinxin Peng, Xing Wu, Yawen Liu, Yuqiu Ji, Ze Wen, Zhenhai Liu, Zichao Li, Zilong Liao
MoE
Main: 11 pages, 5 figures, 6 tables; bibliography: 7 pages; appendix: 3 pages
Abstract

Mixture of Experts (MoE) models have emerged as a promising paradigm for scaling language models efficiently by activating only a subset of parameters for each input token. In this report, we present dots.llm1, a large-scale MoE model that activates 14B parameters out of a total of 142B parameters, delivering performance on par with state-of-the-art models while reducing training and inference costs. Leveraging our meticulously crafted and efficient data processing pipeline, dots.llm1 achieves performance comparable to Qwen2.5-72B after pretraining on 11.2T high-quality tokens and post-training to fully unlock its capabilities. Notably, no synthetic data is used during pretraining. To foster further research, we open-source intermediate training checkpoints at every one trillion tokens, providing valuable insights into the learning dynamics of large language models.
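
To make the routing idea in the abstract concrete, here is a minimal sketch of top-k expert routing, the mechanism that lets an MoE layer activate only a few expert feed-forward blocks per token. It is a generic PyTorch illustration, not the dots.llm1 implementation; the hidden size, expert count, and top_k value below are placeholder assumptions.

# Minimal top-k MoE routing sketch (illustrative; not the dots.llm1 architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=1024, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                          # x: (n_tokens, d_model)
        logits = self.router(x)                    # (n_tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # renormalize over the selected experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e           # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = TopKMoE()
tokens = torch.randn(4, 512)
print(layer(tokens).shape)                         # torch.Size([4, 512])

With top_k = 2 of 8 experts, each token touches only a fraction of the expert parameters; dots.llm1 applies the same principle at scale, activating 14B of its 142B total parameters (roughly 10%) per token.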

@article{huo2025_2506.05767,
  title={dots.llm1 Technical Report},
  author={Bi Huo and Bin Tu and Cheng Qin and Da Zheng and Debing Zhang and Dongjie Zhang and En Li and Fu Guo and Jian Yao and Jie Lou and Junfeng Tian and Li Hu and Ran Zhu and Shengdong Chen and Shuo Liu and Su Guang and Te Wo and Weijun Zhang and Xiaoming Shi and Xinxin Peng and Xing Wu and Yawen Liu and Yuqiu Ji and Ze Wen and Zhenhai Liu and Zichao Li and Zilong Liao},
  journal={arXiv preprint arXiv:2506.05767},
  year={2025}
}