
LongCat-Flash Technical Report

Shuo Wang
Suogui Dang
Tao Fang
Tao Li
Tefeng Chen
Tianhao Bai
Tianhao Zhou
Tingwen Xie
Wei He
Wei Huang
Wei Liu
Wei Shi
Wei Wang
Wei Wu
Weikang Zhao
Wen Zan
Wenjie Shi
Xi Nan
Xi Su
Xiang Li
Xiang Mei
Xiangyang Ji
Xiangyu Xi
Xiangzhou Huang
Xianpeng Li
Xiao Fu
Xiao Liu
Xiao Wei
Xiaodong Cai
Xiaolong Chen
Xiaoqing Liu
Xiaotong Li
Xiaowei Shi
Xiaoyu Li
Xili Wang
Xin Chen
Xing Hu
Xingyu Miao
Xinyan He
Xuemiao Zhang
Xueyuan Hao
Xuezhi Cao
Xunliang Cai
Xurui Yang
Yan Feng
Yang Bai
Yang Chen
Yang Yang
Yaqi Huo
Yerui Sun
Yifan Lu
Yifan Zhang
Yipeng Zang
Yitao Zhai
Yiyang Li
Yongjing Yin
Yongkang Lv
Yongwei Zhou
Yu Yang
Yuchen Xie
Yueqing Sun
Yuewen Zheng
Yuhua Wei
Yulei Qian
Yunfan Liang
Yunfang Tai
Yunke Zhao
Zeyang Yu
Zhao Zhang
Zhaohua Yang
Zhenchao Zhang
Zhikang Xia
Zhiye Zou
Zhizhao Zeng
Zhongda Su
Zhuofan Chen
Zijian Zhang
Ziwen Wang
Zixu Jiang
Zizhe Zhao
Zongyu Wang
Zunhai Su
et al. (82 additional authors not shown)
Main: 28 pages; Bibliography: 6 pages; Appendix: 2 pages; 14 figures, 9 tables
Abstract

We introduce LongCat-Flash, a 560-billion-parameter Mixture-of-Experts (MoE) language model designed for both computational efficiency and advanced agentic capabilities. Stemming from the need for scalable efficiency, LongCat-Flash adopts two novel designs: (a) Zero-computation Experts, which enable dynamic computational budget allocation and activate 18.6B-31.3B parameters (27B on average) per token depending on contextual demands, optimizing resource usage; (b) Shortcut-connected MoE, which enlarges the computation-communication overlap window, yielding notable gains in inference efficiency and throughput compared to models of a comparable scale. We develop a comprehensive scaling framework for large models that combines hyperparameter transfer, model-growth initialization, a multi-pronged stability suite, and deterministic computation to achieve stable and reproducible training. Notably, leveraging the synergy between scalable architectural design and infrastructure efforts, we complete model training on more than 20 trillion tokens within 30 days, while achieving over 100 tokens per second (TPS) for inference at a cost of $0.70 per million output tokens. To cultivate LongCat-Flash towards agentic intelligence, we conduct large-scale pre-training on optimized data mixtures, followed by targeted mid- and post-training on reasoning, code, and instructions, with further augmentation from synthetic data and tool-use tasks. Comprehensive evaluations demonstrate that, as a non-thinking foundation model, LongCat-Flash delivers highly competitive performance relative to other leading models, with exceptional strengths in agentic tasks. The model checkpoint of LongCat-Flash is open-sourced to foster community research.
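As a rough illustration of the zero-computation expert idea described above, the following is a minimal sketch (not the authors' implementation; all module names, sizes, and the top-k routing scheme are assumptions) in which the router's expert pool is padded with identity "zero" experts. Tokens routed to a zero expert skip the FFN entirely, so the parameters activated per token vary with routing, which is what allows the average activation to stay well below the worst case.

```python
# Illustrative sketch only: a top-k MoE layer whose expert pool includes
# "zero" (identity) experts. Tokens sent to a zero expert incur no FFN FLOPs.
# Hyperparameters and structure are hypothetical, not from the report.
import torch
import torch.nn as nn


class ZeroComputationMoE(nn.Module):
    def __init__(self, d_model=1024, n_ffn_experts=8, n_zero_experts=2, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.n_ffn_experts = n_ffn_experts
        self.n_total = n_ffn_experts + n_zero_experts
        # The router scores real FFN experts and zero experts alike.
        self.router = nn.Linear(d_model, self.n_total)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_ffn_experts)
        )

    def forward(self, x):  # x: (num_tokens, d_model)
        gate = torch.softmax(self.router(x), dim=-1)
        weights, idx = gate.topk(self.top_k, dim=-1)           # per-token expert choices
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize selected gates
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in range(self.n_total):
                mask = idx[:, slot] == e
                if not mask.any():
                    continue
                if e < self.n_ffn_experts:
                    y = self.experts[e](x[mask])   # real expert: run the FFN
                else:
                    y = x[mask]                    # zero expert: identity, no compute
                out[mask] += weights[mask, slot].unsqueeze(-1) * y
        return out
```

Under this reading, the router learns to spend FFN compute on "hard" tokens and route "easy" tokens to identity experts, which is one plausible way to realize the dynamic per-token computational budget the abstract describes.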
