360-LLaMA-Factory: Plug & Play Sequence Parallelism for Long Post-Training

Abstract
Adding sequence parallelism to LLaMA-Factory, we open-sourced 360-LLaMA-Factory at this https URL. 360-LLaMA-Factory has received wide recognition and has been used in models such as Light-R1 (arXiv:2503.10460), TinyR1 (arXiv:2503.04872), and Kaggle AIMO math models, as well as in large companies' training frameworks. This technical report delves deeper into the different sequence parallel modes behind 360-LLaMA-Factory and discusses our implementation insights.
@article{zou2025_2505.22296,
  title={360-LLaMA-Factory: Plug & Play Sequence Parallelism for Long Post-Training},
  author={Haosheng Zou and Xiaowei Lv and Shousheng Jia and Xiangzheng Zhang},
  journal={arXiv preprint arXiv:2505.22296},
  year={2025}
}