ResearchTrend.AI

arXiv:2401.02954
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

5 January 2024
DeepSeek-AI: Xiao Bi, Deli Chen, Guanting Chen, Shanhuang Chen, Damai Dai, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Zhe Fu, Huazuo Gao, Kaige Gao, W. Gao, Ruiqi Ge, Kang Guan, Daya Guo, Jianzhong Guo, Guangbo Hao, Zhewen Hao, Ying He, Wen-Hui Hu, Panpan Huang, Erhang Li, Guowei Li, Jiashi Li, Yao Li, Y. K. Li, W. Liang, Fangyun Lin, A. Liu, Bo Liu, Wen Liu, Xiaodong Liu, Xin Liu, Yiyuan Liu, Haoyu Lu, Shanghao Lu, Fuli Luo, Shirong Ma, Xiaotao Nie, Tian Pei, Yishi Piao, Junjie Qiu, Hui Qu, Tongzheng Ren, Zehui Ren, Chong Ruan, Zhangli Sha, Zhihong Shao, Jun-Mei Song, Xuecheng Su, Jingxiang Sun, Yaofeng Sun, Min Tang, Bing-Li Wang, Peiyi Wang, Shiyu Wang, Yaohui Wang, Yongji Wang, Tong Wu, Yu-Huan Wu, Xin Xie, Zhenda Xie, Ziwei Xie, Yi Xiong, Hanwei Xu, R. X. Xu, Yanhong Xu, Dejian Yang, Yu-mei You, Shuiping Yu, Xin-yuan Yu, Bo Zhang, Haowei Zhang, Lecong Zhang, Liyue Zhang, Mingchuan Zhang, Minghu Zhang, Wentao Zhang, Yichao Zhang, Chenggang Zhao, Yao Zhao, Shangyan Zhou, Shunfeng Zhou, Qihao Zhu, Yuheng Zou
Communities: LRM, ALM
Abstract

The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling laws described in previous literature present varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate the scaling of large-scale models in two commonly used open-source configurations, 7B and 67B. Guided by the scaling laws, we introduce DeepSeek LLM, a project dedicated to advancing open-source language models with a long-term perspective. To support the pre-training phase, we have developed a dataset that currently consists of 2 trillion tokens and is continuously expanding. We further conduct supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) on the DeepSeek LLM Base models, resulting in the DeepSeek Chat models. Our evaluation results demonstrate that DeepSeek LLM 67B surpasses LLaMA-2 70B on various benchmarks, particularly in the domains of code, mathematics, and reasoning. Furthermore, open-ended evaluations reveal that DeepSeek LLM 67B Chat exhibits superior performance compared to GPT-3.5.
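The abstract's central technical claim concerns scaling laws that prescribe how to split a training compute budget between model scale and data scale. As a rough illustration of what such a law looks like (a generic Chinchilla-style power-law allocation, not the paper's exact parameterization or fitted values), one can write:

    C \approx M \cdot D, \qquad
    M_{\mathrm{opt}}(C) = M_{\mathrm{base}} \, C^{a}, \qquad
    D_{\mathrm{opt}}(C) = D_{\mathrm{base}} \, C^{b}

where C is the compute budget, M and D are the model and data scales, and the exponents a, b and base terms are fit empirically from smaller-scale training runs; consult the paper itself for the concrete quantities it uses and fits.

The abstract also mentions alignment via Direct Preference Optimization (DPO). The sketch below shows the standard DPO objective (Rafailov et al., 2023) in PyTorch as a point of reference; it is an illustrative sketch of the general technique, not the authors' training code, and the per-sequence log-probability inputs are assumed to be computed elsewhere.

    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps: torch.Tensor,
                 policy_rejected_logps: torch.Tensor,
                 ref_chosen_logps: torch.Tensor,
                 ref_rejected_logps: torch.Tensor,
                 beta: float = 0.1) -> torch.Tensor:
        """Standard DPO loss over summed per-sequence log-probabilities.

        All four inputs are 1-D tensors of shape (batch,); the reference
        model is frozen. beta controls how far the policy may drift from
        the reference policy.
        """
        # Implicit reward of each response: log-ratio of policy vs. reference.
        chosen_logratio = policy_chosen_logps - ref_chosen_logps
        rejected_logratio = policy_rejected_logps - ref_rejected_logps
        # Maximize the margin between chosen and rejected implicit rewards.
        margin = beta * (chosen_logratio - rejected_logratio)
        return -F.logsigmoid(margin).mean()

    # Hypothetical usage with dummy log-probabilities for a batch of 4 pairs.
    loss = dpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4))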

View on arXiv: https://arxiv.org/abs/2401.02954