ResearchTrend.AI
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

5 January 2024
DeepSeek-AI:
Xiao Bi
Deli Chen
Guanting Chen
Shanhuang Chen
Damai Dai
Chengqi Deng
Honghui Ding
Kai Dong
Qiushi Du
Zhe Fu
Huazuo Gao
Kaige Gao
W. Gao
Ruiqi Ge
Kang Guan
Daya Guo
Jianzhong Guo
Guangbo Hao
Zhewen Hao
Ying He
Wen-Hui Hu
Panpan Huang
Erhang Li
Guowei Li
Jiashi Li
Yao Li
Yiming Li
W. Liang
Fangyun Lin
A. Liu
Bo Liu
Wen Liu
Xiaodong Liu
Xin Liu
Yiyuan Liu
Haoyu Lu
Shanghao Lu
Fuli Luo
Shirong Ma
Xiaotao Nie
Tian Pei
Yishi Piao
Junjie Qiu
Hui Qu
Tongzheng Ren
Zehui Ren
Chong Ruan
Zhangli Sha
Zhihong Shao
Jun-Mei Song
Xuecheng Su
Jingxiang Sun
Yaofeng Sun
Min Tang
Bing-Li Wang
Peiyi Wang
Shiyu Wang
Yaohui Wang
Yongji Wang
Tong Wu
Yu-Huan Wu
Xin Xie
Zhenda Xie
Ziwei Xie
Yi Xiong
Hanwei Xu
R. X. Xu
Yanhong Xu
Dejian Yang
Yu-mei You
Shuiping Yu
Xin-yuan Yu
Bo Zhang
Haowei Zhang
Lecong Zhang
Liyue Zhang
Mingchuan Zhang
Minghu Zhang
Wentao Zhang
Yichao Zhang
Chenggang Zhao
Yao Zhao
Shangyan Zhou
Shunfeng Zhou
Qihao Zhu
Yuheng Zou
    LRM
    ALM

Papers citing "DeepSeek LLM: Scaling Open-Source Language Models with Longtermism"

50 / 89 papers shown
Power Lines: Scaling Laws for Weight Decay and Batch Size in LLM Pre-training
Shane Bergsma
Nolan Dey
Gurpreet Gosal
Gavia Gray
Daria Soboleva
Joel Hestness
27
0
0
19 May 2025
Shadow-FT: Tuning Instruct via Base
Taiqiang Wu
Runming Yang
Jiayi Li
Pengfei Hu
Ngai Wong
Yujiu Yang
24
0
0
19 May 2025
Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models
Zemin Huang
Zhiyang Chen
Zijun Wang
Tiancheng Li
Guo-Jun Qi
DiffM
LRM
AI4CE
30
0
0
15 May 2025
Parallel Scaling Law for Language Models
Mouxiang Chen
Binyuan Hui
Zeyu Cui
Jiaxi Yang
Dayiheng Liu
Jianling Sun
Junyang Lin
Zhongxin Liu
MoE
LRM
42
0
0
15 May 2025
Improved Algorithms for Differentially Private Language Model Alignment
Keyu Chen
Hao Tang
Qinglin Liu
Yizhao Xu
36
0
0
13 May 2025
BadLingual: A Novel Lingual-Backdoor Attack against Large Language Models
Zhendong Wang
Hongwei Li
Rui Zhang
Wenbo Jiang
Kangjie Chen
Tianwei Zhang
Qingchuan Zhao
Guowen Xu
AAML
46
0
0
06 May 2025
R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation
Meng-Hao Guo
Jiajun Xu
Yi Zhang
Jiaxi Song
Haoyang Peng
...
Yongming Rao
Houwen Peng
Han Hu
Gordon Wetzstein
Shi-Min Hu
ELM
LRM
62
2
0
04 May 2025
Don't be lazy: CompleteP enables compute-efficient deep transformers
Nolan Dey
Bin Claire Zhang
Lorenzo Noci
Mufan Li
Blake Bordelon
Shane Bergsma
Cengiz Pehlevan
Boris Hanin
Joel Hestness
44
1
0
02 May 2025
Beyond Attention: Toward Machines with Intrinsic Higher Mental States
Ahsan Adeel
OffRL
LRM
39
0
0
02 May 2025
LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection
Xinyue Zeng
Haohui Wang
Junhong Lin
Jun Wu
Tyler Cody
Dawei Zhou
202
0
0
01 May 2025
CoordField: Coordination Field for Agentic UAV Task Allocation In Low-altitude Urban Scenarios
Tengchao Zhang
Yonglin Tian
Fei Lin
Jun Huang
Patrik P. Süli
Rui Qin
Fei-Yue Wang
73
0
0
30 Apr 2025
X-Fusion: Introducing New Modality to Frozen Large Language Models
Sicheng Mo
Thao Nguyen
Xun Huang
Siddharth Srinivasan Iyer
Yijun Li
...
Eli Shechtman
Krishna Kumar Singh
Yong Jae Lee
Bolei Zhou
Yuheng Li
79
0
0
29 Apr 2025
BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text
Jiageng Wu
Bowen Gu
Ren Zhou
Kevin Xie
Doug Snyder
...
Shane Segal
Jonathan H. Chen
Santiago Romero-Brufau
K. J. Lin
Jie Yang
LM&MA
ELM
103
0
0
28 Apr 2025
PhenoAssistant: A Conversational Multi-Agent AI System for Automated Plant Phenotyping
Feng Chen
Ilias Stogiannidis
Andrew Wood
Danilo Bueno
Dominic Williams
...
Stephen A. Rolfe
Tracy Lawson
Tony Pridmore
M. Giuffrida
Sotirios A. Tsaftaris
62
0
0
28 Apr 2025
Generative AI in Education: Student Skills and Lecturer Roles
Stefanie Krause
Ashish Dalvi
Syed Khubaib Zaidi
264
0
0
28 Apr 2025
Hallucinations and Key Information Extraction in Medical Texts: A Comprehensive Assessment of Open-Source Large Language Models
Anindya Bijoy Das
Shibbir Ahmed
Shahnewaz Karim Sakib
HILM
LM&MA
57
0
0
27 Apr 2025
Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model
Junshu Pan
Wei Shen
Shulin Huang
Qiji Zhou
Yue Zhang
74
0
0
22 Apr 2025
Trillion 7B Technical Report
Sungjun Han
Juyoung Suk
Suyeong An
Hyungguk Kim
Kyuseok Kim
Wonsuk Yang
Seungtaek Choi
Jamin Shin
225
1
0
21 Apr 2025
Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark
Enxin Song
Wenhao Chai
Weili Xu
Jianwen Xie
Yuxuan Liu
Gaoang Wang
62
0
0
20 Apr 2025
An LLM Framework For Cryptography Over Chat Channels
D. Gligoroski
Mayank Raikwar
Sonu Kumar Jha
50
0
0
11 Apr 2025
How Well Can Vison-Language Models Understand Humans' Intention? An Open-ended Theory of Mind Question Evaluation Benchmark
Ximing Wen
Mallika Mainali
Anik Sen
47
0
0
28 Mar 2025
Sun-Shine: A Large Language Model for Tibetan Culture
Cheng Huang
Fan Gao
Nyima Tashi
Yutong Liu
Xiangxiang Wang
...
Gadeng Luosang
Rinchen Dongrub
Dorje Tashi
Xiao Feng
Yongbin Yu
ALM
106
2
0
24 Mar 2025
Collaborative Speculative Inference for Efficient LLM Inference Serving
Luyao Gao
Jianchun Liu
Hongli Xu
Xichong Zhang
Yunming Liao
Liusheng Huang
48
0
0
13 Mar 2025
I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data?
Yuhang Liu
Dong Gong
Erdun Gao
Zhen Zhang
Biwei Huang
Anton van den Hengel
Javen Qinfeng Shi
259
0
0
12 Mar 2025
VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
Zhangquan Chen
Xufang Luo
Dongsheng Li
OffRL
LRM
75
3
0
10 Mar 2025
Unity RL Playground: A Versatile Reinforcement Learning Framework for Mobile Robots
Linqi Ye
Rankun Li
Xiaowen Hu
Jiayi Li
Boyang Xing
Yan Peng
Bin Liang
69
0
0
07 Mar 2025
Predictive Data Selection: The Data That Predicts Is the Data That Teaches
Kashun Shum
Yuanmin Huang
Hongjian Zou
Qi Ding
Yixuan Liao
Xiao Chen
Qian Liu
Junxian He
67
2
0
02 Mar 2025
EdgeAIGuard: Agentic LLMs for Minor Protection in Digital Spaces
G. Mujtaba
Sunder Ali Khowaja
K. Dev
47
0
0
28 Feb 2025
Kanana: Compute-efficient Bilingual Language Models
Kanana LLM Team
Yunju Bak
Hojin Lee
Minho Ryu
Jiyeon Ham
...
Daniel Lee
Minchul Lee
MinHyung Lee
Shinbok Lee
Gaeun Seo
98
1
0
26 Feb 2025
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?
Yancheng He
Shilong Li
Jing Liu
Weixun Wang
Xingyuan Bu
...
Zhongyuan Peng
Zhenru Zhang
Zhicheng Zheng
Wenbo Su
Bo Zheng
ELM
LRM
86
9
0
26 Feb 2025
Large Language Diffusion Models
Shen Nie
Fengqi Zhu
Zebin You
Xiaolu Zhang
Jingyang Ou
Jun Hu
Jun Zhou
Yankai Lin
Zhicheng Dou
Chongxuan Li
114
22
0
14 Feb 2025
Steel-LLM:From Scratch to Open Source -- A Personal Journey in Building a Chinese-Centric LLM
Qingshui Gu
Shu Li
Tianyu Zheng
Zhaoxiang Zhang
307
0
0
10 Feb 2025
The Curse of Depth in Large Language Models
Wenfang Sun
Xinyuan Song
Pengxiang Li
Lu Yin
Yefeng Zheng
Shiwei Liu
80
5
0
09 Feb 2025
Position: AI Scaling: From Up to Down and Out
Yunke Wang
Yanxi Li
Chang Xu
HAI
99
2
0
02 Feb 2025
The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking
Yuchun Miao
Sen Zhang
Liang Ding
Yuqi Zhang
Lefei Zhang
Dacheng Tao
96
4
0
31 Jan 2025
Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
Samira Abnar
Harshay Shah
Dan Busbridge
Alaaeldin Mohamed Elnouby Ali
J. Susskind
Vimal Thilak
MoE
LRM
52
5
0
28 Jan 2025
Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation
Zibo Zhao
Zeqiang Lai
Qingxiang Lin
Yunfei Zhao
Haolin Liu
...
Jingwei Huang
Chunchao Guo
Jie Jiang
Jingwei Huang
Chunchao Guo
126
25
0
21 Jan 2025
Aligning Instruction Tuning with Pre-training
Yiming Liang
Tianyu Zheng
Xinrun Du
Ge Zhang
Jiaheng Liu
...
Zhaoxiang Zhang
Wenhao Huang
Jiajun Zhang
Xiang Yue
Jiajun Zhang
96
1
0
16 Jan 2025
Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit
Oleg Filatov
Jan Ebert
Jiangtao Wang
Stefan Kesselheim
49
4
0
10 Jan 2025
Towards Precise Scaling Laws for Video Diffusion Transformers
Yuanyang Yin
Yaqi Zhao
Mingwu Zheng
Ke Lin
Jiarong Ou
...
Pengfei Wan
Di Zhang
Baoqun Yin
Wentao Zhang
Kun Gai
130
3
0
03 Jan 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM
VLM
LRM
104
48
0
03 Jan 2025
Next Patch Prediction for Autoregressive Visual Generation
Yatian Pang
Peng Jin
Shuo Yang
Bin Lin
Bin Zhu
...
Liuhan Chen
Francis E. H. Tay
Ser-Nam Lim
Harry Yang
Li Yuan
142
9
0
19 Dec 2024
Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression
Junjie Wen
Minjie Zhu
Bo Li
Zhibin Tang
Jinming Li
...
Chengmeng Li
Xiaoyu Liu
Chaomin Shen
Yaxin Peng
Feifei Feng
101
16
0
04 Dec 2024
CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM
Jingwei Xu
Chenyu Wang
Zibo Zhao
Wen Liu
Yi Ma
Shenghua Gao
58
13
0
07 Nov 2024
SQL Injection Jailbreak: A Structural Disaster of Large Language Models
Jiawei Zhao
Kejiang Chen
Wentao Zhang
Nenghai Yu
AAML
45
0
0
03 Nov 2024
How Does Critical Batch Size Scale in Pre-training?
Hanlin Zhang
Depen Morwani
Nikhil Vyas
Jingfeng Wu
Difan Zou
Udaya Ghai
Dean Phillips Foster
Sham Kakade
88
11
0
29 Oct 2024
Parameter-Efficient Fine-Tuning in Large Models: A Survey of Methodologies
Liwen Wang
Sheng Chen
Linnan Jiang
Shu Pan
Runze Cai
Sen Yang
Fei Yang
52
3
0
24 Oct 2024
ChineseSafe: A Chinese Benchmark for Evaluating Safety in Large Language Models
Han Zhang
Hongfu Gao
Qiang Hu
Guanhua Chen
L. Yang
Bingyi Jing
Hongxin Wei
Bing Wang
Haifeng Bai
Lei Yang
AILaw
ELM
52
2
0
24 Oct 2024
Compute-Constrained Data Selection
Junjie Oscar Yin
Alexander M. Rush
42
0
0
21 Oct 2024
Coarse-to-Fine Highlighting: Reducing Knowledge Hallucination in Large Language Models
Qitan Lv
Jie Wang
Hanzhu Chen
Bin Li
Yongdong Zhang
Feng Wu
HILM
35
3
0
19 Oct 2024