ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2405.04434
  4. Cited By
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts
  Language Model

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

7 May 2024
DeepSeek-AI
Aixin Liu
Bei Feng
Bin Wang
Bingxuan Wang
Bo Liu
Chenggang Zhao
Chengqi Dengr
Chong Ruan
Damai Dai
Daya Guo
Dejian Yang
Deli Chen
Dongjie Ji
Erhang Li
Fangyun Lin
Fuli Luo
Guangbo Hao
Guanting Chen
Guowei Li
Hai-Tao Zhang
Hanwei Xu
Hao Yang
Haowei Zhang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Li
Hui Qu
Jianfeng Cai
Jian Liang
Jianzhong Guo
Jiaqi Ni
Jiashi Li
Jin Chen
Jingyang Yuan
Junjie Qiu
Junxiao Song
Kai Dong
Kaige Gao
Kang Guan
Lean Wang
Lecong Zhang
Lei Xu
Leyi Xia
Liang Zhao
Liyue Zhang
Meng Li
Miaojun Wang
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Mingming Li
Ning Tian
Panpan Huang
Peiyi Wang
Peng Zhang
Qihao Zhu
Qinyu Chen
Qiushi Du
Renqi Chen
Rong Jin
Ruiqi Ge
Ruizhe Pan
Runxin Xu
Ruyi Chen
S. S. Li
Shanghao Lu
Shangyan Zhou
Shanhuang Chen
Shaoqing Wu
Shengfeng Ye
Shirong Ma
Shiyu Wang
Shuang Zhou
Shuiping Yu
Shunfeng Zhou
Wenlei Bao
Tao Wang
Tian Pei
Tian Yuan
Tianyu Sun
W. L. Xiao
Wangding Zeng
Wei An
Wen Liu
Wenfeng Liang
Wenjun Gao
Wentao Zhang
X. Q. Li
Xiangyue Jin
Xianzu Wang
Xiao Bi
Xiaodong Liu
Xiaohan Wang
Xiaojin Shen
Xiaokang Chen
Xiaosha Chen
Xiaotao Nie
Xiaowen Sun
Xiaoxiang Wang
Xin Liu
Xin Xie
Xingkai Yu
Xinnan Song
Xinyi Zhou
Xinyu Yang
Xuan Lu
Xuecheng Su
Ying Wu
Y. K. Li
Y. X. Wei
Bo Li
Yanhong Xu
Yanping Huang
Yao Li
Yao-Min Zhao
Yaofeng Sun
Yaohui Li
Yaohui Wang
Yi Zheng
Yichao Zhang
Yiliang Xiong
Yilong Zhao
Ying He
Ying Tang
Yishi Piao
Yixin Dong
Yixuan Tan
Yiyuan Liu
Yongji Wang
Yongqiang Guo
Yuchen Zhu
Yuduan Wang
Yuheng Zou
Yukun Zha
Yunxian Ma
Yuting Yan
Yuxiang You
Yuxuan Liu
Z. Z. Ren
Zehui Ren
Zhangli Sha
Zhe Fu
Zhen Huang
Zhen Zhang
Zhenda Xie
Zhewen Hao
Zhihong Shao
Zhiniu Wen
Zhipeng Xu
Zhongyu Zhang
Zhuoshu Li
Zihan Wang
Zihui Gu
Zilin Li
Ziwei Xie
    MoE
ArXivPDFHTML

Papers citing "DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model"

50 / 108 papers shown
Title
MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems
MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems
Yinsicheng Jiang
Yao Fu
Yeqi Huang
Ping Nie
Zhan Lu
...
Dayou Du
Tairan Xu
Kai Zou
Edoardo Ponti
Luo Mai
MoE
17
0
0
16 May 2025
MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production
MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production
C. Jin
Ziheng Jiang
Zhihao Bai
Zheng Zhong
Jing Liu
...
Yanghua Peng
Xuanzhe Liu
Xuanzhe Liu
Xin Jin
Xin Liu
MoE
12
0
0
16 May 2025
Can You Really Trust Code Copilots? Evaluating Large Language Models from a Code Security Perspective
Can You Really Trust Code Copilots? Evaluating Large Language Models from a Code Security Perspective
Yutao Mou
Xiao Deng
Yuxiao Luo
Shikun Zhang
Wei Ye
ELM
21
0
0
15 May 2025
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Chenggang Zhao
Chengqi Deng
Chong Ruan
Damai Dai
Huazuo Gao
...
Wenfeng Liang
Ying He
Yishuo Wang
Yuxuan Liu
Y. X. Wei
MoE
41
0
0
14 May 2025
WavReward: Spoken Dialogue Models With Generalist Reward Evaluators
WavReward: Spoken Dialogue Models With Generalist Reward Evaluators
Shengpeng Ji
Tianle Liang
Yong Li
Jialong Zuo
Minghui Fang
...
Xize Cheng
Siqi Zheng
Jin Xu
Junyang Lin
Zhou Zhao
AuLLM
ALM
33
0
0
14 May 2025
Large Language Models for Computer-Aided Design: A Survey
Large Language Models for Computer-Aided Design: A Survey
Licheng Zhang
Bach Le
Naveed Akhtar
Siew-Kei Lam
Tuan Ngo
3DV
AI4CE
40
0
0
13 May 2025
MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design
MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design
Haojie Duanmu
Xiuhong Li
Zhihang Yuan
Size Zheng
Jiangfei Duan
Xingcheng Zhang
Dahua Lin
MQ
MoE
212
0
0
09 May 2025
FloE: On-the-Fly MoE Inference on Memory-constrained GPU
FloE: On-the-Fly MoE Inference on Memory-constrained GPU
Yuxin Zhou
Zheng Li
J. Zhang
Jue Wang
Yanjie Wang
Zhongle Xie
Ke Chen
Lidan Shou
MoE
57
0
0
09 May 2025
Faster MoE LLM Inference for Extremely Large Models
Faster MoE LLM Inference for Extremely Large Models
Haoqi Yang
Luohe Shi
Qiwei Li
Zuchao Li
Ping Wang
Bo Du
Mengjia Shen
Hai Zhao
MoE
68
0
0
06 May 2025
Beyond the model: Key differentiators in large language models and multi-agent services
Beyond the model: Key differentiators in large language models and multi-agent services
Muskaan Goyal
Pranav Bhasin
LLMAG
ELM
206
0
0
05 May 2025
R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation
R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation
Meng-Hao Guo
Jiajun Xu
Yi Zhang
Jiaxi Song
Haoyang Peng
...
Yongming Rao
Houwen Peng
Han Hu
Gordon Wetzstein
Shi-Min Hu
ELM
LRM
60
2
0
04 May 2025
MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance
MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance
Xing Hu
Zhixuan Chen
Dawei Yang
Zukang Xu
Chen Xu
Zhihang Yuan
Sifan Zhou
Jiangyong Yu
MoE
MQ
44
0
0
02 May 2025
Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics
Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics
Cong Xu
Wenbin Liang
Mo Yu
Anan Liu
Kaipeng Zhang
Lizhuang Ma
Yufei Guo
Jun Wang
Wenbo Zhang
MQ
57
0
0
01 May 2025
Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception
Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception
Yuanchen Wu
Lu Zhang
Hang Yao
Junlong Du
Ke Yan
Shouhong Ding
Yunsheng Wu
Xuzhao Li
MLLM
71
0
0
29 Apr 2025
X-Fusion: Introducing New Modality to Frozen Large Language Models
X-Fusion: Introducing New Modality to Frozen Large Language Models
Sicheng Mo
Thao Nguyen
Xun Huang
Siddharth Srinivasan Iyer
Yijun Li
...
Eli Shechtman
Krishna Kumar Singh
Yong Jae Lee
Bolei Zhou
Yuheng Li
77
0
0
29 Apr 2025
MoE Parallel Folding: Heterogeneous Parallelism Mappings for Efficient Large-Scale MoE Model Training with Megatron Core
MoE Parallel Folding: Heterogeneous Parallelism Mappings for Efficient Large-Scale MoE Model Training with Megatron Core
Dennis Liu
Zijie Yan
Xin Yao
Tong Liu
V. Korthikanti
...
Jiajie Yao
Chandler Zhou
David Wu
Xipeng Li
J. Yang
MoE
70
0
0
21 Apr 2025
Efficient Pretraining Length Scaling
Efficient Pretraining Length Scaling
Bohong Wu
Shen Yan
Sijun Zhang
Jianqiao Lu
Yutao Zeng
Ya Wang
Xun Zhou
174
0
0
21 Apr 2025
Dense Backpropagation Improves Training for Sparse Mixture-of-Experts
Dense Backpropagation Improves Training for Sparse Mixture-of-Experts
Ashwinee Panda
Vatsal Baherwani
Zain Sarwar
Benjamin Thérien
Supriyo Chakraborty
Tom Goldstein
MoE
42
0
0
16 Apr 2025
FuseRL: Dense Preference Optimization for Heterogeneous Model Fusion
FuseRL: Dense Preference Optimization for Heterogeneous Model Fusion
Longguang Zhong
Fanqi Wan
Ziyi Yang
Guosheng Liang
Tianyuan Shi
Xiaojun Quan
MoMe
57
0
0
09 Apr 2025
Sequential-NIAH: A Needle-In-A-Haystack Benchmark for Extracting Sequential Needles from Long Contexts
Sequential-NIAH: A Needle-In-A-Haystack Benchmark for Extracting Sequential Needles from Long Contexts
Yifei Yu
Qian Zhang
Lingfeng Qiao
Di Yin
Fang Li
Jie Wang
Z. Chen
Suncong Zheng
Xiaolong Liang
Xingchen Sun
44
0
0
07 Apr 2025
Inference-Time Scaling for Generalist Reward Modeling
Inference-Time Scaling for Generalist Reward Modeling
Zijun Liu
P. Wang
Ran Xu
Shirong Ma
Chong Ruan
Peng Li
Yang Liu
Y. Wu
OffRL
LRM
46
13
0
03 Apr 2025
F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization
F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization
Xiaohui Sun
Ruitong Xiao
Jianye Mo
Bowen Wu
Qun Yu
Baoxun Wang
51
1
0
03 Apr 2025
Cognitive Memory in Large Language Models
Cognitive Memory in Large Language Models
Lianlei Shan
Shixian Luo
Zezhou Zhu
Yu Yuan
Yong Wu
LLMAG
KELM
199
1
0
03 Apr 2025
Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design
Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design
Mohan Zhang
Pingzhi Li
Jie Peng
Mufan Qiu
Tianlong Chen
MoE
50
0
0
02 Apr 2025
Rethinking industrial artificial intelligence: a unified foundation framework
Rethinking industrial artificial intelligence: a unified foundation framework
Jay Lee
Hanqi Su
AI4CE
41
1
0
02 Apr 2025
WindowKV: Task-Adaptive Group-Wise KV Cache Window Selection for Efficient LLM Inference
WindowKV: Task-Adaptive Group-Wise KV Cache Window Selection for Efficient LLM Inference
Youhui Zuo
Sibo Wei
C. Zhang
Zhuorui Liu
Wenpeng Lu
Dawei Song
VLM
61
0
0
23 Mar 2025
TROVE: A Challenge for Fine-Grained Text Provenance via Source Sentence Tracing and Relationship Classification
TROVE: A Challenge for Fine-Grained Text Provenance via Source Sentence Tracing and Relationship Classification
Junnan Zhu
Min Xiao
Yining Wang
Feifei Zhai
Yu Zhou
Chengqing Zong
80
0
0
19 Mar 2025
Unifying Text Semantics and Graph Structures for Temporal Text-attributed Graphs with Large Language Models
Unifying Text Semantics and Graph Structures for Temporal Text-attributed Graphs with Large Language Models
Siwei Zhang
Yun Xiong
Yateng Tang
Xi Chen
Zian Jia
Zehao Gu
Jiarong Xu
Jiawei Zhang
AI4CE
75
1
0
18 Mar 2025
BREEN: Bridge Data-Efficient Encoder-Free Multimodal Learning with Learnable Queries
BREEN: Bridge Data-Efficient Encoder-Free Multimodal Learning with Learnable Queries
Tianle Li
Yongming Rao
Winston Hu
Yu Cheng
MLLM
68
0
0
16 Mar 2025
PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing
PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing
Cheng Deng
Luoyang Sun
Jiwen Jiang
Yongcheng Zeng
Xinjian Wu
...
Haoyang Li
Lei Chen
Lionel M. Ni
Jun Wang
Jun Wang
189
0
0
15 Mar 2025
Key, Value, Compress: A Systematic Exploration of KV Cache Compression Techniques
Key, Value, Compress: A Systematic Exploration of KV Cache Compression Techniques
Neusha Javidnia
B. Rouhani
F. Koushanfar
187
0
0
14 Mar 2025
X-EcoMLA: Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression
X-EcoMLA: Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression
Guihong Li
Mehdi Rezagholizadeh
Mingyu Yang
Vikram Appia
Emad Barsoum
VLM
60
0
0
14 Mar 2025
Cost-Optimal Grouped-Query Attention for Long-Context Modeling
Cost-Optimal Grouped-Query Attention for Long-Context Modeling
Yuxiao Chen
Yutong Wu
Chenyang Song
Zhiyuan Liu
Maosong Sun
Xu Han
Zhiyuan Liu
Maosong Sun
73
0
0
12 Mar 2025
Large Language Models for Outpatient Referral: Problem Definition, Benchmarking and Challenges
Large Language Models for Outpatient Referral: Problem Definition, Benchmarking and Challenges
Xiaoxiao Liu
Qingying Xiao
Junying Chen
Xiangyi Feng
Xiangbo Wu
...
Xiang Wan
Jian Chang
Guangjun Yu
Yan Hu
Benyou Wang
LM&MA
LRM
212
0
0
11 Mar 2025
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts
Weigao Sun
Disen Lan
Tong Zhu
Xiaoye Qu
Yu-Xi Cheng
MoE
103
2
0
07 Mar 2025
Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs
Ling Team
B. Zeng
Chenyu Huang
Chao Zhang
Changxin Tian
...
Zhaoxin Huan
Zujie Wen
Zhenhang Sun
Zhuoxuan Du
Z. He
MoE
ALM
111
2
0
07 Mar 2025
Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling
Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling
Yan Li
Pengfei Zheng
Shuang Chen
Zewei Xu
Yuanhao Lai
Yunfei Du
Zehao Wang
MoE
181
0
0
06 Mar 2025
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
Zhijian Zhuo
Yutao Zeng
Ya Wang
Sijun Zhang
Jian Yang
Xiaoqing Li
Xun Zhou
Jinwen Ma
51
0
0
06 Mar 2025
PanguIR Technical Report for NTCIR-18 AEOLLM Task
Lang Mei
Chong Chen
Jiaxin Mao
ALM
57
1
0
04 Mar 2025
FANformer: Improving Large Language Models Through Effective Periodicity Modeling
FANformer: Improving Large Language Models Through Effective Periodicity Modeling
Yihong Dong
Ge Li
Xue Jiang
Yongding Tao
Kechi Zhang
...
Huanyu Liu
Jiazheng Ding
Jia Li
Jinliang Deng
Hong Mei
AI4TS
46
0
0
28 Feb 2025
MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts
MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts
Peijie Wang
Zhong-Zhi Li
Fei Yin
Dekang Ran
Dekang Ran
Cheng-Lin Liu
LRM
50
3
0
28 Feb 2025
A Pilot Empirical Study on When and How to Use Knowledge Graphs as Retrieval Augmented Generation
A Pilot Empirical Study on When and How to Use Knowledge Graphs as Retrieval Augmented Generation
Xujie Yuan
Y. Liu
Shimin Di
Shiwen Wu
Libin Zheng
Rui Meng
Lei Chen
Xiaofang Zhou
Jian Yin
36
0
0
28 Feb 2025
END: Early Noise Dropping for Efficient and Effective Context Denoising
END: Early Noise Dropping for Efficient and Effective Context Denoising
Hongye Jin
Pei Chen
Jingfeng Yang
Zhaoxiang Wang
Meng Jiang
...
Jiahui Geng
Zheng Li
Tianyi Liu
Huasheng Li
Bing Yin
170
0
0
26 Feb 2025
Self-Memory Alignment: Mitigating Factual Hallucinations with Generalized Improvement
Self-Memory Alignment: Mitigating Factual Hallucinations with Generalized Improvement
Siyuan Zhang
Y. Zhang
Yinpeng Dong
Hang Su
HILM
KELM
233
0
0
26 Feb 2025
Kanana: Compute-efficient Bilingual Language Models
Kanana: Compute-efficient Bilingual Language Models
Kanana LLM Team
Yunju Bak
Hojin Lee
Minho Ryu
Jiyeon Ham
...
Daniel Lee
Minchul Lee
M. Lee
Shinbok Lee
Gaeun Seo
98
1
0
26 Feb 2025
UQABench: Evaluating User Embedding for Prompting LLMs in Personalized Question Answering
UQABench: Evaluating User Embedding for Prompting LLMs in Personalized Question Answering
L. Liu
Shilei Liu
Yujin Yuan
Wenjie Qu
Bencheng Yan
...
Di Wang
Wenbo Su
Pengjie Wang
Jian Xu
Bo Zheng
64
1
0
26 Feb 2025
Harnessing Multiple Large Language Models: A Survey on LLM Ensemble
Harnessing Multiple Large Language Models: A Survey on LLM Ensemble
Zhijun Chen
Jingzheng Li
Pengpeng Chen
Zhuoran Li
Kai Sun
Yuankai Luo
Qianren Mao
Dingqi Yang
Hailong Sun
Philip S. Yu
ELM
55
5
0
25 Feb 2025
Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models
Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models
Raeid Saqur
Anastasis Kratsios
Florian Krach
Yannick Limmer
Jacob-Junqi Tian
John Willes
Blanka Horvath
Frank Rudzicz
MoE
53
0
0
24 Feb 2025
Can LLMs Reason About Program Semantics? A Comprehensive Evaluation of LLMs on Formal Specification Inference
Can LLMs Reason About Program Semantics? A Comprehensive Evaluation of LLMs on Formal Specification Inference
Thanh Le-Cong
Bach Le
Toby Murray
LRM
55
1
0
22 Feb 2025
InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning
InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning
C. Xie
Shuo Cai
Wenjun Wang
Pengxiang Li
Zhijie Sang
...
Xiaotian Han
Jianbo Yuan
Shengyu Zhang
Fei Wu
Hongxia Yang
LRM
51
1
0
17 Feb 2025
123
Next