2405.04434
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
7 May 2024
DeepSeek-AI
Aixin Liu
Bei Feng
Bin Wang
Bingxuan Wang
Bo Liu
Chenggang Zhao
Chengqi Deng
Chong Ruan
Damai Dai
Daya Guo
Dejian Yang
Deli Chen
Dongjie Ji
Erhang Li
Fangyun Lin
Fuli Luo
Guangbo Hao
Guanting Chen
Guowei Li
Hai-Tao Zhang
Hanwei Xu
Hao Yang
Haowei Zhang
Honghui Ding
Huajian Xin
Huazuo Gao
Hui Li
Hui Qu
Jianfeng Cai
Jian Liang
Jianzhong Guo
Jiaqi Ni
Jiashi Li
Jin Chen
Jingyang Yuan
Junjie Qiu
Junxiao Song
Kai Dong
Kaige Gao
Kang Guan
Lean Wang
Lecong Zhang
Lei Xu
Leyi Xia
Liang Zhao
Liyue Zhang
Meng Li
Miaojun Wang
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Mingming Li
Ning Tian
Panpan Huang
Peiyi Wang
Peng Zhang
Qihao Zhu
Qinyu Chen
Qiushi Du
Ruoxin Chen
Rong Jin
Ruiqi Ge
Ruizhe Pan
Runxin Xu
Ruyi Chen
S. S. Li
Shanghao Lu
Shangyan Zhou
Shanhuang Chen
Shaoqing Wu
Shengfeng Ye
Shirong Ma
Shiyu Wang
Shuang Zhou
Shuiping Yu
Shunfeng Zhou
Wenlei Bao
Tao Wang
Tian Pei
Tian Yuan
Tianyu Sun
W. L. Xiao
Wangding Zeng
Wei An
Wen Liu
Wenfeng Liang
Wenjun Gao
Wentao Zhang
X. Q. Li
Xiangyue Jin
Xianzu Wang
Xiao Bi
Xiaodong Liu
Xiaohan Wang
Xiaojin Shen
Xiaokang Chen
Xiaosha Chen
Xiaotao Nie
Xiaowen Sun
Xiaoxiang Wang
Xin Liu
Xin Xie
Xingkai Yu
Xinnan Song
Xinyi Zhou
Xinyu Yang
Xuan Lu
Xuecheng Su
Ying Wu
Y. K. Li
Y. X. Wei
Yichen Zhu
Yanhong Xu
Yanping Huang
Yao Li
Yao-Min Zhao
Yaofeng Sun
Yaohui Li
Yaohui Wang
Yi Zheng
Yichao Zhang
Yiliang Xiong
Yilong Zhao
Ying He
Ying Tang
Yishi Piao
Yixin Dong
Yixuan Tan
Yiyuan Liu
Yongji Wang
Yongqiang Guo
Yuchen Zhu
Yuduan Wang
Yuheng Zou
Yukun Zha
Yunxian Ma
Yuting Yan
Yuxiang You
Yuxuan Liu
Z. Z. Ren
Zehui Ren
Zhangli Sha
Zhe Fu
Zhen Huang
Zhen Zhang
Zhenda Xie
Zhewen Hao
Zhihong Shao
Zhiniu Wen
Zhipeng Xu
Zhongyu Zhang
Zhuoshu Li
Zihan Wang
Zihui Gu
Zilin Li
Ziwei Xie
MoE
Papers citing
"DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model"
50 / 108 papers shown
Title
An Efficient Large Recommendation Model: Towards a Resource-Optimal Scaling Law
Songpei Xu
Shijia Wang
Da Guo
Xianwen Guo
Qiang Xiao
Fangjian Li
Chuanjiang Luo
80
0
0
17 Feb 2025
Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o Under Data Scarcity
Dylan Zhang
Justin Wang
Tianran Sun
56
1
0
17 Feb 2025
Leveraging Constrained Monte Carlo Tree Search to Generate Reliable Long Chain-of-Thought for Mathematical Reasoning
Qingwen Lin
Boyan Xu
Zijian Li
Zhifeng Hao
Keli Zhang
Ruichu Cai
LRM
52
2
0
16 Feb 2025
MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections
Da Xiao
Qingye Meng
Shengping Li
Xingyuan Yuan
MoE
AI4CE
66
1
0
13 Feb 2025
Klotski: Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline
Zhiyuan Fang
Yuegui Huang
Zicong Hong
Yufeng Lyu
Wuhui Chen
Yue Yu
Fan Yu
Zibin Zheng
MoE
48
0
0
09 Feb 2025
ATLAS: Autoformalizing Theorems through Lifting, Augmentation, and Synthesis of Data
Xiaoyang Liu
Kangjie Bao
Jiashuo Zhang
Yunqi Liu
Yu Chen
Yang Jiao
Tao Luo
AIMat
55
0
0
08 Feb 2025
Importance Sampling via Score-based Generative Models
Heasung Kim
Taekyun Lee
Hyeji Kim
Gustavo de Veciana
MedIm
DiffM
141
0
0
07 Feb 2025
Can LLMs Maintain Fundamental Abilities under KV Cache Compression?
Xiang Liu
Zhenheng Tang
Hong Chen
Peijie Dong
Zeyu Li
Xiuze Zhou
Bo Li
Xuming Hu
Xiaowen Chu
215
3
0
04 Feb 2025
Position: AI Scaling: From Up to Down and Out
Yunke Wang
Yanxi Li
Chang Xu
HAI
88
2
0
02 Feb 2025
UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models
Xin Xu
Qiyun Xu
Tong Xiao
Tianhao Chen
Yuchen Yan
Jiaxin Zhang
Shizhe Diao
Can Yang
Yang Wang
ELM
LRM
AI4CE
111
3
0
01 Feb 2025
StringLLM: Understanding the String Processing Capability of Large Language Models
Xilong Wang
Hao Fu
Jindong Wang
Neil Zhenqiang Gong
67
0
0
28 Jan 2025
Panoramic Interests: Stylistic-Content Aware Personalized Headline Generation
Junhong Lian
Xiang Ao
Xinyu Liu
Yang Liu
Qing He
36
0
0
21 Jan 2025
PsyDI: Towards a Personalized and Progressively In-depth Chatbot for Psychological Measurements
Xueyan Li
Xinyan Chen
Yazhe Niu
Shuai Hu
Yu Liu
OffRL
65
3
0
17 Jan 2025
Tensor Product Attention Is All You Need
Yifan Zhang
Yifeng Liu
Huizhuo Yuan
Zhen Qin
Yang Yuan
Q. Gu
Andrew Chi-Chih Yao
90
9
0
11 Jan 2025
Powerful Design of Small Vision Transformer on CIFAR10
Gent Wu
ViT
44
0
0
07 Jan 2025
Scaling Laws for Floating Point Quantization Training
Xingchen Sun
Shuaipeng Li
Ruobing Xie
Weidong Han
Kan Wu
...
Yangyu Tao
Zhanhui Kang
C. Xu
Di Wang
Jie Jiang
MQ
AIFin
62
0
0
05 Jan 2025
Hengqin-RA-v1: Advanced Large Language Model for Diagnosis and Treatment of Rheumatoid Arthritis with Dataset based Traditional Chinese Medicine
Yishen Liu
Shengda Luo
Zishao Zhong
Tongtong Wu
Junge Zhang
Peiyao Ou
Yong Liang
Liang Liu
Hudan Pan
LM&MA
43
0
0
05 Jan 2025
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Bradley Brown
Jordan Juravsky
Ryan Ehrlich
Ronald Clark
Quoc V. Le
Christopher Ré
Azalia Mirhoseini
ALM
LRM
95
224
0
03 Jan 2025
HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation
Zhaojian Yu
Yilun Zhao
Arman Cohan
Xiao-Ping Zhang
LRM
36
4
0
03 Jan 2025
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
Shanghaoran Quan
Jiaxi Yang
Bowen Yu
Jian Xu
Dayiheng Liu
...
Zeyu Cui
Yang Fan
Wenjie Qu
Binyuan Hui
Junyang Lin
ALM
ELM
LRM
74
16
0
02 Jan 2025
GPT or BERT: why not both?
Lucas Georges Gabriel Charpentier
David Samuel
55
5
0
31 Dec 2024
BaiJia: A Large-Scale Role-Playing Agent Corpus of Chinese Historical Characters
Ting Bai
Jiazheng Kang
Jiayang Fan
AI4CE
47
2
0
28 Dec 2024
Reinforcement Learning Enhanced LLMs: A Survey
Shuhe Wang
Shengyu Zhang
Junge Zhang
Runyi Hu
Xiaoya Li
Tianwei Zhang
Jiwei Li
Fei Wu
G. Wang
Eduard H. Hovy
OffRL
134
7
0
05 Dec 2024
Unifying KV Cache Compression for Large Language Models with LeanKV
Yanqi Zhang
Yuwei Hu
Runyuan Zhao
John C. S. Lui
Haibo Chen
MQ
151
5
0
04 Dec 2024
Yi-Lightning Technical Report
01.AI
Alan Wake
Albert Wang
Bei Chen
...
Yuxuan Sha
Zhaodong Yan
Zhiyuan Liu
Zirui Zhang
Zonghong Dai
OSLM
102
3
0
02 Dec 2024
Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning
Jiange Yang
Haoyi Zhu
Yanjie Wang
Gangshan Wu
Tong He
Limin Wang
103
2
0
21 Nov 2024
Efficient Transfer Learning for Video-language Foundation Models
Haoxing Chen
Zizheng Huang
Y. Hong
Yanshuo Wang
Zhongcai Lyu
Zhuoer Xu
Jun Lan
Zhangxuan Gu
VLM
54
0
0
18 Nov 2024
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
Hanshi Sun
Li-Wen Chang
Yiyuan Ma
Wenlei Bao
Ningxin Zheng
Xin Liu
Harry Dong
Yuejie Chi
Beidi Chen
VLM
88
16
0
28 Oct 2024
Parameter-Efficient Fine-Tuning in Large Models: A Survey of Methodologies
Liwen Wang
Sheng Chen
Linnan Jiang
Shu Pan
Runze Cai
Sen Yang
Fei Yang
49
3
0
24 Oct 2024
LLM The Genius Paradox: A Linguistic and Math Expert's Struggle with Simple Word-based Counting Problems
Nan Xu
Xuezhe Ma
LRM
59
3
0
18 Oct 2024
EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference
Yulei Qian
Fengcun Li
Xiangyang Ji
Xiaoyu Zhao
Jianchao Tan
Kaipeng Zhang
Xunliang Cai
MoE
79
3
0
16 Oct 2024
TestAgent: A Framework for Domain-Adaptive Evaluation of LLMs via Dynamic Benchmark Construction and Exploratory Interaction
Wanying Wang
Zeyu Ma
Pengfei Liu
Mingang Chen
LLMAG
50
1
0
15 Oct 2024
FB-Bench: A Fine-Grained Multi-Task Benchmark for Evaluating LLMs' Responsiveness to Human Feedback
Heng Chang
Miao Zheng
Fan Yang
Bin Cui
Tengjiao Wang
Xin Wu
Guosheng Dong
Wentao Zhang
ALM
51
6
0
12 Oct 2024
Round and Round We Go! What makes Rotary Positional Encodings useful?
Federico Barbero
Alex Vitvitskyi
Christos Perivolaropoulos
Razvan Pascanu
Petar Velickovic
83
19
0
08 Oct 2024
LongGenBench: Long-context Generation Benchmark
Xiang Liu
Peijie Dong
Xuming Hu
Xiaowen Chu
RALM
55
8
0
05 Oct 2024
ColaCare: Enhancing Electronic Health Record Modeling through Large Language Model-Driven Multi-Agent Collaboration
Zixiang Wang
Yinghao Zhu
Huiya Zhao
Xiaochen Zheng
Tianlong Wang
...
Yasha Wang
Ewen M. Harrison
Junyi Gao
Liantao Ma
55
1
0
03 Oct 2024
Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding
Yao Teng
Han Shi
Xian Liu
Xuefei Ning
Guohao Dai
Yu Wang
Zhenguo Li
Xihui Liu
58
10
0
02 Oct 2024
Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models
Peiyi Zhang
Yazhou Zhang
Bo Wang
Lu Rong
Jing Qin
AI4Ed
ELM
49
1
0
19 Sep 2024
LogicPro: Improving Complex Logical Reasoning via Program-Guided Learning
Jin Jiang
Yuchen Yan
Yang Liu
Yonggang Jin
Shuai Peng
Hao Fei
Xunliang Cai
Yixin Cao
Liangcai Gao
Zhi Tang
LRM
52
3
0
19 Sep 2024
Revealing and Mitigating the Challenge of Detecting Character Knowledge Errors in LLM Role-Playing
Wenyuan Zhang
Shuaiyi Nie
Zefeng Zhang
Xinghua Zhang
Yongquan He
Tingwen Liu
32
4
0
18 Sep 2024
Large Language Models are Good Multi-lingual Learners : When LLMs Meet Cross-lingual Prompts
Teng Wang
Zhenqi He
Wing-Yin Yu
Xiaojin Fu
Xiongwei Han
LRM
59
5
0
17 Sep 2024
XG-NID: Dual-Modality Network Intrusion Detection using a Heterogeneous Graph Neural Network and Large Language Model
Yasir Ali Farrukh
S. Wali
I. Khan
Nathaniel D. Bastian
189
2
0
27 Aug 2024
Retrieve, Summarize, Plan: Advancing Multi-hop Question Answering with an Iterative Approach
Zhouyu Jiang
Mengshu Sun
Lei Liang
Qing Cui
RALM
80
11
0
18 Jul 2024
KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches
Jiayi Yuan
Hongyi Liu
Shaochen Zhong
Yu-Neng Chuang
...
Hongye Jin
V. Chaudhary
Zhaozhuo Xu
Zirui Liu
Xia Hu
46
18
0
01 Jul 2024
Too Late to Train, Too Early To Use? A Study on Necessity and Viability of Low-Resource Bengali LLMs
Tamzeed Mahfuz
Satak Kumar Dey
Ruwad Naswan
Hasnaen Adil
Khondker Salman Sayeed
Haz Sameen Shahgir
39
0
0
29 Jun 2024
Self-Constructed Context Decompilation with Fined-grained Alignment Enhancement
Yunlong Feng
Yang Xu
Dechuan Teng
Honglin Mu
Xiao Xu
Libo Qin
Wanxiang Che
Qingfu Zhu
27
4
0
25 Jun 2024
On the Transformations across Reward Model, Parameter Update, and In-Context Prompt
Deng Cai
Huayang Li
Tingchen Fu
Siheng Li
Weiwen Xu
...
Leyang Cui
Yan Wang
Lemao Liu
Taro Watanabe
Shuming Shi
KELM
30
2
0
24 Jun 2024
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Terry Yue Zhuo
Minh Chien Vu
Jenny Chim
Han Hu
Wenhao Yu
...
David Lo
Daniel Fried
Xiaoning Du
H. D. Vries
Leandro von Werra
77
134
0
22 Jun 2024
UltraMedical: Building Specialized Generalists in Biomedicine
Kaiyan Zhang
Sihang Zeng
Ermo Hua
Ning Ding
Zhang-Ren Chen
...
Xuekai Zhu
Xingtai Lv
Hu Jinfang
Zhiyuan Liu
Bowen Zhou
LM&MA
43
22
0
06 Jun 2024
Mitigate Position Bias in Large Language Models via Scaling a Single Dimension
Yijiong Yu
Huiqiang Jiang
Xufang Luo
Qianhui Wu
Chin-Yew Lin
Dongsheng Li
Yuqing Yang
Yongfeng Huang
L. Qiu
50
9
0
04 Jun 2024