2305.08322
v3 (latest)
C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models
15 May 2023
Yuzhen Huang
Yuzhuo Bai
Zhihao Zhu
Junlei Zhang
Jinghan Zhang
Tangjun Su
Junteng Liu
Chuancheng Lv
Yikai Zhang
Jiayi Lei
Yao Fu
Maosong Sun
Junxian He
ELM
LRM
Papers citing "C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models"
50 / 105 papers shown
Title
Capability Salience Vector: Fine-grained Alignment of Loss and Capabilities for Downstream Task Scaling Law
Qiming Ge
Shuhao Xing
Songyang Gao
Yunhua Zhou
Yicheng Zou
...
Zhi Chen
Hang Yan
Qi Zhang
Q. Guo
Kai Chen
30
0
0
16 Jun 2025
Model Merging for Knowledge Editing
Zichuan Fu
Xian Wu
Guojing Li
Yingying Zhang
Yefeng Zheng
Tianshi Ming
Y. X. R. Wang
Wanyu Wang
Xiangyu Zhao
KELM
MoMe
CLL
23
0
0
14 Jun 2025
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos
Jiashuo Yu
Y. Wu
Meng Chu
Zhifei Ren
Z. Huang
...
Conghui He
Yu Qiao
Yali Wang
Yi Wang
L. Wang
LRM
114
0
0
12 Jun 2025
MiniCPM4: Ultra-Efficient LLMs on End Devices
MiniCPM Team
Chaojun Xiao
Yuxuan Li
Xu Han
Yuzhuo Bai
...
Zhiyuan Liu
Guoyang Zeng
Chao Jia
Dahai Li
Maosong Sun
MLLM
32
0
0
09 Jun 2025
Infinity Instruct: Scaling Instruction Selection and Synthesis to Enhance Language Models
Jijie Li
Li Du
Hanyu Zhao
Bo Zhang
Liangdong Wang
Boyan Gao
Guang Liu
Yonghua Lin
ALM
SyDa
25
0
0
09 Jun 2025
dots.llm1 Technical Report
Bi Huo
Bin Tu
Cheng Qin
Da Zheng
Debing Zhang
...
Yuqiu Ji
Ze Wen
Zhenhai Liu
Zichao Li
Zilong Liao
MoE
47
0
0
06 Jun 2025
EssayBench: Evaluating Large Language Models in Multi-Genre Chinese Essay Writing
Fan Gao
Dongyuan Li
Ding Xia
Fei Mi
Yasheng Wang
Lifeng Shang
Baojun Wang
ELM
37
0
0
03 Jun 2025
EvaLearn: Quantifying the Learning Capability and Efficiency of LLMs via Sequential Problem Solving
Shihan Dou
Ming Zhang
Chenhao Huang
Jiayi Chen
F. Chen
...
Wei Chengzhi
Lin Yan
Qi Zhang
Xuanjing Huang
Xuanjing Huang
ELM
79
0
0
03 Jun 2025
IRT-Router: Effective and Interpretable Multi-LLM Routing via Item Response Theory
Wei Song
Z. Huang
Cheng Cheng
W. Gao
Bihan Xu
Guanhao Zhao
Fei-Yue Wang
Runze Wu
KELM
36
0
0
01 Jun 2025
LegalEval-Q: A New Benchmark for The Quality Evaluation of LLM-Generated Legal Text
Li Yunhan
Wu Gengshen
AILaw
ELM
ALM
20
0
0
30 May 2025
ECHO-LLaMA: Efficient Caching for High-Performance LLaMA Training
Maryam Dialameh
Rezaul Karim
Hossein Rajabzadeh
Omar Mohamed Awad
Hyock Ju Kwon
Boxing Chen
Walid Ahmed
Yang Liu
97
0
0
22 May 2025
Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought
Tencent Hunyuan Team
Ao Liu
Botong Zhou
Can Xu
Chayse Zhou
...
Bingxin Qu
Bolin Ni
Boyu Wu
Chen Li
Cheng-peng Jiang
MoE
LRM
AI4CE
160
0
0
21 May 2025
Enhancing LLMs via High-Knowledge Data Selection
Feiyu Duan
Xuemiao Zhang
Sirui Wang
Haoran Que
Yuqi Liu
Wenge Rong
Xunliang Cai
237
0
0
20 May 2025
DiagnosisArena: Benchmarking Diagnostic Reasoning for Large Language Models
Yakun Zhu
Zhongzhen Huang
Linjie Mu
Yutong Huang
Wei Nie
Jiaji Liu
Shaoting Zhang
Pengfei Liu
Xiaofan Zhang
LM&MA
ELM
LRM
164
0
0
20 May 2025
MoL for LLMs: Dual-Loss Optimization to Enhance Domain Expertise While Preserving General Capabilities
Jingxue Chen
Qingkun Tang
Qianchun Lu
Siyuan Fang
94
0
0
17 May 2025
AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection
Kai Hua
Steven Wu
Ge Zhang
Ke Shen
LRM
83
0
0
12 May 2025
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining
Xiaomi LLM-Core Team
Bingquan Xia
Bo Shen
Cici
Dawei Zhu
...
Yun Wang
Yue Yu
Zhenru Lin
Zhichao Song
Zihao Yue
MoE
ReLM
LRM
AI4CE
169
7
0
12 May 2025
TeleEval-OS: Performance evaluations of large language models for operations scheduling
Yanyan Wang
Yingying Wang
Junli Liang
Yin Xu
Yunlong Liu
...
Fei Li
Long Zhao
Kuang Xu
Qi Song
Xiangyang Li
AI4TS
25
0
0
06 May 2025
Quantitative Analysis of Performance Drop in DeepSeek Model Quantization
Enbo Zhao
Yi Shen
Shuming Shi
Jieyun Huang
Z. Chen
Rongjia Du
Siqi Xiao
Jing Zhang
Ning Wang
Shiguo Lian
MQ
151
0
0
05 May 2025
Measuring Hong Kong Massive Multi-Task Language Understanding
Chuxue Cao
Zhenghao Zhu
Junqi Zhu
Guoying Lu
Siyu Peng
Juntao Dai
Weijie Shi
Sirui Han
Yike Guo
ELM
451
0
0
04 May 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Ziwei Liu
Shenglong Ye
...
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
Wei Wang
MLLM
VLM
221
132
1
14 Apr 2025
Large Language Models Could Be Rote Learners
Yuyang Xu
Renjun Hu
Haochao Ying
Jian Wu
Xing Shi
Wei Lin
ELM
438
0
0
11 Apr 2025
Efficient Evaluation of Large Language Models via Collaborative Filtering
Xu-Xiang Zhong
Chao Yi
Han-Jia Ye
118
0
0
05 Apr 2025
Investigating and Scaling up Code-Switching for Multilingual Language Model Pre-Training
Zhijun Wang
Jiahuan Li
Hao Zhou
Rongxiang Weng
Jiadong Wang
Xin Huang
Xue Han
Junlan Feng
Chao Deng
Shujian Huang
LRM
106
3
0
02 Apr 2025
Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models
José P. Pombal
Nuno M. Guerreiro
Ricardo Rei
André F. T. Martins
ALM
139
2
0
01 Apr 2025
HKCanto-Eval: A Benchmark for Evaluating Cantonese Language Understanding and Cultural Comprehension in LLMs
Tsz Chung Cheng
Chung Shing Cheng
Chaak Ming Lau
Eugene Tin-Ho Lam
Chun Yat Wong
Hoi On Yu
Cheuk Hei Chong
ELM
105
2
0
16 Mar 2025
SciHorizon: Benchmarking AI-for-Science Readiness from Scientific Data to Large Language Models
Chuan Qin
Xiusi Chen
Chengrui Wang
Pengmin Wu
Xi Chen
...
Han Wu
Chong Li
Yuanchun Zhou
H. Xiong
Hengshu Zhu
ELM
91
2
0
12 Mar 2025
Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs
Ling Team
B. Zeng
Chenyu Huang
Chao Zhang
Changxin Tian
...
Zhaoxin Huan
Zujie Wen
Zhenhang Sun
Zhuoxuan Du
Z. He
MoE
ALM
196
5
0
07 Mar 2025
CaseGen: A Benchmark for Multi-Stage Legal Case Documents Generation
Haitao Li
Jiaying Ye
Yiran Hu
Jia Chen
Qingyao Ai
...
Junjie Chen
Yuxiao Chen
Cheng Luo
Quan Zhou
Yixiao Liu
AILaw
ELM
134
2
0
25 Feb 2025
PiCO: Peer Review in LLMs based on the Consistency Optimization
Kun-Peng Ning
Shuo Yang
Yu-Yang Liu
Jia-Yu Yao
Zhen-Hui Liu
Yu Wang
Ming Pang
Li Yuan
ALM
210
9
0
24 Feb 2025
MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models
Zihao Wei
Jingcheng Deng
Liang Pang
Hanxing Ding
Huawei Shen
Xueqi Cheng
KELM
143
7
0
20 Feb 2025
Baichuan-M1: Pushing the Medical Capability of Large Language Models
Binghai Wang
Haizhou Zhao
Huozhi Zhou
Liang Song
Mingyu Xu
...
Yan Zhang
Yifei Duan
Yuyan Zhou
Zhi-Ming Ma
Zhikai Wu
LM&MA
ELM
AI4MH
121
10
0
18 Feb 2025
Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis
Jiaqi Zhao
Ming Wang
Miao Zhang
Yuzhang Shang
Xuebo Liu
Yaowei Wang
Min Zhang
Liqiang Nie
MQ
246
2
0
18 Feb 2025
PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning
Xinyu Zhang
Yuxuan Dong
Yongpeng Wu
Jiaxing Huang
Chengyou Jia
Basura Fernando
Mike Zheng Shou
Lingling Zhang
Jun Liu
AIMat
ReLM
LRM
114
13
0
17 Feb 2025
Equilibrate RLHF: Towards Balancing Helpfulness-Safety Trade-off in Large Language Models
Yingshui Tan
Yilei Jiang
Yongbin Li
Qingbin Liu
Xingyuan Bu
Wenbo Su
Xiangyu Yue
Xiaoyong Zhu
Bo Zheng
ALM
153
6
0
17 Feb 2025
NOTA: Multimodal Music Notation Understanding for Visual Large Language Model
Mingni Tang
Jiajia Li
Lu Yang
Zhiqiang Zhang
Jinghao Tian
Zehan Li
Lefei Zhang
Peijie Wang
90
0
0
17 Feb 2025
Optimal Brain Iterative Merging: Mitigating Interference in LLM Merging
Zhixiang Wang
Zhenyu Mao
Yixuan Qiao
Hao Sun
Biye Li
MoMe
122
0
0
17 Feb 2025
Improve LLM-as-a-Judge Ability as a General Ability
Jiachen Yu
Shaoning Sun
Xiaohui Hu
Jiaxu Yan
Kaidong Yu
Xuelong Li
ELM
155
7
0
17 Feb 2025
Valuable Hallucinations: Realizable Non-realistic Propositions
Qiucheng Chen
Bo Wang
LRM
138
0
0
16 Feb 2025
Quantifying the Capability Boundary of DeepSeek Models: An Application-Driven Performance Analysis
Kaikai Zhao
Zhaoxiang Liu
Xuejiao Lei
Rongjia Du
Zhenhong Long
...
Minjie Hua
Kai Wang
Wen Liu
Ning Wang
Kai Wang
ELM
LRM
107
1
0
16 Feb 2025
Large Language Diffusion Models
Shen Nie
Fengqi Zhu
Zebin You
Xiaolu Zhang
Jingyang Ou
Jun Hu
Jun Zhou
Yankai Lin
Ji-Rong Wen
Chongxuan Li
271
55
0
14 Feb 2025
Improving Natural Language Understanding for LLMs via Large-Scale Instruction Synthesis
Lin Yuan
Jun Xu
Honghao Gui
Mengshu Sun
Qing Cui
Lei Liang
Jun Zhou
AI4CE
453
0
0
06 Feb 2025
UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models
Xin Xu
Qiyun Xu
Tong Xiao
Tianhao Chen
Yuchen Yan
Jiaxin Zhang
Shizhe Diao
Can Yang
Yang Wang
LRM
AI4CE
ELM
276
8
0
01 Feb 2025
Baichuan-Omni-1.5 Technical Report
Yadong Li
Qingbin Liu
Tao Zhang
Tao Zhang
Tian Jin
...
Jianhua Xu
Haoze Sun
Mingan Lin
Guosheng Dong
Xin Wu
AuLLM
181
23
0
28 Jan 2025
WisdomBot: Tuning Large Language Models with Artificial Intelligence Knowledge
Jingyuan Chen
Tao Wu
Wei Ji
Leilei Gan
79
0
0
22 Jan 2025
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Kimi Team
Angang Du
Bofei Gao
Bowei Xing
Changjiu Jiang
...
Zihao Huang
Ziyao Xu
Zhiyong Yang
Zonghan Yang
Zongyu Lin
OffRL
ALM
AI4TS
VLM
LRM
349
338
0
22 Jan 2025
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models
Zihan Qiu
Zeyu Huang
Jian Xu
Kaiyue Wen
Zhaoxiang Wang
Rui Men
Ivan Titov
Dayiheng Liu
Jingren Zhou
Junyang Lin
MoE
137
7
0
21 Jan 2025
CS-Eval: A Comprehensive Large Language Model Benchmark for CyberSecurity
Zhengmin Yu
Jiutian Zeng
Siyi Chen
Wenhan Xu
Dandan Xu
Xiangyu Liu
Zonghao Ying
Nan Wang
Yuan Zhang
Min Yang
ELM
248
2
0
20 Jan 2025
AIOpsLab: A Holistic Framework to Evaluate AI Agents for Enabling Autonomous Clouds
Yinfang Chen
Manish Shetty
Gagan Somashekar
Minghua Ma
Yogesh L. Simmhan
Jonathan Mace
Chetan Bansal
Rujia Wang
Saravan Rajmohan
127
4
0
12 Jan 2025
MULTI: Multimodal Understanding Leaderboard with Text and Images
Zichen Zhu
Yang Xu
Lu Chen
Jingkai Yang
Yichuan Ma
...
Yingzi Ma
Situo Zhang
Zihan Zhao
Liangtai Sun
Kai Yu
VLM
116
5
0
08 Jan 2025