ResearchTrend.AI

C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models

15 May 2023
Yuzhen Huang
Yuzhuo Bai
Zhihao Zhu
Junlei Zhang
Jinghan Zhang
Tangjun Su
Junteng Liu
Chuancheng Lv
Yikai Zhang
Jiayi Lei
Yao Fu
Maosong Sun
Junxian He
ELM, LRM

Papers citing "C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models"

50 / 105 papers shown
SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity
Pengfei Jing
Mengyun Tang
Xiaorong Shi
Xing Zheng
Sen Nie
Shi Wu
Yong Yang
Xiapu Luo
ELM
104
2
0
30 Dec 2024
MERaLiON-TextLLM: Cross-Lingual Understanding of Large Language Models in Chinese, Indonesian, Malay, and Singlish
Xin Huang
Tarun K. Vangani
Minh Duc Pham
Xunlong Zou
Bin Wang
Zhengyuan Liu
Ai Ti Aw
LRM
128
2
0
21 Dec 2024
Activating Distributed Visual Region within LLMs for Efficient and Effective Vision-Language Training and Inference
Siyuan Wang
Dianyi Wang
Chengxing Zhou
Zejun Li
Zhihao Fan
Xuanjing Huang
Zhongyu Wei
VLM
521
0
0
17 Dec 2024
LLM-NEO: Parameter Efficient Knowledge Distillation for Large Language Models
Runming Yang
Taiqiang Wu
Jiahao Wang
Pengfei Hu
Ngai Wong
Yujiu Yang
446
1
0
11 Nov 2024
CFSafety: Comprehensive Fine-grained Safety Assessment for LLMs
Zhihao Liu
Chenhui Hu
ALM, ELM
75
1
0
29 Oct 2024
Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation
Yiming Wang
Pei Zhang
Baosong Yang
Derek F. Wong
Rui Wang
LRM
115
15
0
17 Oct 2024
MoH: Multi-Head Attention as Mixture-of-Head Attention
Peng Jin
Bo Zhu
Li Yuan
Shuicheng Yan
MoE
103
18
0
15 Oct 2024
Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions
Zhihao He
Hang Yu
Zi Gong
Shizhan Liu
Jia-Nan Li
Weiyao Lin
VLM
104
2
0
09 Oct 2024
Visual Perception in Text Strings
Qi Jia
Xiang Yue
Shanshan Huang
Ziheng Qin
Yizhu Liu
Bill Yuchen Lin
Yang You
VLM
80
2
0
02 Oct 2024
Training on the Benchmark Is Not All You Need
Shiwen Ni
Xiangtao Kong
Chengming Li
Xiping Hu
Ruifeng Xu
Jia Zhu
Min Yang
150
6
0
03 Sep 2024
IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities
Bin Wang
Chunyu Xie
Dawei Leng
Yuhui Yin
MLLM
177
1
0
23 Aug 2024
Can Large Language Models Understand Symbolic Graphics Programs?
Zeju Qiu
Weiyang Liu
Haiwen Feng
Zhen Liu
Tim Z. Xiao
Katherine M. Collins
J. Tenenbaum
Adrian Weller
Michael J. Black
Bernhard Schölkopf
125
14
0
15 Aug 2024
VITA: Towards Open-Source Interactive Omni Multimodal LLM
Chaoyou Fu
Haojia Lin
Zuwei Long
Yunhang Shen
Meng Zhao
...
Rongrong Ji
Xing Sun
Ran He
Caifeng Shan
Xing Sun
MLLM
140
96
0
09 Aug 2024
CFBench: A Comprehensive Constraints-Following Benchmark for LLMs
Leo Micklem
Yan-Bin Shen
Wenjing Luo
Yan Zhang
Hao Liang
...
Weipeng Chen
Bin Cui
Blair Thornton
Wentao Zhang
Guosheng Dong
ELM
140
21
0
02 Aug 2024
PsycoLLM: Enhancing LLM for Psychological Understanding and Evaluation
Jinpeng Hu
Tengteng Dong
Luo Gang
Hui Ma
Peng Zou
Xiao Sun
Dan Guo
Meng Wang
AI4MH
92
7
0
08 Jul 2024
AudioBench: A Universal Benchmark for Audio Large Language Models
Bin Wang
Xunlong Zou
Geyu Lin
Siyang Song
Zhuohan Liu
Wenyu Zhang
Zhengyuan Liu
AiTi Aw
Nancy F. Chen
AuLLM, ELM, LM&MA
169
35
0
23 Jun 2024
The Music Maestro or The Musically Challenged, A Massive Music Evaluation Benchmark for Large Language Models
Jiajia Li
Lu Yang
Mingni Tang
Cong Chen
Zuchao Li
Ping Wang
Hai Zhao
LM&MA
82
6
0
22 Jun 2024
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
Zhen Huang
Zengzhi Wang
Shijie Xia
Xuefeng Li
Haoyang Zou
...
Yuxiang Zheng
Shaoting Zhang
Dahua Lin
Yu Qiao
Pengfei Liu
ELM, LRM
138
43
0
18 Jun 2024
What is the best model? Application-driven Evaluation for Large Language Models
Shiguo Lian
Kaikai Zhao
Xinhui Liu
Xuejiao Lei
Bikun Yang
Wenjing Zhang
Kai Wang
Zhaoxiang Liu
ALM, ELM
100
3
0
14 Jun 2024
M4U: Evaluating Multilingual Understanding and Reasoning for Large Multimodal Models
Hongyu Wang
Jiayu Xu
Senwei Xie
Ruiping Wang
Jialin Li
Zhaojie Xie
Bin Zhang
Chuyan Xiong
Xilin Chen
ELM, VLM, LRM
157
6
0
24 May 2024
FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models
Wei Li
Ren Ma
Jiang Wu
Chenya Gu
Jiahui Peng
Jinyang Len
Songyang Zhang
Hang Yan
Dahua Lin
Conghui He
ELM
45
0
0
29 Apr 2024
Hallucination of Multimodal Large Language Models: A Survey
Zechen Bai
Pichao Wang
Tianjun Xiao
Tong He
Zongbo Han
Zheng Zhang
Mike Zheng Shou
VLM, LRM
258
197
0
29 Apr 2024
From Matching to Generation: A Survey on Generative Information Retrieval
Xiaoxi Li
Jiajie Jin
Yujia Zhou
Yuyao Zhang
Peitian Zhang
Yutao Zhu
Zhicheng Dou
3DV
210
61
0
23 Apr 2024
Sample-Efficient Human Evaluation of Large Language Models via Maximum Discrepancy Competition
Kehua Feng
Keyan Ding
Hongzhi Tan
Kede Ma
Zhihua Wang
...
Yuzhou Cheng
Ge Sun
Guozhou Zheng
Qiang Zhang
H. Chen
126
13
0
10 Apr 2024
Checkpoint Merging via Bayesian Optimization in LLM Pretraining
Deyuan Liu
Zecheng Wang
Bingning Wang
Weipeng Chen
Chunshan Li
Zhiying Tu
Dianhui Chu
Bo Li
Dianbo Sui
MoMe
97
18
0
28 Mar 2024
COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning
Yuelin Bai
Xinrun Du
Yiming Liang
Yonggang Jin
Ziqiang Liu
...
Chenghua Lin
Jie Fu
Min Yang
Shiwen Ni
Ge Zhang
ALM
79
37
0
26 Mar 2024
Can multiple-choice questions really be useful in detecting the abilities of LLMs?
Wangyue Li
Liangzhi Li
Tong Xiang
Xiao Liu
Wei Deng
Noa Garcia
ELM
114
35
0
26 Mar 2024
Optimization-based Prompt Injection Attack to LLM-as-a-Judge
Jiawen Shi
Zenghui Yuan
Yinuo Liu
Yue Huang
Pan Zhou
Lichao Sun
Neil Zhenqiang Gong
AAML
146
57
0
26 Mar 2024
Understanding Emergent Abilities of Language Models from the Loss Perspective
Zhengxiao Du
Aohan Zeng
Yuxiao Dong
Jie Tang
UQCV, LRM
162
56
0
23 Mar 2024
Hyacinth6B: A large language model for Traditional Chinese
Chih-Wei Song
Yin-Te Tsai
100
0
0
20 Mar 2024
ChatUIE: Exploring Chat-based Unified Information Extraction using Large Language Models
Jun Xu
Mengshu Sun
Qing Cui
Jun Zhou
75
1
0
08 Mar 2024
Yi: Open Foundation Models by 01.AI
01.AI
Alex Young
Bei Chen
Chao Li
...
Yue Wang
Yuxuan Cai
Zhenyu Gu
Zhiyuan Liu
Zonghong Dai
OSLM, LRM
315
576
0
07 Mar 2024
SciAssess: Benchmarking LLM Proficiency in Scientific Literature Analysis
Hengxing Cai
Xiaochen Cai
Junhan Chang
Changhao Nai
Lin Yao
...
Changhong Chen
Zheng Cheng
Zifeng Zhao
Linfeng Zhang
Guolin Ke
ELM
83
25
0
04 Mar 2024
Beyond Probabilities: Unveiling the Misalignment in Evaluating Large Language Models
Chenyang Lyu
Minghao Wu
Alham Fikri Aji
ELM
63
14
0
21 Feb 2024
Retrieve Only When It Needs: Adaptive Retrieval Augmentation for Hallucination Mitigation in Large Language Models
Hanxing Ding
Liang Pang
Zihao Wei
Huawei Shen
Xueqi Cheng
HILM, RALM
144
18
0
16 Feb 2024
LLMEval: A Preliminary Study on How to Evaluate Large Language Models
Yue Zhang
Ming Zhang
Haipeng Yuan
Shichun Liu
Yongyao Shi
Tao Gui
Qi Zhang
Xuanjing Huang
ALM, ELM
67
15
0
12 Dec 2023
HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs
Junying Chen
Xidong Wang
Anningzhe Gao
Feng Jiang
Shunian Chen
...
Chuyi Kong
Jianquan Li
Xiang Wan
Haizhou Li
Benyou Wang
LM&MA
76
68
0
16 Nov 2023
CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation
Weixiang Yan
Haitian Liu
Yunkun Wang
Yunzhe Li
Qian Chen
...
Tingyu Lin
Weishan Zhao
Li Zhu
Hari Sundaram
Shuiguang Deng
ELM, LRM
140
37
0
14 Nov 2023
ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences
Yuanhe Tian
Ruyi Gan
Yan Song
Jiaxing Zhang
Yongdong Zhang
AI4MH, AI4CE, LM&MA
129
41
0
10 Nov 2023
Qilin-Med: Multi-stage Knowledge Injection Advanced Medical Large Language Model
Qichen Ye
Junling Liu
Dading Chong
Peilin Zhou
Yining Hua
...
Meng Cao
Ziming Wang
Xuxin Cheng
Andrew Liu
Zhenhua Guo
AI4MH, LM&MA, ELM
89
22
0
13 Oct 2023
OpsEval: A Comprehensive IT Operations Benchmark Suite for Large Language Models
Yuhe Liu
Changhua Pei
Longlong Xu
Bohan Chen
Mingze Sun
...
Gaogang Xie
Xidao Wen
Xiaohui Nie
Minghua Ma
Dan Pei
ELM
56
2
0
11 Oct 2023
ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented Benchmarks
Zejun Li
Ye Wang
Mengfei Du
Qingwen Liu
Binhao Wu
...
Zhihao Fan
Jie Fu
Jingjing Chen
Xuanjing Huang
Zhongyu Wei
118
15
0
04 Oct 2023
Qwen Technical Report
Jinze Bai
Shuai Bai
Yunfei Chu
Zeyu Cui
Kai Dang
...
Zhenru Zhang
Chang Zhou
Jingren Zhou
Xiaohuan Zhou
Tianhang Zhu
OSLM
360
1,922
0
28 Sep 2023
LawBench: Benchmarking Legal Knowledge of Large Language Models
Zhiwei Fei
Xiaoyu Shen
D. Zhu
Fengzhe Zhou
Zhuo Han
Songyang Zhang
Kai-xiang Chen
Zongwen Shen
Jidong Ge
ELM, AILaw
134
46
0
28 Sep 2023
Baichuan 2: Open Large-scale Language Models
Ai Ming Yang
Bin Xiao
Bingning Wang
Borong Zhang
Ce Bian
...
Youxin Jiang
Yuchen Gao
Yupeng Zhang
Guosheng Dong
Zhiying Wu
ELM, LRM
330
755
0
19 Sep 2023
Cross-Lingual Knowledge Editing in Large Language Models
Jiaan Wang
Yunlong Liang
Zengkui Sun
Yu Cao
Jiarong Xu
Fandong Meng
KELM
83
12
0
16 Sep 2023
FLM-101B: An Open LLM and How to Train It with $100K Budget
Xiang Li
Yiqun Yao
Xin Jiang
Xuezhi Fang
Xuying Meng
...
Li Du
Bowen Qin
Zheng Zhang
Aixin Sun
Yequan Wang
147
22
0
07 Sep 2023
AGIBench: A Multi-granularity, Multimodal, Human-referenced, Auto-scoring Benchmark for Large Language Models
Fei Tang
Wanling Gao
Luzhou Peng
Jianfeng Zhan
ELM
47
2
0
05 Sep 2023
CLEVA: Chinese Language Models EVAluation Platform
Yanyang Li
Jianqiao Zhao
Duo Zheng
Zi-Yuan Hu
Zhi Chen
...
Yongfeng Huang
Shijia Huang
Dahua Lin
Michael R. Lyu
Liwei Wang
ALM, ELM
100
11
0
09 Aug 2023
AgentSims: An Open-Source Sandbox for Large Language Model Evaluation
Jiaju Lin
Haoran Zhao
Aochi Zhang
Yiting Wu
Huqiuyue Ping
Qin Chen
ELM, LLMAG
104
68
0
08 Aug 2023