ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2305.12474
  4. Cited By
Evaluating the Performance of Large Language Models on GAOKAO Benchmark

Evaluating the Performance of Large Language Models on GAOKAO Benchmark

21 May 2023
Xiaotian Zhang
Chun-yan Li
Yi Zong
Zhengyu Ying
Liang He
Xipeng Qiu
    ALM
    ELM
ArXivPDFHTML

Papers citing "Evaluating the Performance of Large Language Models on GAOKAO Benchmark"

50 / 80 papers shown
Title
SAS-Bench: A Fine-Grained Benchmark for Evaluating Short Answer Scoring with Large Language Models
SAS-Bench: A Fine-Grained Benchmark for Evaluating Short Answer Scoring with Large Language Models
Peichao Lai
Kaipeng Zhang
Yi Lin
L. Zhang
Feiyang Ye
...
Yanwei Xu
Conghui He
Yixuan Wang
Wentao Zhang
Bin Cui
ELM
LRM
44
0
0
12 May 2025
QualBench: Benchmarking Chinese LLMs with Localized Professional Qualifications for Vertical Domain Evaluation
QualBench: Benchmarking Chinese LLMs with Localized Professional Qualifications for Vertical Domain Evaluation
Mengze Hong
Wailing Ng
Di Jiang
Chen Zhang
ELM
55
0
0
08 May 2025
Prejudge-Before-Think: Enhancing Large Language Models at Test-Time by Process Prejudge Reasoning
Prejudge-Before-Think: Enhancing Large Language Models at Test-Time by Process Prejudge Reasoning
J. T. Wang
Jin Jiang
Yang Liu
M. Zhang
Xunliang Cai
LRM
34
0
0
18 Apr 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Z. Liu
Shenglong Ye
...
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
Wei Wang
MLLM
VLM
70
12
1
14 Apr 2025
Can the capability of Large Language Models be described by human ability? A Meta Study
Can the capability of Large Language Models be described by human ability? A Meta Study
Mingrui Zan
Yunquan Zhang
Boyang Zhang
Fangming Liu
Daning Cheng
ELM
LM&MA
55
0
0
13 Apr 2025
Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation
Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation
Songjun Tu
Jiahao Lin
Xiangyu Tian
Qichao Zhang
Linjing Li
...
Nan Xu
Wei He
Xiangyuan Lan
D. Jiang
Dongbin Zhao
LRM
58
3
0
17 Mar 2025
Code-Driven Inductive Synthesis: Enhancing Reasoning Abilities of Large Language Models with Sequences
Code-Driven Inductive Synthesis: Enhancing Reasoning Abilities of Large Language Models with Sequences
Kedi Chen
Zhikai Lei
Fan Zhang
Yinqi Zhang
Qin Chen
Jie Zhou
Liang He
Qipeng Guo
K. Chen
Wei-na Zhang
ELM
LRM
65
0
0
17 Mar 2025
ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning
Bo Liu
Yunxiang Li
Yangqiu Song
Hanjing Wang
Linyi Yang
Mark W. Schmidt
Jun Wang
Weinan Zhang
Shuyue Hu
Ying Wen
LLMAG
KELM
LRM
AI4CE
89
6
0
12 Mar 2025
Extrapolation Merging: Keep Improving With Extrapolation and Merging
Yiguan Lin
Bin Xu
Yinghao Li
Yang Gao
MoMe
59
1
0
05 Mar 2025
MMSciBench: Benchmarking Language Models on Multimodal Scientific Problems
Xinwu Ye
Chengfan Li
Siming Chen
Xiangru Tang
Wei Wei
LRM
39
1
0
27 Feb 2025
Baichuan-M1: Pushing the Medical Capability of Large Language Models
Binghui Wang
Haizhou Zhao
Huozhi Zhou
Liang Song
Mingyu Xu
...
Yan Zhang
Yifei Duan
Yuyan Zhou
Zhi-Ming Ma
Zhikai Wu
LM&MA
ELM
AI4MH
42
4
0
18 Feb 2025
Improving Natural Language Understanding for LLMs via Large-Scale Instruction Synthesis
Improving Natural Language Understanding for LLMs via Large-Scale Instruction Synthesis
Lin Yuan
Jun Xu
Honghao Gui
Mengshu Sun
Qing Cui
Lei Liang
Jun Zhou
AI4CE
206
0
0
06 Feb 2025
UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models
UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models
Xin Xu
Qiyun Xu
Tong Xiao
Tianhao Chen
Yuchen Yan
Jiaxin Zhang
Shizhe Diao
Can Yang
Yang Wang
ELM
LRM
AI4CE
104
2
0
01 Feb 2025
Baichuan-Omni-1.5 Technical Report
Yadong Li
Jiaheng Liu
Tao Zhang
Tao Zhang
S. Chen
...
Jianhua Xu
Haoze Sun
Mingan Lin
Zenan Zhou
Xin Wu
AuLLM
72
10
0
28 Jan 2025
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
Haotian Luo
Li Shen
Haiying He
Yishuo Wang
Shiwei Liu
Wei Li
Naiqiang Tan
Xiaochun Cao
Dacheng Tao
VLM
LRM
92
41
0
22 Jan 2025
Recursive Decomposition of Logical Thoughts: Framework for Superior Reasoning and Knowledge Propagation in Large Language Models
Kaleem Ullah Qasim
Jiashu Zhang
Tariq Alsahfi
Ateeq Ur Rehman Butt
LRM
ReLM
67
1
0
03 Jan 2025
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Weiyun Wang
Zhe Chen
Wenhai Wang
Yue Cao
Yangzhou Liu
...
Jinguo Zhu
X. Zhu
Lewei Lu
Yu Qiao
Jifeng Dai
LRM
62
47
1
15 Nov 2024
UTMath: Math Evaluation with Unit Test via Reasoning-to-Coding Thoughts
UTMath: Math Evaluation with Unit Test via Reasoning-to-Coding Thoughts
Bo Yang
Qingping Yang
Runtao Liu
Runtao Liu
LRM
ReLM
ELM
AIMat
64
1
0
11 Nov 2024
Number Cookbook: Number Understanding of Language Models and How to Improve It
Number Cookbook: Number Understanding of Language Models and How to Improve It
Haotong Yang
Yi Hu
Shijia Kang
Zhouchen Lin
Muhan Zhang
LRM
46
2
0
06 Nov 2024
MoD: A Distribution-Based Approach for Merging Large Language Models
MoD: A Distribution-Based Approach for Merging Large Language Models
Quy-Anh Dang
Chris Ngo
MoMe
VLM
31
0
0
01 Nov 2024
Parameter-Efficient Fine-Tuning in Large Models: A Survey of Methodologies
Parameter-Efficient Fine-Tuning in Large Models: A Survey of Methodologies
L. Wang
Sheng Chen
Linnan Jiang
Shu Pan
Runze Cai
Sen Yang
Fei Yang
49
3
0
24 Oct 2024
R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric
  Reasoning in Large Multimodal Models
R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models
Linger Deng
Yuliang Liu
Bohan Li
Dongliang Luo
Liang Wu
...
Ziyang Zhang
Gang Zhang
Errui Ding
Yingying Zhu
Xiang Bai
ReLM
3DV
LRM
26
10
0
23 Oct 2024
An Automatic and Cost-Efficient Peer-Review Framework for Language
  Generation Evaluation
An Automatic and Cost-Efficient Peer-Review Framework for Language Generation Evaluation
Junjie Chen
Weihang Su
Zhumin Chu
Haitao Li
Qinyao Ai
Yiqun Liu
Min Zhang
Shaoping Ma
29
3
0
16 Oct 2024
CoMAT: Chain of Mathematically Annotated Thought Improves Mathematical
  Reasoning
CoMAT: Chain of Mathematically Annotated Thought Improves Mathematical Reasoning
Joshua Ong Jun Leang
Aryo Pradipta Gema
Shay B. Cohen
ReLM
LRM
ReCod
40
2
0
14 Oct 2024
SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
L. Yang
Zhaochen Yu
T. Zhang
Minkai Xu
Joseph E. Gonzalez
Bin Cui
Shuicheng Yan
ELM
ReLM
LRM
51
0
0
11 Oct 2024
A Learning Rate Path Switching Training Paradigm for Version Updates of
  Large Language Models
A Learning Rate Path Switching Training Paradigm for Version Updates of Large Language Models
Zhihao Wang
Shiyu Liu
Jianheng Huang
Zheng Wang
Yixuan Liao
Xiaoxin Chen
Junfeng Yao
Jinsong Su
29
1
0
05 Oct 2024
CJEval: A Benchmark for Assessing Large Language Models Using Chinese
  Junior High School Exam Data
CJEval: A Benchmark for Assessing Large Language Models Using Chinese Junior High School Exam Data
Qian-Wen Zhang
Haochen Wang
Fang Li
Siyu An
Lingfeng Qiao
Liangcai Gao
Di Yin
Xing Sun
ELM
AI4Ed
27
0
0
24 Sep 2024
Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models
Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models
Peiyi Zhang
Yazhou Zhang
Bo Wang
Lu Rong
Jing Qin
Jing Qin
AI4Ed
ELM
49
1
0
19 Sep 2024
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via
  Self-Improvement
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
An Yang
Beichen Zhang
Binyuan Hui
Bofei Gao
Bowen Yu
...
Mingfeng Xue
Runji Lin
Tianyu Liu
Xingzhang Ren
Zhenru Zhang
OSLM
LRM
36
179
0
18 Sep 2024
See What LLMs Cannot Answer: A Self-Challenge Framework for Uncovering
  LLM Weaknesses
See What LLMs Cannot Answer: A Self-Challenge Framework for Uncovering LLM Weaknesses
Yulong Chen
Yang Liu
Jianhao Yan
X. Bai
Ming Zhong
Yinghao Yang
Ziyi Yang
Chenguang Zhu
Yue Zhang
ALM
ELM
35
6
0
16 Aug 2024
CFinBench: A Comprehensive Chinese Financial Benchmark for Large
  Language Models
CFinBench: A Comprehensive Chinese Financial Benchmark for Large Language Models
Ying Nie
Binwei Yan
Tianyu Guo
Hao Liu
Haoyu Wang
...
Weihao Wang
Qiang Li
Weijian Sun
Yunhe Wang
Dacheng Tao
ELM
47
2
0
02 Jul 2024
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
Zhen Huang
Zengzhi Wang
Shijie Xia
Xuefeng Li
Haoyang Zou
...
Yuxiang Zheng
Shaoting Zhang
Dahua Lin
Yu Qiao
Pengfei Liu
ELM
LRM
49
26
0
18 Jun 2024
Dynamic data sampler for cross-language transfer learning in large
  language models
Dynamic data sampler for cross-language transfer learning in large language models
Yudong Li
Yuhao Feng
Wen Zhou
Zhe Zhao
Linlin Shen
Cheng-An Hou
Xianxu Hou
46
4
0
17 May 2024
Can large language models understand uncommon meanings of common words?
Can large language models understand uncommon meanings of common words?
Jinyang Wu
Feihu Che
Xinxin Zheng
Shuai Zhang
Ruihan Jin
Shuai Nie
Pengpeng Shao
Jianhua Tao
34
1
0
09 May 2024
FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of
  Large Language Models
FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models
Wei Li
Ren Ma
Jiang Wu
Chenya Gu
Jiahui Peng
Jinyang Len
Songyang Zhang
Hang Yan
Dahua Lin
Conghui He
ELM
21
0
0
29 Apr 2024
Multilingual Large Language Model: A Survey of Resources, Taxonomy and
  Frontiers
Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers
Libo Qin
Qiguang Chen
Yuhang Zhou
Zhi Chen
Hai-Tao Zheng
Lizi Liao
Min Li
Wanxiang Che
Philip S. Yu
LRM
55
36
0
07 Apr 2024
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models
  with a Self-Critique Pipeline
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline
Yifan Xu
Xiao Liu
Xinghan Liu
Zhenyu Hou
Yueyan Li
...
Aohan Zeng
Zhengxiao Du
Wenyi Zhao
Jie Tang
Yuxiao Dong
LRM
36
35
0
03 Apr 2024
Large Language Models for Education: A Survey and Outlook
Large Language Models for Education: A Survey and Outlook
Shen Wang
Tianlong Xu
Hang Li
Chaoli Zhang
Joleen Liang
Jiliang Tang
Philip S. Yu
Qingsong Wen
AI4Ed
35
93
0
26 Mar 2024
Benchmarking Chinese Commonsense Reasoning of LLMs: From
  Chinese-Specifics to Reasoning-Memorization Correlations
Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations
Jiaxing Sun
Weiquan Huang
Jiang Wu
Chenya Gu
Wei Li
Songyang Zhang
Hang Yan
Conghui He
LRM
44
5
0
21 Mar 2024
LHMKE: A Large-scale Holistic Multi-subject Knowledge Evaluation
  Benchmark for Chinese Large Language Models
LHMKE: A Large-scale Holistic Multi-subject Knowledge Evaluation Benchmark for Chinese Large Language Models
Chuang Liu
Renren Jin
Yuqi Ren
Deyi Xiong
ELM
35
0
0
19 Mar 2024
OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and
  Safety
OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety
Chuang Liu
Linhao Yu
Jiaxuan Li
Renren Jin
Yufei Huang
...
Tao Liu
Jinwang Song
Hongying Zan
Sun Li
Deyi Xiong
ELM
32
7
0
18 Mar 2024
Yi: Open Foundation Models by 01.AI
Yi: Open Foundation Models by 01.AI
01. AI
Alex Young
01.AI Alex Young
Bei Chen
Chao Li
...
Yue Wang
Yuxuan Cai
Zhenyu Gu
Zhiyuan Liu
Zonghong Dai
OSLM
LRM
121
500
0
07 Mar 2024
MathScale: Scaling Instruction Tuning for Mathematical Reasoning
MathScale: Scaling Instruction Tuning for Mathematical Reasoning
Zhengyang Tang
Xingxing Zhang
Benyou Wang
Furu Wei
ALM
LRM
32
59
0
05 Mar 2024
GAOKAO-MM: A Chinese Human-Level Benchmark for Multimodal Models
  Evaluation
GAOKAO-MM: A Chinese Human-Level Benchmark for Multimodal Models Evaluation
Yi Zong
Xipeng Qiu
ELM
VLM
29
6
0
24 Feb 2024
GeoEval: Benchmark for Evaluating LLMs and Multi-Modal Models on
  Geometry Problem-Solving
GeoEval: Benchmark for Evaluating LLMs and Multi-Modal Models on Geometry Problem-Solving
Jiaxin Zhang
Zhong-Zhi Li
Ming-Liang Zhang
Fei Yin
Cheng-Lin Liu
Yashar Moshfeghi
ELM
LRM
46
17
0
15 Feb 2024
SceMQA: A Scientific College Entrance Level Multimodal Question
  Answering Benchmark
SceMQA: A Scientific College Entrance Level Multimodal Question Answering Benchmark
Zhenwen Liang
Kehan Guo
Gang Liu
Taicheng Guo
Yujun Zhou
Tianyu Yang
Jiajun Jiao
Renjie Pi
Jipeng Zhang
Xiangliang Zhang
ELM
33
18
0
06 Feb 2024
PRE: A Peer Review Based Large Language Model Evaluator
PRE: A Peer Review Based Large Language Model Evaluator
Zhumin Chu
Qingyao Ai
Yiteng Tu
Haitao Li
Yiqun Liu
LRM
ALM
41
21
0
28 Jan 2024
An Empirical Study on Large Language Models in Accuracy and Robustness
  under Chinese Industrial Scenarios
An Empirical Study on Large Language Models in Accuracy and Robustness under Chinese Industrial Scenarios
Zongjie Li
Wenying Qiu
Pingchuan Ma
Yichen Li
You Li
Sijia He
Baozheng Jiang
Shuai Wang
Weixi Gu
27
2
0
27 Jan 2024
Orion-14B: Open-source Multilingual Large Language Models
Orion-14B: Open-source Multilingual Large Language Models
Du Chen
Yi Huang
Xiaopu Li
Yongqiang Li
Yongqiang Liu
Haihui Pan
Leichao Xu
Dacheng Zhang
Zhipeng Zhang
Kun Han
35
4
0
20 Jan 2024
TeleChat Technical Report
TeleChat Technical Report
Zhongjiang He
Zihan Wang
Xinzhan Liu
Shixuan Liu
Yitong Yao
...
Zilu Huang
Sishi Xiong
Yuxiang Zhang
Chao Wang
Shuangyong Song
AI4MH
LRM
ALM
58
3
0
08 Jan 2024
12
Next