arXiv: 2305.12474
Evaluating the Performance of Large Language Models on GAOKAO Benchmark
21 May 2023
Xiaotian Zhang
Chunyang Li
Yi Zong
Zhengyu Ying
Liang He
Xipeng Qiu
ALM
ELM
Papers citing "Evaluating the Performance of Large Language Models on GAOKAO Benchmark"
30 / 80 papers shown
LLaMA Beyond English: An Empirical Study on Language Capability Transfer
Jun Zhao
Zhihao Zhang
Luhui Gao
Qi Zhang
Tao Gui
Xuanjing Huang
ELM
35
65
0
02 Jan 2024
NPHardEval: Dynamic Benchmark on Reasoning Ability of Large Language Models via Complexity Classes
Lizhou Fan
Wenyue Hua
Lingyao Li
Haoyang Ling
Yongfeng Zhang
LRM
31
45
0
22 Dec 2023
YAYI 2: Multilingual Open-Source Large Language Models
Yin Luo
Qingchao Kong
Nan Xu
Jia Cao
Bao Hao
...
Zhaoxin Yu
Zhengda Luo
Wenji Mao
Lei Wang
Dajun Zeng
ALM
OSLM
43
7
0
22 Dec 2023
Urban Generative Intelligence (UGI): A Foundational Platform for Agents in Embodied City Environment
Fengli Xu
Jun Zhang
Chen Gao
J. Feng
Yong Li
AI4CE
LLMAG
26
29
0
19 Dec 2023
Evaluating GPT-4's Vision Capabilities on Brazilian University Admission Exams
Ramon Pires
Thales Sales Almeida
Hugo Queiroz Abonizio
Rodrigo Nogueira
ELM
25
3
0
23 Nov 2023
CFBenchmark: Chinese Financial Assistant Benchmark for Large Language Model
Yang Lei
Jiangtong Li
Dawei Cheng
Zhijun Ding
Changjun Jiang
18
9
0
10 Nov 2023
TencentLLMEval: A Hierarchical Evaluation of Real-World Capabilities for Human-Aligned LLMs
Shuyi Xie
Wenlin Yao
Yong Dai
Shaobo Wang
Donlin Zhou
...
Zhichao Hu
Dong Yu
Zhengyou Zhang
Jing Nie
Yuhong Liu
ELM
ALM
16
4
0
09 Nov 2023
Conic10K: A Challenging Math Problem Understanding and Reasoning Dataset
Haoyi Wu
Wenyang Hui
Yezeng Chen
Weiqi Wu
Kewei Tu
Yi Zhou
LRM
43
3
0
09 Nov 2023
Unveiling A Core Linguistic Region in Large Language Models
Jun Zhao
Zhihao Zhang
Yide Ma
Qi Zhang
Tao Gui
Luhui Gao
Xuanjing Huang
67
5
0
23 Oct 2023
Self-Convinced Prompting: Few-Shot Question Answering with Repeated Introspection
Haodi Zhang
Min Cai
Xinhe Zhang
Chen Zhang
Rui Mao
Kaishun Wu
KELM
LRM
ReLM
30
8
0
08 Oct 2023
Qwen Technical Report
Jinze Bai
Shuai Bai
Yunfei Chu
Zeyu Cui
Kai Dang
...
Zhenru Zhang
Chang Zhou
Jingren Zhou
Xiaohuan Zhou
Tianhang Zhu
OSLM
29
1,577
0
28 Sep 2023
LawBench: Benchmarking Legal Knowledge of Large Language Models
Zhiwei Fei
Xiaoyu Shen
D. Zhu
Fengzhe Zhou
Zhuo Han
Songyang Zhang
Kai Chen
Zongwen Shen
Jidong Ge
ELM
AILaw
34
34
0
28 Sep 2023
Foundation Metrics for Evaluating Effectiveness of Healthcare Conversations Powered by Generative AI
Mahyar Abbasian
Elahe Khatibi
Iman Azimi
David Oniani
Zahra Shakeri Hossein Abad
...
Bryant Lin
Olivier Gevaert
Li-Jia Li
Ramesh C. Jain
Amir M. Rahmani
LM&MA
ELM
AI4MH
40
66
0
21 Sep 2023
Baichuan 2: Open Large-scale Language Models
Aiyuan Yang
Bin Xiao
Bingning Wang
Borong Zhang
Ce Bian
...
Youxin Jiang
Yuchen Gao
Yupeng Zhang
Guosheng Dong
Zhiying Wu
ELM
LRM
66
703
0
19 Sep 2023
AGIBench: A Multi-granularity, Multimodal, Human-referenced, Auto-scoring Benchmark for Large Language Models
Fei Tang
Wanling Gao
Luzhou Peng
Jianfeng Zhan
ELM
14
2
0
05 Sep 2023
ZhuJiu: A Multi-dimensional, Multi-faceted Chinese Benchmark for Large Language Models
Baolin Zhang
Hai-Yong Xie
Pengfan Du
Junhao Chen
Pengfei Cao
Yubo Chen
Shengping Liu
Kang Liu
Jun Zhao
ELM
ALM
24
1
0
28 Aug 2023
FinEval: A Chinese Financial Domain Knowledge Evaluation Benchmark for Large Language Models
Liwen Zhang
Wei Cai
Zhaowei Liu
Zhi Yang
Wei Dai
...
Zhiqiang Liu
Zhoufan Zhu
Anbo Wu
Xinnan Guo
Yun Chen
ELM
ALM
30
24
0
19 Aug 2023
CMB: A Comprehensive Medical Benchmark in Chinese
Xidong Wang
Guiming Hardy Chen
Dingjie Song
Zhiyi Zhang
Zhihong Chen
...
Feng Jiang
Jianquan Li
Xiang Wan
Benyou Wang
Haizhou Li
LM&MA
ELM
AI4MH
33
79
0
17 Aug 2023
Evaluating the Generation Capabilities of Large Chinese Language Models
Hui Zeng
Jingyuan Xue
Meng Hao
Chen Sun
Bin Ning
Na Zhang
ELM
18
12
0
09 Aug 2023
CLEVA: Chinese Language Models EVAluation Platform
Yanyang Li
Jianqiao Zhao
Duo Zheng
Zi-Yuan Hu
Zhi Chen
...
Yongfeng Huang
Shijia Huang
Dahua Lin
Michael R. Lyu
Liwei Wang
ALM
ELM
38
10
0
09 Aug 2023
RoCar: A Relationship Network-based Evaluation Method to Large Language Models
Ming Wang
Wenfang Wu
Chongyun Gao
Daling Wang
Shi Feng
Yifei Zhang
20
0
0
29 Jul 2023
CValues: Measuring the Values of Chinese Large Language Models from Safety to Responsibility
Guohai Xu
Jiayi Liu
Mingshi Yan
Haotian Xu
Jinghui Si
...
Rong Zhang
Ji Zhang
Chao Peng
Feiyan Huang
Jingren Zhou
ALM
ELM
33
73
0
19 Jul 2023
Brain in a Vat: On Missing Pieces Towards Artificial General Intelligence in Large Language Models
Yuxi Ma
Chi Zhang
Song-Chun Zhu
ELM
ALM
37
8
0
07 Jul 2023
A Survey on Evaluation of Large Language Models
Yupeng Chang
Xu Wang
Jindong Wang
Yuan Wu
Linyi Yang
...
Yue Zhang
Yi Chang
Philip S. Yu
Qiang Yang
Xing Xie
ELM
LM&MA
ALM
72
1,513
0
06 Jul 2023
M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models
Wenxuan Zhang
Sharifah Mahani Aljunied
Chang Gao
Yew Ken Chia
Lidong Bing
ELM
29
81
0
08 Jun 2023
Model Spider: Learning to Rank Pre-Trained Models Efficiently
Yi-Kai Zhang
Ting Huang
Yao-Xiang Ding
De-Chuan Zhan
Han-Jia Ye
34
23
0
06 Jun 2023
ExplainCPE: A Free-text Explanation Benchmark of Chinese Pharmacist Examination
Dongfang Li
Jindi Yu
Baotian Hu
Zhenran Xu
Mengdi Zhang
ELM
6
11
0
22 May 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
301
2,232
0
22 Mar 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
330
11,953
0
04 Mar 2022
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
297
6,959
0
20 Apr 2018