Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset

5 June 2023
Junling Liu
Peilin Zhou
Yining Hua
Dading Chong
Zhongyu Tian
Andrew Liu
Helin Wang
Chenyu You
Zhenhua Guo
Lei Zhu
Michael Lingzhi Li
    LM&MA
    ELM

Papers citing "Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset"

50 / 55 papers shown
DiagnosisArena: Benchmarking Diagnostic Reasoning for Large Language Models
Yakun Zhu
Zhongzhen Huang
Linjie Mu
Yutong Huang
Wei Nie
Shaoting Zhang
Pengfei Liu
Xiaofan Zhang
LM&MA
ELM
LRM
12
0
0
20 May 2025
AI-Driven Automation Can Become the Foundation of Next-Era Science of Science Research
Renqi Chen
Haoyang Su
Shixiang Tang
Zhenfei Yin
Qi Wu
Hui Li
Ye Sun
Nanqing Dong
Wanli Ouyang
Philip Torr
AI4CE
12
0
0
17 May 2025
QualBench: Benchmarking Chinese LLMs with Localized Professional Qualifications for Vertical Domain Evaluation
Mengze Hong
Wailing Ng
Di Jiang
Chen Zhang
ELM
55
0
0
08 May 2025
BrowseComp-ZH: Benchmarking Web Browsing Ability of Large Language Models in Chinese
Peilin Zhou
Bruce Leon
Xiang Ying
C. Zhang
Yifan Shao
...
Sixin Hong
J. Ren
Jian Chen
Chao-Hong Liu
Yining Hua
RALM
ELM
LRM
50
0
0
27 Apr 2025
LLM Sensitivity Evaluation Framework for Clinical Diagnosis
Chenwei Yan
Xiangling Fu
Yuxuan Xiong
Tianyi Wang
Siu Cheung Hui
Ji Wu
Xien Liu
LM&MA
ELM
40
0
0
18 Apr 2025
Evaluating Clinical Competencies of Large Language Models with a General Practice Benchmark
Zehan Li
Yiying Yang
Jiping Lang
Wenhao Jiang
Yuhang Zhao
...
Yuhua Bi
Xiaofei Zeng
Yixian Chen
Junrong Chen
Lin Yao
AI4MH
LM&MA
ELM
46
0
0
22 Mar 2025
MedEthicEval: Evaluating Large Language Models Based on Chinese Medical Ethics
Haoan Jin
Jiacheng Shi
Hanhui Xu
Kenny Q. Zhu
Mengyue Wu
AILaw
ELM
LM&MA
108
0
0
04 Mar 2025
Baichuan-M1: Pushing the Medical Capability of Large Language Models
Binghui Wang
Haizhou Zhao
Huozhi Zhou
Liang Song
Mingyu Xu
...
Yan Zhang
Yifei Duan
Yuyan Zhou
Zhi-Ming Ma
Zhikai Wu
LM&MA
ELM
AI4MH
42
4
0
18 Feb 2025
OphthBench: A Comprehensive Benchmark for Evaluating Large Language Models in Chinese Ophthalmology
Chengfeng Zhou
Ji Wang
Juanjuan Qin
Yining Wang
Ling Sun
Weiwei Dai
LM&MA
ELM
91
0
0
03 Feb 2025
Malware Classification using a Hybrid Hidden Markov Model-Convolutional Neural Network
Ritik Mehta
Olha Jurecková
Mark Stamp
61
0
0
25 Dec 2024
Polish Medical Exams: A new dataset for cross-lingual medical knowledge transfer assessment
Łukasz Grzybowski
Jakub Pokrywka
Michał Ciesiółka
Jeremi Kaczmarek
Marek Kubis
LM&MA
ELM
69
4
0
30 Nov 2024
AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset
Tobi Olatunji
Charles Nimo
A. Owodunni
Tassallah Abdullahi
Emmanuel Ayodele
...
Michael Best
Irfan Essa
Stephen E. Moore
Chris Fourie
M. Asiedu
LM&MA
86
3
0
23 Nov 2024
Large Language Model Benchmarks in Medical Tasks
Lawrence K. Q. Yan
Ming Li
Yujie Zhang
Caitlyn Heqi Yin
Cheng Fei
...
Ziqian Bi
Pohsun Feng
Keyu Chen
Junyu Liu
Qian Niu
LM&MA
AI4MH
53
6
0
28 Oct 2024
Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts
Guorui Zheng
Xidong Wang
Juhao Liang
Nuo Chen
Yuping Zheng
Benyou Wang
MoE
35
5
0
14 Oct 2024
CliMedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models in Clinical Scenarios
Zetian Ouyang
Yishuai Qiu
Linlin Wang
Gerard de Melo
Ya Zhang
Yanfeng Wang
Liang He
LM&MA
AI4MH
ELM
25
1
0
04 Oct 2024
A Survey for Large Language Models in Biomedicine
Chong Wang
Mengyao Li
Junjun He
Zhongruo Wang
Erfan Darzi
...
Yi Yu
Pietro Liò
Tianyun Wang
Yu Guang Wang
Yiqing Shen
LM&MA
39
9
0
29 Aug 2024
MSDiagnosis: An EMR-based Dataset for Clinical Multi-Step Diagnosis
Ruihui Hou
Shencheng Chen
Yongqi Fan
Lifeng Zhu
Jing Sun
Jingping Liu
Tong Ruan
35
1
0
19 Aug 2024
No Size Fits All: The Perils and Pitfalls of Leveraging LLMs Vary with Company Size
Ashok Urlana
Charaka Vinayak Kumar
B. Garlapati
Ajeet Kumar Singh
Rahul Mishra
43
1
0
21 Jul 2024
DALL-M: Context-Aware Clinical Data Augmentation with LLMs
Chihcheng Hsieh
Catarina Moreira
Isabel Blanco Nobre
Sandra Costa Sousa
Chun Ouyang
M. Brereton
Joaquim A. Jorge
Jacinto C. Nascimento
54
0
0
11 Jul 2024
Can Model Uncertainty Function as a Proxy for Multiple-Choice Question Item Difficulty?
Leonidas Zotos
H. Rijn
Malvina Nissim
ELM
52
2
0
07 Jul 2024
GraphArena: Evaluating and Exploring Large Language Models on Graph Computation
Jianheng Tang
Qifan Zhang
Yuhan Li
Nuo Chen
Jia Li
21
2
0
29 Jun 2024
LLMs for Doctors: Leveraging Medical LLMs to Assist Doctors, Not Replace Them
Wenya Xie
Qingying Xiao
Yu Zheng
Xidong Wang
Junying Chen
Ke Ji
Anningzhe Gao
Xiang Wan
Feng Jiang
Benyou Wang
LM&MA
38
3
0
26 Jun 2024
MedCare: Advancing Medical LLMs through Decoupling Clinical Alignment and Knowledge Aggregation
Yusheng Liao
Shuyang Jiang
Yanfeng Wang
Yu Wang
52
2
0
25 Jun 2024
MedOdyssey: A Medical Domain Benchmark for Long Context Evaluation Up to 200K Tokens
Yongqi Fan
Hongli Sun
Kui Xue
Xiaofan Zhang
Shaoting Zhang
Tong Ruan
47
0
0
21 Jun 2024
FlowBench: Revisiting and Benchmarking Workflow-Guided Planning for LLM-based Agents
Ruixuan Xiao
Wentao Ma
Ke Wang
Yuchuan Wu
Junbo Zhao
Haobo Wang
Fei Huang
Yongbin Li
47
9
0
21 Jun 2024
Data-Centric AI in the Age of Large Language Models
Xinyi Xu
Zhaoxuan Wu
Rui Qiao
Arun Verma
Yao Shu
...
Xiaoqiang Lin
Wenyang Hu
Zhongxiang Dai
Pang Wei Koh
Bryan Kian Hsiang Low
ALM
48
2
0
20 Jun 2024
ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World
Weixiang Yan
Haitian Liu
Tengxiao Wu
Qian Chen
Wen Wang
...
Jiayi Wang
Weishan Zhao
Yixin Zhang
Renjun Zhang
Li Zhu
LM&MA
52
10
0
19 Jun 2024
Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks
Jack Gallifant
Shan Chen
Pedro Moreira
Nikolaj Munch
Mingye Gao
Jackson Pond
Leo Anthony Celi
Hugo J. W. L. Aerts
Thomas Hartvigsen
Danielle S. Bitterman
51
9
0
17 Jun 2024
Are Large Language Models True Healthcare Jacks-of-All-Trades? Benchmarking Across Health Professions Beyond Physician Exams
Zheheng Luo
Chenhan Yuan
Qianqian Xie
Sophia Ananiadou
ELM
AI4MH
LM&MA
49
0
0
17 Jun 2024
TCMD: A Traditional Chinese Medicine QA Dataset for Evaluating Large Language Models
Ping Yu
Kaitao Song
Fengchen He
Ming Chen
Jianfeng Lu
LM&MA
28
2
0
07 Jun 2024
Performance of large language models in numerical vs. semantic medical knowledge: Benchmarking on evidence-based Q&As
Eden Avnat
Michal Levy
Daniel Herstain
Elia Yanko
Daniel Ben Joya
...
Joseph Mermelstein
Shahar Ovadia
N. Shomron
V. Shalev
Raja-Elie E. Abdulnour
ELM
LM&MA
AI4MH
39
1
0
06 Jun 2024
A Survey on Medical Large Language Models: Technology, Application, Trustworthiness, and Future Directions
Lei Liu
Xiaoyan Yang
Junchi Lei
Xiaoyang Liu
Yue Shen
...
Peng Wei
Jinjie Gu
Zhixuan Chu
Zhan Qin
Kui Ren
LM&MA
AILaw
46
14
0
06 Jun 2024
TAIA: Large Language Models are Out-of-Distribution Data Learners
Shuyang Jiang
Yusheng Liao
Ya Zhang
Yu Wang
Yanfeng Wang
29
3
0
30 May 2024
Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset
Jie Zhu
Junhui Li
Yalong Wen
Lifan Guo
ELM
ALM
38
5
0
17 May 2024
Cross-Care: Assessing the Healthcare Implications of Pre-training Data on Language Model Bias
Shan Chen
Jack Gallifant
Mingye Gao
Pedro Moreira
Nikolaj Munch
...
Hugo J. W. L. Aerts
Brian Anthony
Leo Anthony Celi
William G. La Cava
Danielle S. Bitterman
43
8
0
09 May 2024
Continual Learning of Large Language Models: A Comprehensive Survey
Haizhou Shi
Zihao Xu
Hengyi Wang
Weiyi Qin
Wenyuan Wang
Yibin Wang
Zifeng Wang
Sayna Ebrahimi
Hao Wang
CLL
KELM
LRM
59
64
0
25 Apr 2024
MING-MOE: Enhancing Medical Multi-Task Learning in Large Language Models with Sparse Mixture of Low-Rank Adapter Experts
Yusheng Liao
Shuyang Jiang
Yu Wang
Yanfeng Wang
MoE
36
5
0
13 Apr 2024
Guiding Clinical Reasoning with Large Language Models via Knowledge Seeds
Jiageng Wu
Xian Wu
Jie Yang
LRM
ELM
41
8
0
11 Mar 2024
Apollo: A Lightweight Multilingual Medical LLM towards Democratizing Medical AI to 6B People
Xidong Wang
Nuo Chen
Junying Chen
Yan Hu
Yidong Wang
Xiangbo Wu
Anningzhe Gao
Xiang Wan
Haizhou Li
Benyou Wang
LM&MA
46
25
0
06 Mar 2024
LLMs with Industrial Lens: Deciphering the Challenges and Prospects -- A Survey
Ashok Urlana
Charaka Vinayak Kumar
Ajeet Kumar Singh
B. Garlapati
S. Chalamala
Rahul Mishra
35
5
0
22 Feb 2024
RareBench: Can LLMs Serve as Rare Diseases Specialists?
Xuanzhong Chen
Xiaohao Mao
Qihan Guo
Lun Wang
Shuyang Zhang
Ting Chen
ELM
LM&MA
AI4MH
58
22
0
09 Feb 2024
Generalist embedding models are better at short-context clinical semantic search than specialized embedding models
Jean-Baptiste Excoffier
Tom Roehr
Alexei Figueroa
Jens-Michalis Papaioannou
Keno Bressem
Matthieu Ortala
45
4
0
03 Jan 2024
MedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models
Yan Cai
Linlin Wang
Ye Wang
Gerard de Melo
Ya Zhang
Yanfeng Wang
Liang He
AI4MH
ELM
LM&MA
53
17
0
20 Dec 2023
HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs
Junying Chen
Xidong Wang
Anningzhe Gao
Feng Jiang
Shunian Chen
...
Chuyi Kong
Jianquan Li
Xiang Wan
Haizhou Li
Benyou Wang
LM&MA
24
60
0
16 Nov 2023
Continuous Training and Fine-tuning for Domain-Specific Language Models in Medical Question Answering
Zhen Guo
Yining Hua
LM&MA
CLL
ALM
AI4MH
33
5
0
01 Nov 2023
Qilin-Med: Multi-stage Knowledge Injection Advanced Medical Large Language Model
Qichen Ye
Junling Liu
Dading Chong
Peilin Zhou
Yining Hua
...
Meng Cao
Ziming Wang
Xuxin Cheng
Andrew Liu
Zhenhua Guo
AI4MH
LM&MA
ELM
35
20
0
13 Oct 2023
Exploring the Cognitive Knowledge Structure of Large Language Models: An Educational Diagnostic Assessment Approach
Zheyuan Zhang
Jifan Yu
Juanzi Li
Lei Hou
AI4Ed
29
3
0
12 Oct 2023
Creating Trustworthy LLMs: Dealing with Hallucinations in Healthcare AI
Muhammad Aurangzeb Ahmad
Ilker Yaramis
Taposh Dutta Roy
LM&MA
HILM
33
34
0
26 Sep 2023
CMB: A Comprehensive Medical Benchmark in Chinese
Xidong Wang
Guiming Hardy Chen
Dingjie Song
Zhiyi Zhang
Zhihong Chen
...
Feng Jiang
Jianquan Li
Xiang Wan
Benyou Wang
Haizhou Li
LM&MA
ELM
AI4MH
33
80
0
17 Aug 2023
ChatCAD+: Towards a Universal and Reliable Interactive CAD using LLMs
Zihao Zhao
Sheng Wang
Jinchen Gu
Yitao Zhu
Lanzhuju Mei
Zixu Zhuang
Zhiming Cui
Qian Wang
Dinggang Shen
LM&MA
37
36
0
25 May 2023