ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2505.11887
  4. Cited By
AutoMedEval: Harnessing Language Models for Automatic Medical Capability Evaluation

AutoMedEval: Harnessing Language Models for Automatic Medical Capability Evaluation

17 May 2025
Xiechi Zhang
Zetian Ouyang
Linlin Wang
Gerard de Melo
Zhu Cao
Xiaoling Wang
Ya Zhang
Yanfeng Wang
Liang He
    LM&MAELM
ArXiv (abs)PDFHTML

Papers citing "AutoMedEval: Harnessing Language Models for Automatic Medical Capability Evaluation"

18 / 18 papers shown
Title
MoCa: Measuring Human-Language Model Alignment on Causal and Moral
  Judgment Tasks
MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks
Allen Nie
Yuhui Zhang
Atharva Amdekar
Chris Piech
Tatsunori Hashimoto
Tobias Gerstenberg
66
40
0
30 Oct 2023
FlashAttention-2: Faster Attention with Better Parallelism and Work
  Partitioning
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Tri Dao
LRM
115
1,326
0
17 Jul 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALMOSLMELM
408
4,422
0
09 Jun 2023
PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning
  Optimization
PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization
Yidong Wang
Zhuohao Yu
Zhengran Zeng
Linyi Yang
Cunxiang Wang
...
Jindong Wang
Xingxu Xie
Wei Ye
Shi-Bo Zhang
Yue Zhang
ALMELM
107
249
0
08 Jun 2023
Introspective Tips: Large Language Model for In-Context Decision Making
Introspective Tips: Large Language Model for In-Context Decision Making
Liting Chen
Lu Wang
Hang Dong
Yali Du
Jie Yan
...
Pu Zhao
Si Qin
Saravan Rajmohan
Qingwei Lin
Dongmei Zhang
LLMAGLRM
86
28
0
19 May 2023
MedGPTEval: A Dataset and Benchmark to Evaluate Responses of Large
  Language Models in Medicine
MedGPTEval: A Dataset and Benchmark to Evaluate Responses of Large Language Models in Medicine
Jie Xu
Lu Lu
Sen Yang
Bilin Liang
Xinwei Peng
...
Lingrui Yang
Huan-Zhi Song
Kang Li
Xin Sun
Shaoting Zhang
LM&MAAI4MH
36
8
0
12 May 2023
DoctorGLM: Fine-tuning your Chinese Doctor is not a Herculean Task
DoctorGLM: Fine-tuning your Chinese Doctor is not a Herculean Task
Honglin Xiong
Sheng Wang
Yitao Zhu
Zihao Zhao
Yuxiao Liu
Linlin Huang
Qian Wang
Dinggang Shen
LM&MAAI4MH
81
173
0
03 Apr 2023
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
Yang Liu
Dan Iter
Yichong Xu
Shuohang Wang
Ruochen Xu
Chenguang Zhu
ELMALMLM&MA
181
1,208
0
29 Mar 2023
ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model
  Meta-AI (LLaMA) Using Medical Domain Knowledge
ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge
Yunxiang Li
Zihan Li
Kai Zhang
Ruilong Dan
Steven Jiang
You Zhang
LM&MAAI4MH
188
418
0
24 Mar 2023
Capabilities of GPT-4 on Medical Challenge Problems
Capabilities of GPT-4 on Medical Challenge Problems
Harsha Nori
Nicholas King
S. McKinney
Dean Carignan
Eric Horvitz
LM&MAELMAI4MH
134
809
0
20 Mar 2023
Is ChatGPT a Good NLG Evaluator? A Preliminary Study
Is ChatGPT a Good NLG Evaluator? A Preliminary Study
Jiaan Wang
Yunlong Liang
Fandong Meng
Zengkui Sun
Haoxiang Shi
Zhixu Li
Jinan Xu
Jianfeng Qu
Jie Zhou
LM&MAELMALMAI4MH
131
470
0
07 Mar 2023
Large Language Models Encode Clinical Knowledge
Large Language Models Encode Clinical Knowledge
K. Singhal
Shekoofeh Azizi
T. Tu
S. S. Mahdavi
Jason W. Wei
...
A. Rajkomar
Joelle Barral
Christopher Semturs
Alan Karthikesalingam
Vivek Natarajan
LM&MAELMAI4MH
152
2,361
0
26 Dec 2022
FlashAttention: Fast and Memory-Efficient Exact Attention with
  IO-Awareness
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
VLM
252
2,272
0
27 May 2022
BARTScore: Evaluating Generated Text as Text Generation
BARTScore: Evaluating Generated Text as Text Generation
Weizhe Yuan
Graham Neubig
Pengfei Liu
115
849
0
22 Jun 2021
SimCSE: Simple Contrastive Learning of Sentence Embeddings
SimCSE: Simple Contrastive Learning of Sentence Embeddings
Tianyu Gao
Xingcheng Yao
Danqi Chen
AILawSSL
276
3,411
0
18 Apr 2021
What Disease does this Patient Have? A Large-scale Open Domain Question
  Answering Dataset from Medical Exams
What Disease does this Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams
Di Jin
Eileen Pan
Nassim Oufattole
W. Weng
Hanyi Fang
Peter Szolovits
FaMLELMLM&MA
113
807
0
28 Sep 2020
BERTScore: Evaluating Text Generation with BERT
BERTScore: Evaluating Text Generation with BERT
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
346
5,860
0
21 Apr 2019
Toward a Formal Model of Cognitive Synergy
Toward a Formal Model of Cognitive Synergy
B. Goertzel
34
13
0
13 Mar 2017
1