ResearchTrend.AI
A Comprehensive Survey on Evaluating Large Language Model Applications in the Medical Industry

24 April 2024
Yining Huang
Keke Tang
Meilian Chen
Boyuan Wang
    ELM
    LM&MA

Papers citing "A Comprehensive Survey on Evaluating Large Language Model Applications in the Medical Industry"

17 / 17 papers shown
CMQCIC-Bench: A Chinese Benchmark for Evaluating Large Language Models in Medical Quality Control Indicator Calculation
Guangya Yu
Yanhao Li
Zongying Jiang
Yuxiong Jin
Li Dai
...
Weiyan Zhang
Yongqi Fan
Qi Ye
Jingping Liu
Tong Ruan
LM&MA
ELM
77
0
0
17 Feb 2025
TestAgent: A Framework for Domain-Adaptive Evaluation of LLMs via Dynamic Benchmark Construction and Exploratory Interaction
Wanying Wang
Zeyu Ma
Pengfei Liu
Mingang Chen
LLMAG
47
1
0
15 Oct 2024
Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely
Siyun Zhao
Yuqing Yang
Zilong Wang
Zhiyuan He
Luna Qiu
Lili Qiu
SyDa
RALM
3DV
44
34
0
23 Sep 2024
Gemma: Open Models Based on Gemini Research and Technology
Gemma Team
Thomas Mesnard
Cassidy Hardin
Robert Dadashi
Surya Bhupatiraju
...
Armand Joulin
Noah Fiedel
Evan Senter
Alek Andreev
Kathleen Kenealy
VLM
LLMAG
131
431
0
13 Mar 2024
LongHealth: A Question Answering Benchmark with Long Clinical Documents
Lisa Christine Adams
Felix Busch
T. Han
Jean-Baptiste Excoffier
Matthieu Ortala
Alexander Loser
Hugo J. W. L. Aerts
Jakob Nikolas Kather
Daniel Truhn
Keno Bressem
ELM
LM&MA
AI4MH
39
10
0
25 Jan 2024
MedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models
Yan Cai
Linlin Wang
Ye Wang
Gerard de Melo
Ya-Qin Zhang
Yanfeng Wang
Liang He
AI4MH
ELM
LM&MA
45
17
0
20 Dec 2023
Evaluating Superhuman Models with Consistency Checks
Lukas Fluri
Daniel Paleka
Florian Tramèr
ELM
42
42
0
16 Jun 2023
A Study of Generative Large Language Model for Medical Research and Healthcare
Cheng Peng
Xi Yang
Aokun Chen
Kaleb E. Smith
Nima M. Pournejatian
...
W. Hogan
E. Shenkman
Yi Guo
Jiang Bian
Yonghui Wu
LM&MA
ELM
AI4MH
152
242
0
22 May 2023
Automated Paper Screening for Clinical Reviews Using Large Language Models
Eddie Guo
Mehul Gupta
Jiawen Deng
Ye-Jean Park
M. Paget
C. Naugler
LM&MA
ELM
27
74
0
01 May 2023
Beyond Classification: Financial Reasoning in State-of-the-Art Language Models
Guijin Son
Han-Na Jung
Moonjeong Hahm
Keonju Na
Sol Jin
AIFin
LRM
50
18
0
30 Apr 2023
Readability Controllable Biomedical Document Summarization
Zheheng Luo
Qianqian Xie
Sophia Ananiadou
40
36
0
10 Oct 2022
Toxicity Detection with Generative Prompt-based Inference
Yau-Shian Wang
Y. Chang
85
35
0
24 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
313
11,953
0
04 Mar 2022
PRIMERA: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization
Wen Xiao
Iz Beltagy
Giuseppe Carenini
Arman Cohan
CVBM
83
114
0
16 Oct 2021
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models
Nandan Thakur
Nils Reimers
Andreas Rucklé
Abhishek Srivastava
Iryna Gurevych
VLM
231
966
0
17 Apr 2021
PubMedQA: A Dataset for Biomedical Research Question Answering
Qiao Jin
Bhuwan Dhingra
Zhengping Liu
William W. Cohen
Xinghua Lu
210
812
0
13 Sep 2019
Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models
Daniil Polykovskiy
Alexander Zhebrak
Benjamín Sánchez-Lengeling
Sergey Golovanov
Oktai Tatanov
...
Simon Johansson
Hongming Chen
Sergey I. Nikolenko
Alán Aspuru-Guzik
Alex Zhavoronkov
ELM
194
633
0
29 Nov 2018