2404.15777
Cited By
A Comprehensive Survey on Evaluating Large Language Model Applications in the Medical Industry
24 April 2024
Yining Huang
Keke Tang
Meilian Chen
Boyuan Wang
ELM
LM&MA
Papers citing
"A Comprehensive Survey on Evaluating Large Language Model Applications in the Medical Industry"
17 / 17 papers shown
Title
CMQCIC-Bench: A Chinese Benchmark for Evaluating Large Language Models in Medical Quality Control Indicator Calculation
Guangya Yu
Yanhao Li
Zongying Jiang
Yuxiong Jin
Li Dai
...
Weiyan Zhang
Yongqi Fan
Qi Ye
Jingping Liu
Tong Ruan
LM&MA
ELM
77
0
0
17 Feb 2025
TestAgent: A Framework for Domain-Adaptive Evaluation of LLMs via Dynamic Benchmark Construction and Exploratory Interaction
Wanying Wang
Zeyu Ma
Pengfei Liu
Mingang Chen
LLMAG
47
1
0
15 Oct 2024
Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely
Siyun Zhao
Yuqing Yang
Zilong Wang
Zhiyuan He
Luna Qiu
Lili Qiu
SyDa
RALM
3DV
44
35
0
23 Sep 2024
Gemma: Open Models Based on Gemini Research and Technology
Gemma Team
Thomas Mesnard
Cassidy Hardin
Robert Dadashi
Surya Bhupatiraju
...
Armand Joulin
Noah Fiedel
Evan Senter
Alek Andreev
Kathleen Kenealy
VLM
LLMAG
131
431
0
13 Mar 2024
LongHealth: A Question Answering Benchmark with Long Clinical Documents
Lisa Christine Adams
Felix Busch
T. Han
Jean-Baptiste Excoffier
Matthieu Ortala
Alexander Loser
Hugo J. W. L. Aerts
Jakob Nikolas Kather
Daniel Truhn
Keno Bressem
ELM
LM&MA
AI4MH
39
10
0
25 Jan 2024
MedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models
Yan Cai
Linlin Wang
Ye Wang
Gerard de Melo
Ya-Qin Zhang
Yanfeng Wang
Liang He
AI4MH
ELM
LM&MA
45
17
0
20 Dec 2023
Evaluating Superhuman Models with Consistency Checks
Lukas Fluri
Daniel Paleka
Florian Tramèr
ELM
42
42
0
16 Jun 2023
A Study of Generative Large Language Model for Medical Research and Healthcare
Cheng Peng
Xi Yang
Aokun Chen
Kaleb E. Smith
Nima M. Pournejatian
...
W. Hogan
E. Shenkman
Yi Guo
Jiang Bian
Yonghui Wu
LM&MA
ELM
AI4MH
152
244
0
22 May 2023
Automated Paper Screening for Clinical Reviews Using Large Language Models
Eddie Guo
Mehul Gupta
Jiawen Deng
Ye-Jean Park
M. Paget
C. Naugler
LM&MA
ELM
29
74
0
01 May 2023
Beyond Classification: Financial Reasoning in State-of-the-Art Language Models
Guijin Son
Han-Na Jung
Moonjeong Hahm
Keonju Na
Sol Jin
AIFin
LRM
50
18
0
30 Apr 2023
Readability Controllable Biomedical Document Summarization
Zheheng Luo
Qianqian Xie
Sophia Ananiadou
42
36
0
10 Oct 2022
Toxicity Detection with Generative Prompt-based Inference
Yau-Shian Wang
Y. Chang
87
35
0
24 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
313
11,953
0
04 Mar 2022
PRIMERA: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization
Wen Xiao
Iz Beltagy
Giuseppe Carenini
Arman Cohan
CVBM
83
114
0
16 Oct 2021
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models
Nandan Thakur
Nils Reimers
Andreas Rucklé
Abhishek Srivastava
Iryna Gurevych
VLM
231
971
0
17 Apr 2021
PubMedQA: A Dataset for Biomedical Research Question Answering
Qiao Jin
Bhuwan Dhingra
Zhengping Liu
William W. Cohen
Xinghua Lu
213
812
0
13 Sep 2019
Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models
Daniil Polykovskiy
Alexander Zhebrak
Benjamín Sánchez-Lengeling
Sergey Golovanov
Oktai Tatanov
...
Simon Johansson
Hongming Chen
Sergey I. Nikolenko
Alán Aspuru-Guzik
Alex Zhavoronkov
ELM
194
633
0
29 Nov 2018