2405.14766
Cited By
Evaluating Large Language Models for Public Health Classification and Extraction Tasks
20 February 2025
Joshua Harris
Timothy Laurence
Leo Loman
Fan Grayson
Toby Nonnenmacher
Harry Long
Loes WalsGriffith
Amy Douglas
Holly Fountain
Stelios Georgiou
Jo Hardstaff
Kathryn Hopkins
Y-Ling Chi
G. Kuyumdzhieva
Lesley Larkin
Samuel Collins
Hamish Mohammed
Thomas Finnie
Luke Hounsome
Michael Borowitz
Steven Riley
LM&MA
AI4MH
Papers citing "Evaluating Large Language Models for Public Health Classification and Extraction Tasks"
36 / 36 papers shown
Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information
Joshua Harris
Fan Grayson
Felix Feldman
Timothy Laurence
Toby Nonnenmacher
...
Leo Loman
Selina Patel
Thomas Finnie
Samuel Collins
Michael Borowitz
AI4MH
LM&MA
ELM
105
0
0
09 May 2025
An empirical study of LLaMA3 quantization: from LLMs to MLLMs
Wei Huang
Xingyu Zheng
Xudong Ma
Haotong Qin
Chengtao Lv
Hong Chen
Jie Luo
Xiaojuan Qi
Xianglong Liu
Michele Magno
MQ
110
42
0
22 Apr 2024
Evaluating Large Language Models for Health-Related Text Classification Tasks with Public Social Media Data
Yuting Guo
Anthony Ovadje
M. Al-garadi
Abeed Sarker
AI4MH
88
9
0
27 Mar 2024
Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions
Hanjie Chen
Zhouxiang Fang
Yash Singla
Mark Dredze
ELM
AI4MH
117
43
0
28 Feb 2024
FinBen: A Holistic Financial Benchmark for Large Language Models
Qianqian Xie
Weiguang Han
Zhengyu Chen
Ruoyu Xiang
Xiao Zhang
...
Yanzhao Lai
Hao Wang
Min Peng
Sophia Ananiadou
Jimin Huang
AIFin
90
48
0
20 Feb 2024
RareBench: Can LLMs Serve as Rare Diseases Specialists?
Xuanzhong Chen
Xiaohao Mao
Qihan Guo
Lun Wang
Shuyang Zhang
Ting Chen
ELM
LM&MA
AI4MH
88
25
0
09 Feb 2024
Evaluating and Enhancing Large Language Models Performance in Domain-specific Medicine: Osteoarthritis Management with DocOA
Xi Chen
M. You
Li Wang
Weizhi Liu
Yu Fu
Jie Xu
Shaoting Zhang
Gang Chen
Kang Li
Jian Li
ELM
LM&MA
29
4
0
20 Jan 2024
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Xiang Yue
Yuansheng Ni
Kai Zhang
Tianyu Zheng
Ruoqi Liu
...
Yibo Liu
Wenhao Huang
Huan Sun
Yu-Chuan Su
Wenhu Chen
OSLM
ELM
VLM
266
959
0
27 Nov 2023
LawBench: Benchmarking Legal Knowledge of Large Language Models
Zhiwei Fei
Xiaoyu Shen
D. Zhu
Fengzhe Zhou
Zhuo Han
Songyang Zhang
Kai-xiang Chen
Zongwen Shen
Jidong Ge
ELM
AILaw
110
47
0
28 Sep 2023
Efficient Memory Management for Large Language Model Serving with PagedAttention
Woosuk Kwon
Zhuohan Li
Siyuan Zhuang
Ying Sheng
Lianmin Zheng
Cody Hao Yu
Joseph E. Gonzalez
Haotong Zhang
Ion Stoica
VLM
196
2,322
0
12 Sep 2023
Bias and Fairness in Large Language Models: A Survey
Isabel O. Gallegos
Ryan Rossi
Joe Barrow
Md Mehrab Tanjim
Sungchul Kim
Franck Dernoncourt
Tong Yu
Ruiyi Zhang
Nesreen Ahmed
AILaw
114
594
0
02 Sep 2023
Challenges and Applications of Large Language Models
Jean Kaddour
J. Harris
Maximilian Mozes
Herbie Bradley
Roberta Raileanu
R. McHardy
UQCV
ALM
AAML
80
313
0
19 Jul 2023
A Survey on Evaluation of Large Language Models
Yu-Chu Chang
Xu Wang
Jindong Wang
Yuanyi Wu
Linyi Yang
...
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELM
LM&MA
ALM
156
1,723
0
06 Jul 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
441
4,444
0
09 Jun 2023
Interpretable Medical Diagnostics with Structured Data Extraction by Large Language Models
Aleksa Bisercic
Mladen Nikolic
M. Schaar
Boris Delibasic
Pietro Lio
Andrija Petrović
80
17
0
08 Jun 2023
Can LLMs like GPT-4 outperform traditional AI tools in dementia diagnosis? Maybe, but not today
Zhuo Wang
R. Li
Bowen Dong
Jie Wang
Xiuxing Li
...
C. Mao
Wei Zhang
L. Dong
Jing Gao
Jianyong Wang
LM&MA
ELM
AI4MH
72
20
0
02 Jun 2023
What can Large Language Models do in chemistry? A comprehensive benchmark on eight tasks
Taicheng Guo
Kehan Guo
B. Nan
Zhengwen Liang
Zhichun Guo
Nitesh Chawla
Olaf Wiest
Xiangliang Zhang
ELM
139
141
0
27 May 2023
BioDEX: Large-Scale Biomedical Adverse Drug Event Extraction for Real-World Pharmacovigilance
Karel D'Oosterlinck
François Remy
Johannes Deleu
Thomas Demeester
Chris Develder
Klim Zaporojets
Aneiss Ghodsi
Simon Ellershaw
Jack R. Collins
Christopher Potts
82
11
0
22 May 2023
Summarizing, Simplifying, and Synthesizing Medical Evidence Using GPT-3 (with Varying Success)
Chantal Shaib
Millicent Li
Sebastian Antony Joseph
Iain J. Marshall
Junyi Jessy Li
Byron C. Wallace
LM&MA
ELM
68
67
0
10 May 2023
Can Large Language Models Be an Alternative to Human Evaluations?
Cheng-Han Chiang
Hung-yi Lee
ALM
LM&MA
280
631
0
03 May 2023
GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models
Tyna Eloundou
Sam Manning
Pamela Mishkin
Daniel Rock
ELM
66
401
0
17 Mar 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
1.5K
14,761
0
15 Mar 2023
Can GPT-3 Perform Statutory Reasoning?
Andrew Blair-Stanek
Nils Holzenberger
Benjamin Van Durme
ELM
LRM
109
100
0
13 Feb 2023
GPT as Knowledge Worker: A Zero-Shot Evaluation of (AI)CPA Capabilities
Jillian Bommarito
M. Bommarito
Daniel Martin Katz
Jessica Katz
ELM
52
54
0
11 Jan 2023
GPT Takes the Bar Exam
M. Bommarito
Daniel Martin Katz
ELM
77
155
0
29 Dec 2022
Legal Prompting: Teaching a Language Model to Think Like a Lawyer
Fang Yu
Lee Quartey
Frank Schilder
ELM
LRM
42
68
0
02 Dec 2022
Galactica: A Large Language Model for Science
Ross Taylor
Marcin Kardas
Guillem Cucurull
Thomas Scialom
Anthony Hartshorn
Elvis Saravia
Andrew Poulton
Viktor Kerkez
Robert Stojnic
ELM
ReLM
117
778
0
16 Nov 2022
Scaling Instruction-Finetuned Language Models
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLM
LRM
231
3,158
0
20 Oct 2022
News Summarization and Evaluation in the Era of GPT-3
Tanya Goyal
Junyi Jessy Li
Greg Durrett
ELM
110
411
0
26 Sep 2022
Can large language models reason about medical questions?
Valentin Liévin
C. Hother
Andreas Geert Motzfeldt
Ole Winther
ELM
LM&MA
AI4MH
LRM
101
314
0
17 Jul 2022
Large Language Models are Few-Shot Clinical Information Extractors
Monica Agrawal
S. Hegselmann
Hunter Lang
Yoon Kim
David Sontag
BDL
LM&MA
241
347
0
25 May 2022
BBQ: A Hand-Built Bias Benchmark for Question Answering
Alicia Parrish
Angelica Chen
Nikita Nangia
Vishakh Padmakumar
Jason Phang
Jana Thompson
Phu Mon Htut
Sam Bowman
270
425
0
15 Oct 2021
The RareDis corpus: a corpus annotated with rare diseases, their signs and symptoms
Claudia Martínez-de Miguel
Isabel Segura-Bedmar
E. Chacón-Solano
S. Guerrero-Aspizua
25
20
0
02 Aug 2021
Measuring Massive Multitask Language Understanding
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
187
4,572
0
07 Sep 2020
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
490
20,342
0
23 Oct 2019
PubMedQA: A Dataset for Biomedical Research Question Answering
Qiao Jin
Bhuwan Dhingra
Zhengping Liu
William W. Cohen
Xinghua Lu
398
913
0
13 Sep 2019