2505.08775
Cited By
HealthBench: Evaluating Large Language Models Towards Improved Human Health
13 May 2025
Rahul Arora
Jason W. Wei
Rebecca Soskin Hicks
Preston Bowman
Joaquin Quiñonero Candela
Foivos Tsimpourlas
Michael Sharman
Meghan Shah
Andrea Vallone
Alex Beutel
Johannes Heidecke
K. Singhal
LM&MA
AI4MH
ELM
Papers citing
"HealthBench: Evaluating Large Language Models Towards Improved Human Health"
10 / 10 papers shown
Title
Self-GIVE: Associative Thinking from Limited Structured Knowledge for Enhanced Large Language Model Reasoning
Jiashu He
Jinxuan Fan
Bowen Jiang
Ignacio Hounie
Dan Roth
Alejandro Ribeiro
ReLM
RALM
LRM
98
2
0
21 May 2025
PaperBench: Evaluating AI's Ability to Replicate AI Research
Giulio Starace
Oliver Jaffe
Dane Sherburn
James Aung
Jun Shern Chan
...
Benjamin Kinsella
Wyatt Thompson
Johannes Heidecke
Amelia Glaese
Tejal Patwardhan
ALM
ELM
930
23
0
02 Apr 2025
WildChat: 1M ChatGPT Interaction Logs in the Wild
Wenting Zhao
Xiang Ren
Jack Hessel
Claire Cardie
Yejin Choi
Yuntian Deng
88
230
0
02 May 2024
MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records
Scott L. Fleming
Alejandro Lozano
W. Haberkorn
Jenelle A. Jindal
E. Reis
...
Jonathan Chen
Keith Morse
Emma Brunskill
Jason Alan Fries
N. Shah
LM&MA
67
60
0
27 Aug 2023
Towards Generalist Biomedical AI
Tao Tu
Shekoofeh Azizi
Danny Driess
M. Schaekermann
Mohamed Amin
...
Yossi Matias
K. Singhal
Peter R. Florence
Alan Karthikesalingam
Vivek Natarajan
LM&MA
MedIm
AI4MH
103
275
0
26 Jul 2023
Evaluation of GPT-3.5 and GPT-4 for supporting real-world information needs in healthcare delivery
Debadutta Dash
Rahul Thapa
Juan M. Banda
Akshay Swaminathan
Morgan Cheatham
...
Garret K. Morris
H. Magon
M. Lungren
Eric Horvitz
N. Shah
ELM
LM&MA
AI4MH
121
52
0
26 Apr 2023
Capabilities of GPT-4 on Medical Challenge Problems
Harsha Nori
Nicholas King
S. McKinney
Dean Carignan
Eric Horvitz
LM&MA
ELM
AI4MH
149
811
0
20 Mar 2023
Large Language Models Encode Clinical Knowledge
K. Singhal
Shekoofeh Azizi
T. Tu
S. S. Mahdavi
Jason W. Wei
...
A. Rajkomar
Joelle Barral
Christopher Semturs
Alan Karthikesalingam
Vivek Natarajan
LM&MA
ELM
AI4MH
167
2,381
0
26 Dec 2022
RadGraph: Extracting Clinical Entities and Relations from Radiology Reports
Saahil Jain
Ashwin Agrawal
A. Saporta
Steven QH Truong
D. Duong
...
Yuhao Zhang
M. Lungren
A. Ng
C. Langlotz
Pranav Rajpurkar
MedIm
96
213
0
28 Jun 2021
PubMedQA: A Dataset for Biomedical Research Question Answering
Qiao Jin
Bhuwan Dhingra
Zhengping Liu
William W. Cohen
Xinghua Lu
401
914
0
13 Sep 2019