Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2303.13375
Cited By
v1
v2 (latest)
Capabilities of GPT-4 on Medical Challenge Problems
20 March 2023
Harsha Nori
Nicholas King
S. McKinney
Dean Carignan
Eric Horvitz
LM&MA
ELM
AI4MH
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Capabilities of GPT-4 on Medical Challenge Problems"
47 / 47 papers shown
Title
AutoMedEval: Harnessing Language Models for Automatic Medical Capability Evaluation
Xiechi Zhang
Zetian Ouyang
Linlin Wang
Gerard de Melo
Zhu Cao
Xiaoling Wang
Ya Zhang
Yanfeng Wang
Liang He
LM&MA
ELM
107
0
0
17 May 2025
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning
Xiangru Tang
Daniel Shao
Jiwoong Sohn
Jiapeng Chen
Jiayi Zhang
...
Yilun Zhao
Chenglin Wu
Wenqi Shi
Arman Cohan
Mark B. Gerstein
AI4MH
LRM
ELM
LM&MA
123
10
0
10 Mar 2025
Can Frontier LLMs Replace Annotators in Biomedical Text Mining? Analyzing Challenges and Exploring Solutions
Yichong Zhao
Susumu Goto
94
0
0
05 Mar 2025
Repurposing the scientific literature with vision-language models
Anton Alyakin
Jaden Stryker
Daniel Alber
Karl L. Sangwon
Brandon Duderstadt
...
Laura Snyder
Eric Leuthardt
Douglas Kondziolka
E. Oermann
Eric Karl Oermann
167
0
0
26 Feb 2025
Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation with Large Language Models
Ran Xu
Hejie Cui
Yue Yu
Xuan Kan
Wenqi Shi
Yuchen Zhuang
Wei Jin
Joyce C. Ho
Carl Yang
158
17
0
28 Jan 2025
A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics
Kai He
Rui Mao
Qika Lin
Yucheng Ruan
Xiang Lan
Mengling Feng
Min Zhang
LM&MA
AILaw
226
175
0
28 Jan 2025
ASTRID -- An Automated and Scalable TRIaD for the Evaluation of RAG-based Clinical Question Answering Systems
Mohita Chowdhury
Yajie Vera He
Aisling Higham
Ernest Lim
155
1
0
14 Jan 2025
A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine
Hanguang Xiao
Feizhong Zhou
Xianglong Liu
Tianqi Liu
Zhipeng Li
Xin Liu
Xiaoxuan Huang
AILaw
LM&MA
LRM
123
29
0
31 Dec 2024
Beyond the Safety Bundle: Auditing the Helpful and Harmless Dataset
Khaoula Chehbouni
Jonathan Colaço-Carr
Yash More
Jackie CK Cheung
G. Farnadi
166
1
0
12 Nov 2024
Exploring the Role of Reasoning Structures for Constructing Proofs in Multi-Step Natural Language Reasoning with Large Language Models
Zióu Zheng
Christopher Malon
Martin Renqiang Min
Xiaodan Zhu
LRM
334
0
0
11 Oct 2024
MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders
Cheng-rong Li
May Fung
Qingyun Wang
Chi Han
Manling Li
Jindong Wang
Heng Ji
AI4MH
419
0
0
09 Oct 2024
CriSPO: Multi-Aspect Critique-Suggestion-guided Automatic Prompt Optimization for Text Generation
Han He
Qianchu Liu
Lei Xu
Chaitanya P. Shivade
Yi Zhang
S. Srinivasan
Katrin Kirchhoff
93
1
0
03 Oct 2024
HealthQ: Unveiling Questioning Capabilities of LLM Chains in Healthcare Conversations
Ziyu Wang
Hao Li
Di Huang
Amir M. Rahmani
Chae-Won Shin
Amir M. Rahmani
LM&MA
105
10
0
28 Sep 2024
Compositional Hardness of Code in Large Language Models -- A Probabilistic Perspective
Yotam Wolf
Binyamin Rothberg
Dorin Shteyman
Amnon Shashua
86
0
0
26 Sep 2024
Evaluating Transparent Reasoning in Large Language Models for Accountable Critical Tasks
Bowen Wang
Jiuyang Chang
Yiming Qian
Guoxin Chen
LRM
LM&MA
ELM
103
6
0
04 Aug 2024
OLMES: A Standard for Language Model Evaluations
Yuling Gu
Oyvind Tafjord
Bailey Kuehl
Dany Haddad
Jesse Dodge
Hannaneh Hajishirzi
ELM
105
20
0
12 Jun 2024
ALCM: Autonomous LLM-Augmented Causal Discovery Framework
Elahe Khatibi
Mahyar Abbasian
Zhongqi Yang
Iman Azimi
Amir M. Rahmani
114
15
0
02 May 2024
Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions
Hanjie Chen
Zhouxiang Fang
Yash Singla
Mark Dredze
ELM
AI4MH
117
43
0
28 Feb 2024
Tradeoffs Between Alignment and Helpfulness in Language Models with Steering Methods
Yotam Wolf
Noam Wies
Dorin Shteyman
Binyamin Rothberg
Yoav Levine
Amnon Shashua
LLMSV
117
13
0
29 Jan 2024
AI-Generated Content (AIGC) for Various Data Modalities: A Survey
Lin Geng Foo
Hossein Rahmani
Jing Liu
226
31
0
27 Aug 2023
Making AI Less "Thirsty": Uncovering and Addressing the Secret Water Footprint of AI Models
Pengfei Li
Jianyi Yang
M. A. Islam
Shaolei Ren
188
138
0
06 Apr 2023
Large Language Models Encode Clinical Knowledge
K. Singhal
Shekoofeh Azizi
T. Tu
S. S. Mahdavi
Jason W. Wei
...
A. Rajkomar
Joelle Barral
Christopher Semturs
Alan Karthikesalingam
Vivek Natarajan
LM&MA
ELM
AI4MH
158
2,381
0
26 Dec 2022
Reading Between the Lines: Modeling User Behavior and Costs in AI-Assisted Programming
Hussein Mozannar
Gagan Bansal
Adam Fourney
Eric Horvitz
130
117
0
25 Oct 2022
Can large language models reason about medical questions?
Valentin Liévin
C. Hother
Andreas Geert Motzfeldt
Ole Winther
ELM
LM&MA
AI4MH
LRM
101
314
0
17 Jul 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
532
4,077
0
24 May 2022
Who Goes First? Influences of Human-AI Workflow on Decision Making in Clinical Imaging
Riccardo Fogliato
Shreya Chappidi
M. Lungren
Michael Fitzke
Mark Parkinson
Diane U Wilson
Paul Fisher
Eric Horvitz
K. Inkpen
Besmira Nushi
75
71
0
19 May 2022
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Yuntao Bai
Andy Jones
Kamal Ndousse
Amanda Askell
Anna Chen
...
Jack Clark
Sam McCandlish
C. Olah
Benjamin Mann
Jared Kaplan
256
2,623
0
12 Apr 2022
MedMCQA : A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering
Ankit Pal
Logesh Kumar Umapathi
Malaikannan Sankarasubbu
ELM
LM&MA
89
348
0
27 Mar 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
552
3,737
0
21 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
888
13,207
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
850
9,714
0
28 Jan 2022
WebGPT: Browser-assisted question-answering with human feedback
Reiichiro Nakano
Jacob Hilton
S. Balaji
Jeff Wu
Ouyang Long
...
Gretchen Krueger
Kevin Button
Matthew Knight
B. Chess
John Schulman
ALM
RALM
196
1,294
0
17 Dec 2021
A Universal Law of Robustness via Isoperimetry
Sébastien Bubeck
Mark Sellke
55
218
0
26 May 2021
What Disease does this Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams
Di Jin
Eileen Pan
Nassim Oufattole
W. Weng
Hanyi Fang
Peter Szolovits
FaML
ELM
LM&MA
119
808
0
28 Sep 2020
Measuring Massive Multitask Language Understanding
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
187
4,572
0
07 Sep 2020
An Empirical Analysis of Backward Compatibility in Machine Learning Systems
Megha Srivastava
Besmira Nushi
Ece Kamar
S. Shah
Eric Horvitz
AAML
88
47
0
11 Aug 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
889
42,463
0
28 May 2020
Learning to Complement Humans
Bryan Wilder
Eric Horvitz
Ece Kamar
171
168
0
01 May 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
639
4,921
0
23 Jan 2020
Measurement and Fairness
Abigail Z. Jacobs
Hanna M. Wallach
82
400
0
11 Dec 2019
InterpretML: A Unified Framework for Machine Learning Interpretability
Harsha Nori
Samuel Jenkins
Paul Koch
R. Caruana
AI4CE
164
489
0
19 Sep 2019
PubMedQA: A Dataset for Biomedical Research Question Answering
Qiao Jin
Bhuwan Dhingra
Zhengping Liu
William W. Cohen
Xinghua Lu
398
913
0
13 Sep 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.8K
95,229
0
11 Oct 2018
A Reductions Approach to Fair Classification
Alekh Agarwal
A. Beygelzimer
Miroslav Dudík
John Langford
Hanna M. Wallach
FaML
230
1,105
0
06 Mar 2018
CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning
Pranav Rajpurkar
Jeremy Irvin
Kaylie Zhu
Brandon Yang
Hershel Mehta
...
Aarti Bagul
C. Langlotz
K. Shpanskaya
M. Lungren
A. Ng
LM&MA
99
2,712
0
14 Nov 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
803
132,454
0
12 Jun 2017
Deep reinforcement learning from human preferences
Paul Christiano
Jan Leike
Tom B. Brown
Miljan Martic
Shane Legg
Dario Amodei
218
3,377
0
12 Jun 2017
1