
arXiv:2407.00541

Answering real-world clinical questions using large language model based systems

29 June 2024
Y. Low
Michael L. Jackson
Rebecca J. Hyde
Robert E. Brown
Neil M. Sanghavi
Julian D. Baldwin
C. W. Pike
Jananee Muralidharan
Gavin Hui
Natasha Alexander
Hadeel Hassan
R. Nene
Morgan Pike
Courtney J. Pokrzywa
Shivam Vedak
Adam Paul Yan
Dong-han Yao
A. Zipursky
Christina Dinh
Philip Ballentine
D. Derieg
Vladimir Polony
Rehan N. Chawdry
Jordan Davies
Brigham B. Hyde
N. Shah
S. Gombar
Communities: LM&MA, ELM, AI4MH
Abstract

Evidence to guide healthcare decisions is often limited by a lack of relevant and trustworthy literature as well as difficulty in contextualizing existing research for a specific patient. Large language models (LLMs) could potentially address both challenges by either summarizing published literature or generating new studies based on real-world data (RWD). We evaluated the ability of five LLM-based systems to answer 50 clinical questions and had nine independent physicians review the responses for relevance, reliability, and actionability. As it stands, general-purpose LLMs (ChatGPT-4, Claude 3 Opus, Gemini Pro 1.5) rarely produced answers that were deemed relevant and evidence-based (2%-10%). In contrast, retrieval-augmented generation (RAG)-based and agentic LLM systems produced relevant and evidence-based answers for 24% (OpenEvidence) to 58% (ChatRWD) of questions. Only the agentic ChatRWD was able to answer novel questions (65% vs. 0-9% for the other LLMs). These results suggest that while general-purpose LLMs should not be used as-is, a purpose-built, RAG-based system for evidence summarization working synergistically with one for generating novel evidence would improve the availability of pertinent evidence for patient care.
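The abstract contrasts general-purpose LLMs with retrieval-augmented generation (RAG) systems that ground answers in retrieved literature. As a rough illustration of that pattern only, the sketch below shows a minimal, self-contained RAG loop; the corpus, the lexical scoring function, and the answer_with_llm() placeholder are hypothetical and do not represent OpenEvidence, ChatRWD, or any system evaluated in the paper.

```python
# Illustrative sketch of a retrieval-augmented generation (RAG) loop for
# evidence summarization. Corpus, scoring, and answer_with_llm() are
# hypothetical placeholders, not the systems evaluated in the paper.
from collections import Counter
import math

CORPUS = {
    "doc1": "Randomized trial: drug A reduced 30-day readmission in heart failure.",
    "doc2": "Cohort study: drug B showed no mortality benefit in sepsis.",
    "doc3": "Meta-analysis: early mobilization shortened ICU length of stay.",
}

def tokenize(text: str) -> list[str]:
    return [t.strip(".,:;").lower() for t in text.split()]

def score(query: str, doc: str) -> float:
    """Simple bag-of-words overlap; real systems use dense or hybrid retrieval."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    overlap = sum((q & d).values())
    return overlap / math.sqrt(len(d) + 1)

def retrieve(query: str, k: int = 2) -> list[str]:
    # Rank documents by relevance to the question and keep the top k.
    ranked = sorted(CORPUS, key=lambda doc_id: score(query, CORPUS[doc_id]), reverse=True)
    return ranked[:k]

def answer_with_llm(query: str, evidence: list[str]) -> str:
    # Placeholder for an LLM call: the prompt is grounded in retrieved text
    # so the answer can cite evidence rather than rely on parametric memory alone.
    context = "\n".join(f"- {CORPUS[doc_id]}" for doc_id in evidence)
    return f"Question: {query}\nEvidence considered:\n{context}\n(Answer would be generated here.)"

if __name__ == "__main__":
    question = "Does early mobilization reduce ICU length of stay?"
    print(answer_with_llm(question, retrieve(question)))
```

The lexical overlap score stands in for the dense or hybrid retrievers production systems typically use; the essential point is that the generation step is conditioned on retrieved evidence rather than on the model's parametric memory alone.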
