Leveraging Online Data to Enhance Medical Knowledge in a Small Persian Language Model

21 May 2025
Mehrdad Ghassabi
Pedram Rostami
Hamidreza Baradaran Kashani
Amirhossein Poursina
Zahra Kazemi
Milad Tavakoli
Abstract

The rapid advancement of language models has demonstrated the potential of artificial intelligence in the healthcare industry. However, small language models struggle with specialized domains in low-resource languages like Persian. While numerous medical-domain websites exist in Persian, no curated dataset or corpus has been available, making ours the first of its kind. This study explores the enhancement of medical knowledge in a small language model by leveraging accessible online data, including a crawled corpus from medical magazines and a dataset of real doctor-patient QA pairs. We fine-tuned a baseline model on our curated data to improve its medical knowledge. Benchmark evaluations demonstrate that the fine-tuned model achieves higher accuracy in medical question answering and provides better responses than its baseline. This work highlights the potential of leveraging open-access online data to enrich small language models in medical fields, providing a novel solution for Persian medical AI applications suitable for resource-constrained environments.

@article{ghassabi2025_2505.16000,
  title={Leveraging Online Data to Enhance Medical Knowledge in a Small Persian Language Model},
  author={Mehrdad Ghassabi and Pedram Rostami and Hamidreza Baradaran Kashani and Amirhossein Poursina and Zahra Kazemi and Milad Tavakoli},
  journal={arXiv preprint arXiv:2505.16000},
  year={2025}
}