Vuyko Mistral: Adapting LLMs for Low-Resource Dialectal Translation

9 June 2025
Roman Kyslyi, Yuliia Maksymiuk, Ihor Pysmennyi
Main: 8 pages, 3 figures, 2 tables; bibliography: 2 pages
Abstract

In this paper we introduce the first effort to adapt large language models (LLMs) to a Ukrainian dialect, in our case Hutsul, a low-resource and morphologically complex dialect spoken in the Carpathian Highlands. We created a parallel corpus of 9,852 dialect-to-standard Ukrainian sentence pairs and a dictionary of 7,320 dialectal word mappings. We also addressed the data shortage by proposing an advanced Retrieval-Augmented Generation (RAG) pipeline to generate synthetic parallel translation pairs, expanding the corpus with 52,142 examples. We fine-tuned multiple open-source LLMs using LoRA and evaluated them on a standard-to-dialect translation task, also comparing with few-shot GPT-4o translation. In the absence of human annotators, we adopted a multi-metric evaluation strategy combining BLEU, chrF++, TER, and LLM-based judgment (GPT-4o). The results show that even small (7B) fine-tuned models outperform zero-shot baselines such as GPT-4o across both automatic and LLM-evaluated metrics. All data, models, and code are publicly released at: this https URL
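The abstract describes the RAG pipeline for synthetic data only at a high level. The sketch below illustrates the general idea under stated assumptions: a naive dictionary-lookup retrieval step and GPT-4o as the generator. The dictionary entries, prompt wording, and helper names are illustrative, not the authors' implementation.

# Sketch of RAG-style synthetic pair generation: retrieve dialect dictionary
# entries relevant to a standard-Ukrainian sentence, then prompt an LLM to
# produce a Hutsul rendering. All names and entries are placeholders.
from openai import OpenAI

client = OpenAI()

# Hypothetical excerpt of the standard -> dialect word dictionary.
dialect_dict = {
    "standard_word_1": "dialect_word_1",
    "standard_word_2": "dialect_word_2",
}

def retrieve(sentence: str, dictionary: dict) -> dict:
    # Naive retrieval: keep entries whose standard form occurs in the sentence.
    tokens = sentence.lower().split()
    return {k: v for k, v in dictionary.items() if k in tokens}

def synthesize_pair(standard_sentence: str) -> tuple[str, str]:
    hints = retrieve(standard_sentence, dialect_dict)
    prompt = (
        "Rewrite the following standard Ukrainian sentence in the Hutsul dialect.\n"
        f"Relevant dictionary mappings (standard -> dialect): {hints}\n"
        f"Sentence: {standard_sentence}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return standard_sentence, resp.choices[0].message.content

Grounding the generator in retrieved dictionary entries is what distinguishes this from plain prompting: the model is nudged toward attested dialect forms rather than inventing them.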
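For the fine-tuning step, a minimal LoRA sketch with Hugging Face transformers and peft follows. The base model (the title suggests a Mistral base), prompt template, and hyperparameters are assumptions, not the paper's exact configuration, which is in the released code.

# Minimal LoRA fine-tuning sketch (transformers + peft); all settings assumed.
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "mistralai/Mistral-7B-v0.1"  # assumed base model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Attach low-rank adapters; only a small fraction of weights is trained.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"))

# Hypothetical parallel pairs formatted as single training strings.
pairs = [{"text": "Translate to Hutsul: <standard sentence>\n"
                  "Translation: <dialect sentence>"}]
ds = Dataset.from_list(pairs).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=["text"])

Trainer(
    model=model,
    args=TrainingArguments(output_dir="vuyko-lora",
                           per_device_train_batch_size=2,
                           num_train_epochs=3, learning_rate=2e-4),
    train_dataset=ds,
    # mlm=False makes the collator copy input_ids to labels (causal LM).
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()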
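The automatic side of the multi-metric evaluation can be reproduced with sacrebleu, which implements BLEU, chrF++ (chrF with word_order=2), and TER; the hypothesis and reference strings below are placeholders, and the GPT-4o judgment step would sit alongside these scores.

# Scoring a system with BLEU, chrF++, and TER via sacrebleu.
from sacrebleu.metrics import BLEU, CHRF, TER

hyps = ["model output for segment 1", "model output for segment 2"]
refs = [["reference for segment 1", "reference for segment 2"]]  # one reference stream

print(BLEU().corpus_score(hyps, refs))
print(CHRF(word_order=2).corpus_score(hyps, refs))  # word_order=2 gives chrF++
print(TER().corpus_score(hyps, refs))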

@article{kyslyi2025_2506.07617,
  title={Vuyko Mistral: Adapting LLMs for Low-Resource Dialectal Translation},
  author={Roman Kyslyi and Yuliia Maksymiuk and Ihor Pysmennyi},
  journal={arXiv preprint arXiv:2506.07617},
  year={2025}
}