Qwen vs. Gemma Integration with Whisper: A Comparative Study in Multilingual SpeechLLM Systems

16 June 2025
Tuan Nguyen
Long-Vu Hoang
Huy-Dat Tran
Main: 3 pages · 1 figure · 2 tables · Bibliography: 1 page
Abstract

This paper presents our system for the MLC-SLM Challenge 2025, focusing on multilingual speech recognition and language modeling with large language models (LLMs). Our approach combines a fine-tuned Whisper-large-v3 encoder with efficient projector architectures and various decoder configurations. We employ a three-stage training methodology that progressively optimizes the encoder, projector, and LLM components. Our system achieves competitive performance, with an average WER/CER of 16.63% on the private test set when using Gemma3-12B as the decoder-only language model and 18.6% when using Qwen2.5-7B.
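To make the described pipeline concrete, below is a minimal sketch of an encoder-projector-LLM arrangement with a staged freeze/unfreeze schedule. This is not the authors' code: the projector design, dimensions, placeholder modules, and the mapping of stages to components are illustrative assumptions based only on the abstract (in practice the encoder would be Whisper-large-v3 and the LLM would be Qwen2.5-7B or Gemma3-12B).

    # Sketch only: placeholder modules stand in for Whisper / Qwen / Gemma.
    import torch
    import torch.nn as nn

    class Projector(nn.Module):
        """Maps encoder hidden states into the LLM embedding space (assumed MLP design)."""
        def __init__(self, enc_dim: int, llm_dim: int):
            super().__init__()
            self.proj = nn.Sequential(
                nn.Linear(enc_dim, llm_dim),
                nn.GELU(),
                nn.Linear(llm_dim, llm_dim),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.proj(x)

    class SpeechLLM(nn.Module):
        """Speech encoder -> projector -> decoder-only LLM."""
        def __init__(self, encoder: nn.Module, projector: nn.Module, llm: nn.Module):
            super().__init__()
            self.encoder, self.projector, self.llm = encoder, projector, llm

        def forward(self, speech_features: torch.Tensor) -> torch.Tensor:
            enc_out = self.encoder(speech_features)   # (B, T, enc_dim)
            llm_inputs = self.projector(enc_out)      # (B, T, llm_dim)
            return self.llm(llm_inputs)               # (B, T, vocab)

    def set_stage(model: SpeechLLM, stage: int) -> None:
        """Assumed realization of the three-stage schedule:
        stage 1 trains the encoder, stage 2 the projector, stage 3 the LLM."""
        trainable = {1: model.encoder, 2: model.projector, 3: model.llm}[stage]
        for p in model.parameters():
            p.requires_grad = False
        for p in trainable.parameters():
            p.requires_grad = True

    if __name__ == "__main__":
        enc_dim, llm_dim, vocab = 1280, 4096, 32000   # illustrative sizes
        encoder = nn.Linear(80, enc_dim)   # stands in for the speech encoder
        llm = nn.Linear(llm_dim, vocab)    # stands in for the decoder-only LLM
        model = SpeechLLM(encoder, Projector(enc_dim, llm_dim), llm)
        set_stage(model, 2)                            # e.g. train only the projector
        logits = model(torch.randn(2, 50, 80))         # dummy log-mel-like features
        print(logits.shape)                            # torch.Size([2, 50, 32000])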

@article{nguyen2025_2506.13596,
  title={Qwen vs. Gemma Integration with Whisper: A Comparative Study in Multilingual SpeechLLM Systems},
  author={Tuan Nguyen and Long-Vu Hoang and Huy-Dat Tran},
  journal={arXiv preprint arXiv:2506.13596},
  year={2025}
}