
Enhancing Lyrics Transcription on Music Mixtures with Consistency Loss

3 June 2025
Jiawen Huang
Felipe Sousa
Emir Demirel
Emmanouil Benetos
Igor Gadelha
Main: 4 pages
Figures: 1
Bibliography: 1 page
Tables: 3
Abstract

Automatic Lyrics Transcription (ALT) aims to recognize lyrics from singing voices, similar to Automatic Speech Recognition (ASR) for spoken language, but faces added complexity due to domain-specific properties of the singing voice. While foundation ASR models show robustness in various speech tasks, their performance degrades on singing voice, especially in the presence of musical accompaniment. This work focuses on this performance gap and explores Low-Rank Adaptation (LoRA) for ALT, investigating both single-domain and dual-domain fine-tuning strategies. We propose using a consistency loss to better align vocal and mixture encoder representations, improving transcription on mixtures without relying on singing voice separation. Our results show that while naïve dual-domain fine-tuning underperforms, structured training with consistency loss yields modest but consistent gains, demonstrating the potential of adapting ASR foundation models for music.
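
The abstract does not give the exact form of the consistency loss, so the following is only a minimal sketch of the general idea, assuming the loss is a mean squared error between the encoder output for the isolated vocal and the encoder output for the corresponding mixture. The function names `consistency_loss` and `total_loss`, the `weight` parameter, and the way the terms are summed are all hypothetical and are not taken from the paper.

```python
from typing import Sequence

# A (frames x dim) encoder output, represented as nested sequences of floats.
Frames = Sequence[Sequence[float]]


def consistency_loss(vocal_feats: Frames, mixture_feats: Frames) -> float:
    """Mean squared difference between two encoder outputs of equal shape.

    Assumption: MSE is used as the distance; the paper may use another one.
    """
    if len(vocal_feats) != len(mixture_feats):
        raise ValueError("encoder outputs must have the same number of frames")
    total = 0.0
    count = 0
    for v_frame, m_frame in zip(vocal_feats, mixture_feats):
        if len(v_frame) != len(m_frame):
            raise ValueError("frames must have the same feature dimension")
        for v, m in zip(v_frame, m_frame):
            total += (v - m) ** 2
            count += 1
    if count == 0:
        raise ValueError("encoder outputs must not be empty")
    return total / count


def total_loss(
    asr_loss_vocal: float,
    asr_loss_mixture: float,
    vocal_feats: Frames,
    mixture_feats: Frames,
    weight: float = 1.0,
) -> float:
    """Hypothetical dual-domain objective: both transcription losses plus a
    weighted consistency term that pulls the two representations together."""
    return (
        asr_loss_vocal
        + asr_loss_mixture
        + weight * consistency_loss(vocal_feats, mixture_feats)
    )
```

In a real training setup the features would be tensors produced by the LoRA-adapted encoder and the loss would be computed with an automatic differentiation framework; plain Python lists are used here only to keep the sketch self-contained.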

@article{huang2025_2506.02339,
  title={Enhancing Lyrics Transcription on Music Mixtures with Consistency Loss},
  author={Jiawen Huang and Felipe Sousa and Emir Demirel and Emmanouil Benetos and Igor Gadelha},
  journal={arXiv preprint arXiv:2506.02339},
  year={2025}
}