WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching

20 March 2025
Tianze Luo
Xingchen Miao
Wenbo Duan
Abstract

Flow matching offers a robust and stable approach to training diffusion models. However, directly applying flow matching to neural vocoders can result in subpar audio quality. In this work, we present WaveFM, a reparameterized flow matching model for mel-spectrogram conditioned speech synthesis, designed to enhance both sample quality and generation speed for diffusion vocoders. Since mel-spectrograms represent the energy distribution of waveforms, WaveFM adopts a mel-conditioned prior distribution instead of a standard Gaussian prior to minimize unnecessary transportation costs during synthesis. Moreover, while most diffusion vocoders rely on a single loss function, we argue that incorporating auxiliary losses, including a refined multi-resolution STFT loss, can further improve audio quality. To speed up inference without significantly degrading sample quality, we introduce a tailored consistency distillation method for WaveFM. Experimental results demonstrate that our model achieves superior performance in both quality and efficiency compared to previous diffusion vocoders, while enabling waveform generation in a single inference step.
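
As a rough illustration of the idea of a mel-conditioned prior, the sketch below builds one flow matching training pair in NumPy. It is not the authors' implementation: the linear interpolation path, the way a per-sample standard deviation is derived from frame energy, and the names `mel_conditioned_prior_std`, `flow_matching_pair`, `hop_length`, and `floor` are all assumptions made for this example.

```python
import numpy as np


def mel_conditioned_prior_std(log_mel, hop_length, floor=1e-3):
    """Per-sample prior std from frame energy (hypothetical recipe, not from the paper).

    log_mel: array of shape (n_mels, n_frames) holding log-mel values.
    Returns an array of shape (n_frames * hop_length,).
    """
    energy = np.exp(log_mel).mean(axis=0)   # energy proxy per frame
    std = np.sqrt(energy / energy.max())    # scale into (0, 1]
    std = np.maximum(std, floor)            # keep silent frames from collapsing to 0
    return np.repeat(std, hop_length)       # upsample frames to waveform samples


def flow_matching_pair(x1, std, t, rng):
    """Sample x0 from the mel-shaped prior and build (x_t, velocity target).

    Assumes the straight-line path x_t = (1 - t) * x0 + t * x1,
    whose velocity is x1 - x0.
    """
    x0 = rng.standard_normal(x1.shape) * std
    xt = (1.0 - t) * x0 + t * x1
    target = x1 - x0
    return xt, target


def flow_matching_loss(pred, target):
    """Mean squared error between predicted and target velocity."""
    return float(np.mean((pred - target) ** 2))


rng = np.random.default_rng(0)
log_mel = rng.normal(size=(80, 10))         # toy log-mel, 80 bins x 10 frames
hop_length = 256
x1 = rng.normal(size=10 * hop_length) * 0.1  # toy "real" waveform
std = mel_conditioned_prior_std(log_mel, hop_length)
xt, target = flow_matching_pair(x1, std, t=0.3, rng=rng)
```

In this toy setup, quiet frames get a prior with small variance, so the starting noise is already closer in scale to the target waveform than a standard Gaussian sample would be. A real vocoder would replace the random `x1` with recorded audio and train a network to predict `target` from `xt`, `t`, and the mel-spectrogram.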

@article{luo2025_2503.16689,
  title={WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching},
  author={Tianze Luo and Xingchen Miao and Wenbo Duan},
  journal={arXiv preprint arXiv:2503.16689},
  year={2025}
}