AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement

26 January 2025
Junan Zhang
Jing Yang
Zihao Fang
Yuancheng Wang
Zehua Zhang
Zhuo Wang
Fan Fan
Zhizheng Wu
Abstract

We introduce AnyEnhance, a unified generative model for voice enhancement that processes both speech and singing voices. Built on a masked generative model, AnyEnhance supports a wide range of enhancement tasks, including denoising, dereverberation, declipping, super-resolution, and target speaker extraction, all simultaneously and without fine-tuning. AnyEnhance introduces a prompt-guidance mechanism for in-context learning, which allows the model to natively accept a reference speaker's timbre. This boosts enhancement performance when reference audio is available and enables the target speaker extraction task without altering the underlying architecture. We also introduce a self-critic mechanism into the generative process of masked generative models, yielding higher-quality outputs through iterative self-assessment and refinement. Extensive experiments on various enhancement tasks demonstrate that AnyEnhance outperforms existing methods in terms of both objective metrics and subjective listening tests. Demo audios are publicly available at this https URL.
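To make the self-critic idea concrete, here is a minimal sketch of iterative masked-generative decoding in which a critic re-masks low-confidence positions between passes. This is not the paper's implementation: `generate` and `critic_score` are random placeholders standing in for the model's token sampler and its self-assessment head, and all names and parameters are illustrative assumptions.

```python
import random

def self_critic_decode(num_tokens=16, num_steps=4, threshold=0.5, seed=0):
    """Iteratively fill masked token slots; a critic re-masks weak ones.

    Placeholders: generate() samples a random token id, critic_score()
    returns a random quality score in [0, 1]. In the real model both
    would come from the network.
    """
    rng = random.Random(seed)
    MASK = None
    tokens = [MASK] * num_tokens  # start from a fully masked sequence

    def generate(position):
        return rng.randrange(1024)  # stand-in for the masked-token sampler

    def critic_score(token):
        return rng.random()  # stand-in for the self-assessment score

    for step in range(num_steps):
        # 1) fill every currently masked position with a sampled token
        tokens = [generate(i) if t is MASK else t for i, t in enumerate(tokens)]
        # 2) self-critic: re-mask low-scoring tokens so the next pass
        #    can resample them (skipped on the final pass)
        if step < num_steps - 1:
            tokens = [t if critic_score(t) >= threshold else MASK
                      for t in tokens]
    return tokens

out = self_critic_decode()
assert all(t is not None for t in out)  # final pass leaves nothing masked
```

The key design point the paper describes is step 2: instead of trusting every sampled token, the model scores its own output and resamples only the positions it judges poor, refining quality over a few passes.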

@article{zhang2025_2501.15417,
  title={AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement},
  author={Junan Zhang and Jing Yang and Zihao Fang and Yuancheng Wang and Zehua Zhang and Zhuo Wang and Fan Fan and Zhizheng Wu},
  journal={arXiv preprint arXiv:2501.15417},
  year={2025}
}