HiFi++: a Unified Framework for Neural Vocoding, Bandwidth Extension and Speech Enhancement

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

24 March 2022

Pavel Andreev

Aibek Alanov

Oleg Ivanov

Dmitry Vetrov

Main: 4 pages, 9 figures, 6 tables; Bibliography: 2 pages; Appendix: 7 pages

Abstract

Generative adversarial networks have recently demonstrated outstanding performance in neural vocoding, outperforming the best autoregressive and flow-based models. In this paper, we show that this success can be extended to other tasks of conditional audio generation. In particular, building upon HiFi vocoders, we propose HiFi++, a novel general framework for neural vocoding, bandwidth extension, and speech enhancement. We show that, with an improved generator architecture and simplified multi-discriminator training, HiFi++ performs on par with the state of the art in these tasks while using significantly less memory and fewer computational resources. The effectiveness of our approach is validated through a series of extensive experiments.
