Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy

24 March 2024

Papers citing "Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy"


Title | Authors | Tags | (unlabeled counts) | Date
Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction | Xueyuan Chen, Yuejiao Wang, Xixin Wu, Disong Wang, Zhiyong Wu, Xunying Liu, Helen M. Meng | - | 42 6 0 | 31 Jan 2024
StyleSpeech: Self-supervised Style Enhancing with VQ-VAE-based Pre-training for Expressive Audiobook Speech Synthesis | Xueyuan Chen, Xi Wang, Shaofei Zhang, Lei He, Zhiyong Wu, Xixin Wu, Helen M. Meng | - | 41 7 0 | 19 Dec 2023
token2vec: A Joint Self-Supervised Pre-training Framework Using Unpaired Speech and Text | Xianghu Yue, Junyi Ao, Xiaoxue Gao, Haizhou Li | SSL | 26 8 0 | 30 Oct 2022
Speaker recognition with two-step multi-modal deep cleansing | Ruijie Tao, Kong Aik Lee, Zhan Shi, Haizhou Li | NoLa | 47 13 0 | 28 Oct 2022
Self-Supervised Speech Representation Learning: A Review | Abdel-rahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob Drachmann Havtorn, Joakim Edin, ..., Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe | SSL, AI4TS | 128 349 0 | 21 May 2022
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency | Ruohan Gao, Kristen Grauman | CVBM | 190 198 0 | 08 Jan 2021
VoxCeleb2: Deep Speaker Recognition | Joon Son Chung, Arsha Nagrani, Andrew Zisserman | - | 224 2,234 0 | 14 Jun 2018