Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy

24 March 2024
Wenxuan Wu
Xueyuan Chen
Xixin Wu
Haizhou Li
Helen M. Meng

Papers citing "Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy"

7 / 7 papers shown
Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction
Xueyuan Chen
Yuejiao Wang
Xixin Wu
Disong Wang
Zhiyong Wu
Xunying Liu
Helen M. Meng
42
6
0
31 Jan 2024
StyleSpeech: Self-supervised Style Enhancing with VQ-VAE-based Pre-training for Expressive Audiobook Speech Synthesis
Xueyuan Chen
Xi Wang
Shaofei Zhang
Lei He
Zhiyong Wu
Xixin Wu
Helen M. Meng
41
7
0
19 Dec 2023
token2vec: A Joint Self-Supervised Pre-training Framework Using Unpaired Speech and Text
Xianghu Yue
Junyi Ao
Xiaoxue Gao
Haizhou Li
SSL
26
8
0
30 Oct 2022
Speaker recognition with two-step multi-modal deep cleansing
Ruijie Tao
Kong Aik Lee
Zhan Shi
Haizhou Li
NoLa
47
13
0
28 Oct 2022
Self-Supervised Speech Representation Learning: A Review
Abdel-rahman Mohamed
Hung-yi Lee
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
...
Shang-Wen Li
Karen Livescu
Lars Maaløe
Tara N. Sainath
Shinji Watanabe
SSL
AI4TS
128
349
0
21 May 2022
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
Ruohan Gao
Kristen Grauman
CVBM
190
198
0
08 Jan 2021
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
224
2,234
0
14 Jun 2018