Vector-quantized neural networks for acoustic unit discovery in the ZeroSpeech 2020 challenge

19 May 2020
Benjamin van Niekerk
Leanne Nortje
Herman Kamper

Papers citing "Vector-quantized neural networks for acoustic unit discovery in the ZeroSpeech 2020 challenge"

50 / 71 papers shown
Mitigating Timbre Leakage with Universal Semantic Mapping Residual Block for Voice Conversion
Na Li
Chuke Wang
Yu Gu
Zhifeng Li
59
0
0
11 Apr 2025
Textless NLP -- Zero Resource Challenge with Low Resource Compute
Krithiga Ramadass
Abrit Pal Singh
Srihari J
Sheetal Kalyani
VLM
31
0
0
24 Sep 2024
Discrete Unit based Masking for Improving Disentanglement in Voice Conversion
Philip H. Lee
Ismail Rasim Ulgen
Berrak Sisman
35
0
0
17 Sep 2024
Improved Visually Prompted Keyword Localisation in Real Low-Resource Settings
Leanne Nortje
Dan Oneaţă
Herman Kamper
VLM
43
0
0
09 Sep 2024
Visually Grounded Speech Models have a Mutual Exclusivity Bias
Leanne Nortje
Dan Oneaţă
Yevgen Matusevych
Herman Kamper
SSL
47
0
0
20 Mar 2024
SD-HuBERT: Sentence-Level Self-Distillation Induces Syllabic Organization in HuBERT
Cheol Jun Cho
Abdelrahman Mohamed
Shang-Wen Li
Alan W. Black
Gopala K. Anumanchipalli
39
8
0
16 Oct 2023
Face-Driven Zero-Shot Voice Conversion with Memory-based Face-Voice Alignment
Zheng-Yan Sheng
Yang Ai
Yan-Nian Chen
Zhenhua Ling
CVBM
19
4
0
18 Sep 2023
From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion
Robin San Roman
Yossi Adi
Antoine Deleforge
Romain Serizel
Gabriel Synnaeve
Alexandre Défossez
DiffM
27
21
0
02 Aug 2023
Representation Learning With Hidden Unit Clustering For Low Resource Speech Applications
Varun Krishna
T. Sai
Sriram Ganapathy
SSL
32
2
0
14 Jul 2023
Rhythm Modeling for Voice Conversion
Benjamin van Niekerk
M. Carbonneau
Herman Kamper
40
5
0
12 Jul 2023
Visually grounded few-shot word learning in low-resource settings
Leanne Nortje
Dan Oneaţă
Herman Kamper
VLM
17
4
0
20 Jun 2023
Privacy in Speech Technology
Tomas Bäckström
29
4
0
09 May 2023
StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models
Yinghao Aaron Li
Cong Han
N. Mesgarani
24
18
0
29 Dec 2022
Learning Dependencies of Discrete Speech Representations with Neural Hidden Markov Models
Sung-Lin Yeh
Hao Tang
SSL
BDL
35
1
0
29 Oct 2022
Self-supervised language learning from raw audio: Lessons from the Zero Resource Speech Challenge
Ewan Dunbar
Nicolas Hamilakis
Emmanuel Dupoux
SSL
34
30
0
27 Oct 2022
Bootstrapping meaning through listening: Unsupervised learning of spoken sentence embeddings
Jian Zhu
Zuoyu Tian
Yadong Liu
Cong Zhang
Chia-wen Lo
SSL
32
2
0
23 Oct 2022
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning
Tzu-hsun Feng
Annie Dong
Ching-Feng Yeh
Shu-Wen Yang
Tzu-Quan Lin
...
Xuankai Chang
Shinji Watanabe
Abdel-rahman Mohamed
Shang-Wen Li
Hung-yi Lee
ELM
SSL
36
33
0
16 Oct 2022
Towards visually prompted keyword localisation for zero-resource spoken languages
Leanne Nortje
Herman Kamper
29
6
0
12 Oct 2022
Non-Parallel Voice Conversion for ASR Augmentation
Gary Wang
Andrew Rosenberg
Bhuvana Ramabhadran
Fadi Biadsy
Yinghui Huang
Jesse Emond
P. M. Mengibar
26
2
0
15 Sep 2022
An Evaluation of Three-Stage Voice Conversion Framework for Noisy and Reverberant Conditions
Yeonjong Choi
Chao Xie
T. Toda
DiffM
33
2
0
30 Jun 2022
A Temporal Extension of Latent Dirichlet Allocation for Unsupervised Acoustic Unit Discovery
W. V. D. Merwe
Herman Kamper
J. D. Preez
22
2
0
23 Jun 2022
Self-supervised speech unit discovery from articulatory and acoustic features using VQ-VAE
Marc-Antoine Georges
J. Schwartz
Thomas Hueber
SSL
14
5
0
17 Jun 2022
Self-Supervised Speech Representation Learning: A Review
Abdel-rahman Mohamed
Hung-yi Lee
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
...
Shang-Wen Li
Karen Livescu
Lars Maaløe
Tara N. Sainath
Shinji Watanabe
SSL
AI4TS
137
352
0
21 May 2022
End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions
Wonjune Kang
M. Hasegawa-Johnson
D. Roy
32
8
0
19 May 2022
SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization
Yuhta Takida
Takashi Shibuya
Wei-Hsiang Liao
Chieh-Hsin Lai
Junki Ohmura
Toshimitsu Uesaka
Naoki Murata
Shusuke Takahashi
Toshiyuki Kumakura
Yuki Mitsufuji
BDL
23
61
0
16 May 2022
Autoregressive Co-Training for Learning Discrete Speech Representations
Sung-Lin Yeh
Hao Tang
SSL
27
6
0
29 Mar 2022
Modeling speech recognition and synthesis simultaneously: Encoding and decoding lexical and sublexical semantic information into speech with no direct access to speech data
Gašper Beguš
Alan Zhou
SSL
27
5
0
22 Mar 2022
Modelling word learning and recognition using visually grounded speech
Danny Merkx
Sebastiaan Scholten
S. Frank
M. Ernestus
O. Scharenborg
SSL
37
0
0
14 Mar 2022
A Brief Overview of Unsupervised Neural Speech Representation Learning
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
Lars Maaløe
Christian Igel
BDL
AI4TS
SSL
19
11
0
01 Mar 2022
Word Segmentation on Discovered Phone Units with Dynamic Programming and Self-Supervised Scoring
Herman Kamper
34
25
0
24 Feb 2022
AVQVC: One-shot Voice Conversion by Vector Quantization with applying contrastive learning
Huaizhen Tang
Xulong Zhang
Jianzong Wang
Ning Cheng
Jing Xiao
12
54
0
21 Feb 2022
VCVTS: Multi-speaker Video-to-Speech synthesis via cross-modal knowledge transfer from voice conversion
Disong Wang
Shan Yang
Dan Su
Xunying Liu
Dong Yu
Helen Meng
15
11
0
18 Feb 2022
Robust Vector Quantized-Variational Autoencoder
Chieh-Hsin Lai
Dongmian Zou
Gilad Lerman
DRL
32
5
0
04 Feb 2022
Unsupervised Multimodal Word Discovery based on Double Articulation Analysis with Co-occurrence cues
Akira Taniguchi
Hiroaki Murakami
Ryo Ozaki
T. Taniguchi
23
2
0
18 Jan 2022
Non-Intrusive Binaural Speech Intelligibility Prediction from Discrete Latent Representations
Alex F. McKinney
Benjamin Cauchi
20
3
0
24 Nov 2021
Direct Noisy Speech Modeling for Noisy-to-Noisy Voice Conversion
Chao Xie
Yi-Chiao Wu
Patrick Lumban Tobing
Wen-Chin Huang
T. Toda
21
9
0
13 Nov 2021
A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion
Benjamin van Niekerk
M. Carbonneau
Julian Zaïdi
Matthew Baas
Hugo Seuté
Herman Kamper
DRL
27
111
0
03 Nov 2021
Zero-shot Voice Conversion via Self-supervised Prosody Representation Learning
Shijun Wang
Dimche Kostadinov
Damian Borth
29
11
0
27 Oct 2021
Interpreting intermediate convolutional layers in unsupervised acoustic word classification
Gašper Beguš
Alan Zhou
FAtt
SSL
33
5
0
05 Oct 2021
Unsupervised Speech Segmentation and Variable Rate Representation Learning using Segmental Contrastive Predictive Coding
Saurabhchand Bhati
Jesús Villalba
Piotr Żelasko
Laureano Moro Velázquez
Najim Dehak
SSL
53
22
0
05 Oct 2021
Noisy-to-Noisy Voice Conversion Framework with Denoising Model
Chao Xie
Yi-Chiao Wu
Patrick Lumban Tobing
Wen-Chin Huang
T. Toda
23
7
0
22 Sep 2021
Masked Acoustic Unit for Mispronunciation Detection and Correction
Zhan Zhang
Yuehai Wang
Jianyi Yang
30
3
0
12 Aug 2021
Analyzing Speaker Information in Self-Supervised Models to Improve Zero-Resource Speech Processing
Benjamin van Niekerk
Leanne Nortje
Matthew Baas
Herman Kamper
SSL
33
31
0
02 Aug 2021
Expressive Voice Conversion: A Joint Framework for Speaker Identity and Emotional Style Transfer
Zongyang Du
Berrak Sisman
Kun Zhou
Haizhou Li
32
20
0
08 Jul 2021
VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion
Disong Wang
Liqun Deng
Y. Yeung
Xiao Chen
Xunying Liu
Helen Meng
DRL
22
136
0
18 Jun 2021
Unsupervised Automatic Speech Recognition: A Review
Hanan Aldarmaki
Asad Ullah
Nazar Zaki
VLM
SSL
39
57
0
09 Jun 2021
Segmental Contrastive Predictive Coding for Unsupervised Word Segmentation
Saurabhchand Bhati
Jesús Villalba
Piotr Żelasko
Laureano Moro Velázquez
Najim Dehak
SSL
19
37
0
03 Jun 2021
Unsupervised Speech Recognition
Alexei Baevski
Wei-Ning Hsu
Alexis Conneau
Michael Auli
SSL
26
270
0
24 May 2021
Discrete representations in neural models of spoken language
Bertrand Higy
Lieke Gelderloos
A. Alishahi
Grzegorz Chrupała
21
6
0
12 May 2021
VQCPC-GAN: Variable-Length Adversarial Audio Synthesis Using Vector-Quantized Contrastive Predictive Coding
J. Nistal
Cyran Aouameur
Stefan Lattner
G. Richard
19
7
0
04 May 2021