Unsupervised speech representation learning using WaveNet autoencoders

25 January 2019

Papers citing "Unsupervised speech representation learning using WaveNet autoencoders"

50 / 81 papers shown

Language translation, and change of accent for speech-to-speech task using diffusion model Abhishek Mishra Ritesh Sur Chowdhury Vartul Bahuguna Isha Pandey Ganesh Ramakrishnan DiffM 49 0 0 04 May 2025
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning Alexander H. Liu Heng-Jui Chang Michael Auli Wei-Ning Hsu James R. Glass 29 25 0 17 May 2023
Efficient Domain Adaptation for Speech Foundation Models Bo Li DongSeon Hwang Zhouyuan Huo Junwen Bai Guru Prakash ... K. Sim Yu Zhang Wei Han Trevor Strohman F. Beaufays AI4CE 51 23 0 03 Feb 2023
Generative Models for Improved Naturalness, Intelligibility, and Voicing of Whispered Speech Dominik Wagner Sebastian P. Bayerl H. A. C. Maruri Tobias Bocklet 24 7 0 04 Dec 2022
Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing J. Webber Cassia Valentini-Botinhao Evelyn Williams G. Henter Simon King 16 9 0 13 Nov 2022
Self-supervised language learning from raw audio: Lessons from the Zero Resource Speech Challenge Ewan Dunbar Nicolas Hamilakis Emmanuel Dupoux SSL 34 30 0 27 Oct 2022
PoseGPT: Quantization-based 3D Human Motion Generation and Forecasting Thomas Lucas Fabien Baradel Philippe Weinzaepfel Grégory Rogez 22 70 0 19 Oct 2022
CLUTR: Curriculum Learning via Unsupervised Task Representation Learning Abdus Salam Azad Izzeddin Gur Jasper Emhoff Nathaniel Alexis Aleksandra Faust Pieter Abbeel Ion Stoica SSL 34 12 0 19 Oct 2022
Learning to Drop Out: An Adversarial Approach to Training Sequence VAEs Đorđe Miladinovic Kumar Shridhar Kushal Kumar Jain Max B. Paulus J. M. Buhmann Mrinmaya Sachan Carl Allen DRL 30 5 0 26 Sep 2022
Are disentangled representations all you need to build speaker anonymization systems? Pierre Champion D. Jouvet Anthony Larcher 37 20 0 22 Aug 2022
Learning Phone Recognition from Unpaired Audio and Phone Sequences Based on Generative Adversarial Network Da-Rong Liu Po-Chun Hsu Yi-Chen Chen Sung-Feng Huang Shun-Po Chuang Da-Yi Wu Hung-yi Lee GAN 31 7 0 29 Jul 2022
M-Adapter: Modality Adaptation for End-to-End Speech-to-Text Translation Jinming Zhao Haomiao Yang Ehsan Shareghi Gholamreza Haffari 58 19 0 03 Jul 2022
Speaker Identification using Speech Recognition Syeda Rabia Arshad Syed Mujtaba Haider Abdul Basit Mughal 28 1 0 29 May 2022
Self-Supervised Speech Representation Learning: A Review Abdel-rahman Mohamed Hung-yi Lee Lasse Borgholt Jakob Drachmann Havtorn Joakim Edin ... Shang-Wen Li Karen Livescu Lars Maaløe Tara N. Sainath Shinji Watanabe SSL AI4TS 137 354 0 21 May 2022
Autoregressive Co-Training for Learning Discrete Speech Representations Sung-Lin Yeh Hao Tang SSL 27 6 0 29 Mar 2022
Improve few-shot voice cloning using multi-modal learning Haitong Zhang Yue Lin 21 8 0 18 Mar 2022
Practical cognitive speech compression Reza Lotfidereshgi P. Gournay 35 2 0 08 Mar 2022
A Brief Overview of Unsupervised Neural Speech Representation Learning Lasse Borgholt Jakob Drachmann Havtorn Joakim Edin Lars Maaløe Christian Igel BDL AI4TS SSL 19 11 0 01 Mar 2022
Human-Centered Concept Explanations for Neural Networks Chih-Kuan Yeh Been Kim Pradeep Ravikumar FAtt 44 26 0 25 Feb 2022
Retriever: Learning Content-Style Representation as a Token-Level Bipartite Graph Dacheng Yin Xuanchi Ren Chong Luo Yuwang Wang Zhiwei Xiong Wenjun Zeng 58 13 0 24 Feb 2022
Word Segmentation on Discovered Phone Units with Dynamic Programming and Self-Supervised Scoring Herman Kamper 34 25 0 24 Feb 2022
Addressing Data Scarcity in Multimodal User State Recognition by Combining Semi-Supervised and Supervised Learning Hendric Voss H. Wersing S. Kopp 21 3 0 08 Feb 2022
Supervised and Self-supervised Pretraining Based COVID-19 Detection Using Acoustic Breathing/Cough/Speech Signals Xing-Yu Chen Qiu-shi Zhu Jie Zhang Lirong Dai 29 14 0 22 Jan 2022
Unsupervised Multimodal Word Discovery based on Double Articulation Analysis with Co-occurrence cues Akira Taniguchi Hiroaki Murakami Ryo Ozaki T. Taniguchi 23 2 0 18 Jan 2022
IQDUBBING: Prosody modeling based on discrete self-supervised speech representation for expressive voice conversion Wendong Gan Bolong Wen Yin Yan Haitao Chen Zhichao Wang Hongqiang Du Lei Xie Kaixuan Guo Hai Li 20 14 0 02 Jan 2022
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing Sanyuan Chen Chengyi Wang Zhengyang Chen Yu-Huan Wu Shujie Liu ... Yao Qian Jian Wu Micheal Zeng Xiangzhan Yu Furu Wei SSL 138 1,721 0 26 Oct 2021
Cognitive Coding of Speech Reza Lotfidereshgi P. Gournay 38 5 0 08 Oct 2021
Interpreting intermediate convolutional layers in unsupervised acoustic word classification Gašper Beguš Alan Zhou FAtt SSL 38 5 0 05 Oct 2021
Unsupervised Speech Segmentation and Variable Rate Representation Learning using Segmental Contrastive Predictive Coding Saurabhchand Bhati Jesús Villalba Piotr Żelasko Laureano Moro-Velazquez Najim Dehak SSL 58 22 0 05 Oct 2021
Comparison of Self-Supervised Speech Pre-Training Methods on Flemish Dutch Jakob Poncelet Hugo Van hamme SSL 28 1 0 29 Sep 2021
BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition Yu Zhang Daniel S. Park Wei Han James Qin Anmol Gulati ... Zhifeng Chen Quoc V. Le Chung-Cheng Chiu Ruoming Pang Yonghui Wu SSL 34 175 0 27 Sep 2021
Noisy-to-Noisy Voice Conversion Framework with Denoising Model Chao Xie Yi-Chiao Wu Patrick Lumban Tobing Wen-Chin Huang Tomoki Toda 26 7 0 22 Sep 2021
Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation Yuanxun Lu Jinxiang Chai Xun Cao 29 82 0 22 Sep 2021
A Conditional Generative Matching Model for Multi-lingual Reply Suggestion Budhaditya Deb Guoqing Zheng Milad Shokouhi Ahmed Hassan Awadallah 36 1 0 15 Sep 2021
Speech Representations and Phoneme Classification for Preserving the Endangered Language of Ladin Zane Durante Leena Mathur Eric Ye Sichong Zhao Tejas Ramdas Khalil Iskarous 29 0 0 27 Aug 2021
Analyzing Speaker Information in Self-Supervised Models to Improve Zero-Resource Speech Processing Benjamin van Niekerk Leanne Nortje Matthew Baas Herman Kamper SSL 38 31 0 02 Aug 2021
VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion Disong Wang Liqun Deng Y. Yeung Xiao Chen Xunying Liu Helen Meng DRL 22 136 0 18 Jun 2021
WaveNet-Based Deep Neural Networks for the Characterization of Anomalous Diffusion (WADNet) Dezhong Li Qiujin Yao Zihan Huang DiffM 14 19 0 14 Jun 2021
NWT: Towards natural audio-to-video generation with representation learning Rayhane Mama Marc S. Tyndel Hashiam Kadhim Cole Clifford Ragavan Thurairatnam VGen 34 12 0 08 Jun 2021
Review of end-to-end speech synthesis technology based on deep learning Zhaoxi Mu Xinyu Yang Yizhuo Dong AuLLM ALM 26 24 0 20 Apr 2021
Wav2vec-C: A Self-supervised Model for Speech Representation Learning Samik Sadhu Di He Che-Wei Huang Sri Harish Reddy Mallidi Minhua Wu Ariya Rastrow A. Stolcke J. Droppo Roland Maas SSL 20 48 0 09 Mar 2021
Enhancing into the codec: Noise Robust Speech Coding with Vector-Quantized Autoencoders Jonah Casebeer Vinjai Vale Umut Isik J. Valin Ritwik Giri A. Krishnaswamy 54 18 0 12 Feb 2021
UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data Chengyi Wang Yu-Huan Wu Yao Qian K. Kumatani Shujie Liu Furu Wei Michael Zeng Xuedong Huang OT SSL 38 112 0 19 Jan 2021
Applying Wav2vec2.0 to Speech Recognition in Various Low-resource Languages Cheng Yi Jianzhong Wang Ning Cheng Shiyu Zhou Bo Xu SSL VLM 34 80 0 22 Dec 2020
Towards unsupervised phone and word segmentation using self-supervised vector-quantized neural networks Herman Kamper Benjamin van Niekerk SSL MQ 23 35 0 14 Dec 2020
A Correspondence Variational Autoencoder for Unsupervised Acoustic Word Embeddings Puyuan Peng Herman Kamper Karen Livescu DRL SSL 14 14 0 03 Dec 2020
Towards Semi-Supervised Semantics Understanding from Speech Cheng-I Jeff Lai Jin Cao S. Bodapati Shang-Wen Li SSL 22 7 0 11 Nov 2020
Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies Alexander H. Liu Yu-An Chung James R. Glass SSL 27 87 0 01 Nov 2020
HarperValleyBank: A Domain-Specific Spoken Dialog Corpus Mike Wu J. Nafziger A. Scodary Andrew L. Maas 31 17 0 26 Oct 2020
Unsupervised Learning of Disentangled Speech Content and Style Representation Andros Tjandra Ruoming Pang Yu Zhang Shigeki Karita BDL DRL 25 15 0 24 Oct 2020