Unsupervised Speech Decomposition via Triple Information Bottleneck

23 April 2020

Kaizhi Qian

Papers citing "Unsupervised Speech Decomposition via Triple Information Bottleneck"

ZSVC: Zero-shot Style Voice Conversion with Disentangled Latent Diffusion Models and Adversarial Training Xinfa Zhu Lei He Yujia Xiao Xi Wang Xu Tan Sheng Zhao Lei Xie DiffM 40 0 0 08 Jan 2025
Bird Vocalization Embedding Extraction Using Self-Supervised Disentangled Representation Learning Runwu Shi Katsutoshi Itoyama K. Nakadai SSL DRL 46 1 0 31 Dec 2024
Voice Conversion-based Privacy through Adversarial Information Hiding J. Webber O. Watts G. Henter Jennifer Williams Simon King 45 0 0 23 Sep 2024
Prosody-Driven Privacy-Preserving Dementia Detection Dominika Woszczyk Ranya Aloufi Soteris Demetriou 39 2 0 03 Jul 2024
Imperceptible Rhythm Backdoor Attacks: Exploring Rhythm Transformation for Embedding Undetectable Vulnerabilities on Speech Recognition Wenhan Yao Jiangkun Yang yongqiang He Jia Liu Weiping Wen 52 1 0 16 Jun 2024
Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy Linhan Ma Xinfa Zhu Yuanjun Lv Zhichao Wang Ziqian Wang Wendi He Hongbin Zhou Lei Xie 42 2 0 14 Jun 2024
MAIN-VC: Lightweight Speech Representation Disentanglement for One-shot Voice Conversion Pengcheng Li Jianzong Wang Xulong Zhang Yong Zhang Jing Xiao Ning Cheng DRL 48 1 0 02 May 2024
EAD-VC: Enhancing Speech Auto-Disentanglement for Voice Conversion with IFUB Estimator and Joint Text-Guided Consistent Learning Ziqi Liang Jianzong Wang Xulong Zhang Yong Zhang Ning Cheng Jing Xiao 36 1 0 30 Apr 2024
SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention Junjie Li Yiwei Guo Xie Chen Kai Yu 45 13 0 14 Dec 2023
Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Prior for Zero-shot Speaker Adaptation Haram Choi Sang-Hoon Lee Seong-Whan Lee DiffM 34 24 0 08 Nov 2023
VaSAB: The variable size adaptive information bottleneck for disentanglement on speech and singing voice F. Bous Axel Roebel 18 0 0 05 Oct 2023
AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment Ruiqi Li Rongjie Huang Lichao Zhang Jinglin Liu Zhou Zhao 33 4 0 08 May 2023
Label Information Bottleneck for Label Enhancement Qinghai Zheng Jihua Zhu Haoyu Tang 31 6 0 13 Mar 2023
Speaking Style Conversion in the Waveform Domain Using Discrete Self-Supervised Units Gallil Maimon Yossi Adi 34 13 0 19 Dec 2022
VarietySound: Timbre-Controllable Video to Sound Generation via Unsupervised Information Disentanglement Chenye Cui Yi Ren Jinglin Liu Rongjie Huang Zhou Zhao VGen 38 14 0 19 Nov 2022
A unified one-shot prosody and speaker conversion system with self-supervised discrete speech units Li-Wei Chen Shinji Watanabe Alexander I. Rudnicky 30 6 0 12 Nov 2022
NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS Dongchao Yang Songxiang Liu Jianwei Yu Helin Wang Chao Weng Yuexian Zou DiffM VLM 43 18 0 04 Nov 2022
MetaSpeech: Speech Effects Switch Along with Environment for Metaverse Xulong Zhang Jianzong Wang Ning Cheng Jing Xiao 24 1 0 25 Oct 2022
DisC-VC: Disentangled and F0-Controllable Neural Voice Conversion Chihiro Watanabe Hirokazu Kameoka DRL 37 0 0 20 Oct 2022
ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Speed Mei-Shuo Chen Z. Duan 27 10 0 23 Sep 2022
Non-Parallel Voice Conversion for ASR Augmentation Gary Wang Andrew Rosenberg Bhuvana Ramabhadran Fadi Biadsy Yinghui Huang Jesse Emond P. M. Mengibar 26 2 0 15 Sep 2022
iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre Guangyan Zhang Ying Qin Wentao Zhang Jialun Wu Mei Li Yu Gai Feijun Jiang Tan Lee 50 26 0 29 Jun 2022
GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech Rongjie Huang Yi Ren Jinglin Liu Chenye Cui Zhou Zhao OODD VLM 117 34 0 15 May 2022
ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers Kaizhi Qian Yang Zhang Heting Gao Junrui Ni Cheng-I Jeff Lai David D. Cox M. Hasegawa-Johnson Shiyu Chang DRL 30 110 0 20 Apr 2022
Disentangled Latent Speech Representation for Automatic Pathological Intelligibility Assessment Tobias Weise P. Klumpp Kubilay Can Demir Andreas Maier E. Noeth B.J. Heismann Maria Schuster S. Yang 11 3 0 08 Apr 2022
Enhanced exemplar autoencoder with cycle consistency loss in any-to-one voice conversion Weida Liang Lantian Li Wenqiang Du Dong Wang 56 0 0 08 Apr 2022
Disentangleing Content and Fine-grained Prosody Information via Hybrid ASR Bottleneck Features for Voice Conversion Xintao Zhao Feng Liu Changhe Song Zhiyong Wu Shiyin Kang Deyi Tuo Helen Meng 26 21 0 24 Mar 2022
Text-free non-parallel many-to-many voice conversion using normalising flows Thomas Merritt Abdelhamid Ezzerg Piotr Bilinski Magdalena Proszewska Kamil Pokora Roberto Barra-Chicote Daniel Korzekwa 36 14 0 15 Mar 2022
CGIBNet: Bandwidth-constrained Communication with Graph Information Bottleneck in Multi-Agent Reinforcement Learning Qi Tian Kun Kuang Baoxiang Wang Furui Liu Fei Wu 26 0 0 20 Dec 2021
Improving Subgraph Recognition with Variational Graph Information Bottleneck Junchi Yu Jie Cao Ran He 22 53 0 18 Dec 2021
How Speech is Recognized to Be Emotional - A Study Based on Information Decomposition Haoran Sun Lantian Li T. Zheng Dong Wang CVBM 19 0 0 24 Nov 2021
Zero-shot Singing Technique Conversion Brendan O'Connor S. Dixon Georgy Fazekas 35 5 0 16 Nov 2021
Textless Speech Emotion Conversion using Discrete and Decomposed Representations Felix Kreuk Adam Polyak Jade Copet Eugene Kharitonov Tu Nguyen M. Rivière Wei-Ning Hsu Abdel-rahman Mohamed Emmanuel Dupoux Yossi Adi 25 29 0 14 Nov 2021
Zero-shot Voice Conversion via Self-supervised Prosody Representation Learning Shijun Wang Dimche Kostadinov Damian Borth 29 11 0 27 Oct 2021
Disentanglement of Emotional Style and Speaker Identity for Expressive Voice Conversion Zongyang Du Berrak Sisman Kun Zhou Haizhou Li 18 24 0 20 Oct 2021
Towards Universal Neural Vocoding with a Multi-band Excited WaveNet Axel Roebel F. Bous 29 2 0 07 Oct 2021
Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models Jen-Hao Rick Chang A. Shrivastava H. Koppula Xiaoshuai Zhang Oncel Tuzel DiffM 51 16 0 06 Oct 2021
Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation Yuanxun Lu Jinxiang Chai Xun Cao 29 82 0 22 Sep 2021
"Hello, It's Me": Deep Learning-based Speech Synthesis Attacks in the Real World Emily Wenger Max Bronckers Christian Cianfarani Jenna Cryan Angela Sha Haitao Zheng Ben Y. Zhao AAML 40 39 0 20 Sep 2021
Complementing Handcrafted Features with Raw Waveform Using a Light-weight Auxiliary Model Zhongwei Teng Quchen Fu Jules White Maria E. Powell Douglas C. Schmidt 28 5 0 06 Sep 2021
Learning De-identified Representations of Prosody from Raw Audio J. Weston R. Lenain U. Meepegama E. Fristed SSL 29 15 0 17 Jul 2021
A Survey on Neural Speech Synthesis Xu Tan Tao Qin Frank Soong Tie-Yan Liu AI4TS 18 352 0 29 Jun 2021
UniTTS: Residual Learning of Unified Embedding Space for Speech Style Control M. Kang Sungjae Kim Injung Kim 26 3 0 21 Jun 2021
VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion Disong Wang Liqun Deng Y. Yeung Xiao Chen Xunying Liu Helen Meng DRL 22 136 0 18 Jun 2021
Review of end-to-end speech synthesis technology based on deep learning Zhaoxi Mu Xinyu Yang Yizhuo Dong AuLLM ALM 26 24 0 20 Apr 2021
Semi-supervised Learning for Singing Synthesis Timbre J. Bonada Merlijn Blaauw 27 4 0 05 Nov 2020
AGAIN-VC: A One-shot Voice Conversion using Activation Guidance and Adaptive Instance Normalization Yen-Hao Chen Da-Yi Wu Tsung-Han Wu Hung-yi Lee 34 107 0 31 Oct 2020
Graph Information Bottleneck for Subgraph Recognition Junchi Yu Tingyang Xu Yu Rong Yatao Bian Junzhou Huang Ran He 30 153 0 12 Oct 2020
Contrastive Predictive Coding Supported Factorized Variational Autoencoder for Unsupervised Learning of Disentangled Speech Representations Janek Ebbers Michael Kuhlmann Tobias Cord-Landwehr Reinhold Haeb-Umbach DRL CoGe SSL 31 4 0 26 May 2020
Unconditional Audio Generation with Generative Adversarial Networks and Cycle Regularization Jen-Yu Liu Yu-Hua Chen Yin-Cheng Yeh Yi-Hsuan Yang GAN 32 35 0 18 May 2020