ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2004.11284
  4. Cited By
Unsupervised Speech Decomposition via Triple Information Bottleneck

Unsupervised Speech Decomposition via Triple Information Bottleneck

23 April 2020
Kaizhi Qian
Yang Zhang
Shiyu Chang
David D. Cox
M. Hasegawa-Johnson
ArXivPDFHTML

Papers citing "Unsupervised Speech Decomposition via Triple Information Bottleneck"

50 / 50 papers shown
Title
ZSVC: Zero-shot Style Voice Conversion with Disentangled Latent Diffusion Models and Adversarial Training
ZSVC: Zero-shot Style Voice Conversion with Disentangled Latent Diffusion Models and Adversarial Training
Xinfa Zhu
Lei He
Yujia Xiao
Xi Wang
Xu Tan
Sheng Zhao
Lei Xie
DiffM
40
0
0
08 Jan 2025
Bird Vocalization Embedding Extraction Using Self-Supervised Disentangled Representation Learning
Bird Vocalization Embedding Extraction Using Self-Supervised Disentangled Representation Learning
Runwu Shi
Katsutoshi Itoyama
K. Nakadai
SSL
DRL
46
1
0
31 Dec 2024
Voice Conversion-based Privacy through Adversarial Information Hiding
Voice Conversion-based Privacy through Adversarial Information Hiding
J. Webber
O. Watts
G. Henter
Jennifer Williams
Simon King
45
0
0
23 Sep 2024
Prosody-Driven Privacy-Preserving Dementia Detection
Prosody-Driven Privacy-Preserving Dementia Detection
Dominika Woszczyk
Ranya Aloufi
Soteris Demetriou
39
2
0
03 Jul 2024
Imperceptible Rhythm Backdoor Attacks: Exploring Rhythm Transformation
  for Embedding Undetectable Vulnerabilities on Speech Recognition
Imperceptible Rhythm Backdoor Attacks: Exploring Rhythm Transformation for Embedding Undetectable Vulnerabilities on Speech Recognition
Wenhan Yao
Jiangkun Yang
yongqiang He
Jia Liu
Weiping Wen
52
1
0
16 Jun 2024
Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with
  Progressive Constraints in a Dual-mode Training Strategy
Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy
Linhan Ma
Xinfa Zhu
Yuanjun Lv
Zhichao Wang
Ziqian Wang
Wendi He
Hongbin Zhou
Lei Xie
42
2
0
14 Jun 2024
MAIN-VC: Lightweight Speech Representation Disentanglement for One-shot
  Voice Conversion
MAIN-VC: Lightweight Speech Representation Disentanglement for One-shot Voice Conversion
Pengcheng Li
Jianzong Wang
Xulong Zhang
Yong Zhang
Jing Xiao
Ning Cheng
DRL
48
1
0
02 May 2024
EAD-VC: Enhancing Speech Auto-Disentanglement for Voice Conversion with
  IFUB Estimator and Joint Text-Guided Consistent Learning
EAD-VC: Enhancing Speech Auto-Disentanglement for Voice Conversion with IFUB Estimator and Joint Text-Guided Consistent Learning
Ziqi Liang
Jianzong Wang
Xulong Zhang
Yong Zhang
Ning Cheng
Jing Xiao
36
1
0
30 Apr 2024
SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross
  Attention
SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention
Junjie Li
Yiwei Guo
Xie Chen
Kai Yu
45
13
0
14 Dec 2023
Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust
  Pitch Generation and Masked Prior for Zero-shot Speaker Adaptation
Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Prior for Zero-shot Speaker Adaptation
Haram Choi
Sang-Hoon Lee
Seong-Whan Lee
DiffM
34
24
0
08 Nov 2023
VaSAB: The variable size adaptive information bottleneck for
  disentanglement on speech and singing voice
VaSAB: The variable size adaptive information bottleneck for disentanglement on speech and singing voice
F. Bous
Axel Roebel
18
0
0
05 Oct 2023
AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment
AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment
Ruiqi Li
Rongjie Huang
Lichao Zhang
Jinglin Liu
Zhou Zhao
33
4
0
08 May 2023
Label Information Bottleneck for Label Enhancement
Label Information Bottleneck for Label Enhancement
Qinghai Zheng
Jihua Zhu
Haoyu Tang
31
6
0
13 Mar 2023
Speaking Style Conversion in the Waveform Domain Using Discrete
  Self-Supervised Units
Speaking Style Conversion in the Waveform Domain Using Discrete Self-Supervised Units
Gallil Maimon
Yossi Adi
34
13
0
19 Dec 2022
VarietySound: Timbre-Controllable Video to Sound Generation via
  Unsupervised Information Disentanglement
VarietySound: Timbre-Controllable Video to Sound Generation via Unsupervised Information Disentanglement
Chenye Cui
Yi Ren
Jinglin Liu
Rongjie Huang
Zhou Zhao
VGen
38
14
0
19 Nov 2022
A unified one-shot prosody and speaker conversion system with
  self-supervised discrete speech units
A unified one-shot prosody and speaker conversion system with self-supervised discrete speech units
Li-Wei Chen
Shinji Watanabe
Alexander I. Rudnicky
30
6
0
12 Nov 2022
NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for
  Noise-robust Expressive TTS
NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS
Dongchao Yang
Songxiang Liu
Jianwei Yu
Helin Wang
Chao Weng
Yuexian Zou
DiffM
VLM
43
18
0
04 Nov 2022
MetaSpeech: Speech Effects Switch Along with Environment for Metaverse
MetaSpeech: Speech Effects Switch Along with Environment for Metaverse
Xulong Zhang
Jianzong Wang
Ning Cheng
Jing Xiao
24
1
0
25 Oct 2022
DisC-VC: Disentangled and F0-Controllable Neural Voice Conversion
DisC-VC: Disentangled and F0-Controllable Neural Voice Conversion
Chihiro Watanabe
Hirokazu Kameoka
DRL
37
0
0
20 Oct 2022
ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on
  Pitch and Speed
ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Speed
Mei-Shuo Chen
Z. Duan
27
10
0
23 Sep 2022
Non-Parallel Voice Conversion for ASR Augmentation
Non-Parallel Voice Conversion for ASR Augmentation
Gary Wang
Andrew Rosenberg
Bhuvana Ramabhadran
Fadi Biadsy
Yinghui Huang
Jesse Emond
P. M. Mengibar
26
2
0
15 Sep 2022
iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for
  Speech Synthesis based on Disentanglement between Prosody and Timbre
iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre
Guangyan Zhang
Ying Qin
Wentao Zhang
Jialun Wu
Mei Li
Yu Gai
Feijun Jiang
Tan Lee
50
26
0
29 Jun 2022
GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain
  Text-to-Speech
GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech
Rongjie Huang
Yi Ren
Jinglin Liu
Chenye Cui
Zhou Zhao
OODD
VLM
117
34
0
15 May 2022
ContentVec: An Improved Self-Supervised Speech Representation by
  Disentangling Speakers
ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers
Kaizhi Qian
Yang Zhang
Heting Gao
Junrui Ni
Cheng-I Jeff Lai
David D. Cox
M. Hasegawa-Johnson
Shiyu Chang
DRL
30
110
0
20 Apr 2022
Disentangled Latent Speech Representation for Automatic Pathological
  Intelligibility Assessment
Disentangled Latent Speech Representation for Automatic Pathological Intelligibility Assessment
Tobias Weise
P. Klumpp
Kubilay Can Demir
Andreas Maier
E. Noeth
B.J. Heismann
Maria Schuster
S. Yang
11
3
0
08 Apr 2022
Enhanced exemplar autoencoder with cycle consistency loss in any-to-one
  voice conversion
Enhanced exemplar autoencoder with cycle consistency loss in any-to-one voice conversion
Weida Liang
Lantian Li
Wenqiang Du
Dong Wang
56
0
0
08 Apr 2022
Disentangleing Content and Fine-grained Prosody Information via Hybrid
  ASR Bottleneck Features for Voice Conversion
Disentangleing Content and Fine-grained Prosody Information via Hybrid ASR Bottleneck Features for Voice Conversion
Xintao Zhao
Feng Liu
Changhe Song
Zhiyong Wu
Shiyin Kang
Deyi Tuo
Helen Meng
26
21
0
24 Mar 2022
Text-free non-parallel many-to-many voice conversion using normalising
  flows
Text-free non-parallel many-to-many voice conversion using normalising flows
Thomas Merritt
Abdelhamid Ezzerg
Piotr Bilinski
Magdalena Proszewska
Kamil Pokora
Roberto Barra-Chicote
Daniel Korzekwa
36
14
0
15 Mar 2022
CGIBNet: Bandwidth-constrained Communication with Graph Information
  Bottleneck in Multi-Agent Reinforcement Learning
CGIBNet: Bandwidth-constrained Communication with Graph Information Bottleneck in Multi-Agent Reinforcement Learning
Qi Tian
Kun Kuang
Baoxiang Wang
Furui Liu
Fei Wu
26
0
0
20 Dec 2021
Improving Subgraph Recognition with Variational Graph Information
  Bottleneck
Improving Subgraph Recognition with Variational Graph Information Bottleneck
Junchi Yu
Jie Cao
Ran He
22
53
0
18 Dec 2021
How Speech is Recognized to Be Emotional - A Study Based on Information
  Decomposition
How Speech is Recognized to Be Emotional - A Study Based on Information Decomposition
Haoran Sun
Lantian Li
T. Zheng
Dong Wang
CVBM
19
0
0
24 Nov 2021
Zero-shot Singing Technique Conversion
Zero-shot Singing Technique Conversion
Brendan O'Connor
S. Dixon
Georgy Fazekas
35
5
0
16 Nov 2021
Textless Speech Emotion Conversion using Discrete and Decomposed
  Representations
Textless Speech Emotion Conversion using Discrete and Decomposed Representations
Felix Kreuk
Adam Polyak
Jade Copet
Eugene Kharitonov
Tu Nguyen
M. Rivière
Wei-Ning Hsu
Abdel-rahman Mohamed
Emmanuel Dupoux
Yossi Adi
25
29
0
14 Nov 2021
Zero-shot Voice Conversion via Self-supervised Prosody Representation
  Learning
Zero-shot Voice Conversion via Self-supervised Prosody Representation Learning
Shijun Wang
Dimche Kostadinov
Damian Borth
29
11
0
27 Oct 2021
Disentanglement of Emotional Style and Speaker Identity for Expressive
  Voice Conversion
Disentanglement of Emotional Style and Speaker Identity for Expressive Voice Conversion
Zongyang Du
Berrak Sisman
Kun Zhou
Haizhou Li
18
24
0
20 Oct 2021
Towards Universal Neural Vocoding with a Multi-band Excited WaveNet
Towards Universal Neural Vocoding with a Multi-band Excited WaveNet
Axel Roebel
F. Bous
29
2
0
07 Oct 2021
Style Equalization: Unsupervised Learning of Controllable Generative
  Sequence Models
Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models
Jen-Hao Rick Chang
A. Shrivastava
H. Koppula
Xiaoshuai Zhang
Oncel Tuzel
DiffM
51
16
0
06 Oct 2021
Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation
Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation
Yuanxun Lu
Jinxiang Chai
Xun Cao
29
82
0
22 Sep 2021
"Hello, It's Me": Deep Learning-based Speech Synthesis Attacks in the
  Real World
"Hello, It's Me": Deep Learning-based Speech Synthesis Attacks in the Real World
Emily Wenger
Max Bronckers
Christian Cianfarani
Jenna Cryan
Angela Sha
Haitao Zheng
Ben Y. Zhao
AAML
40
39
0
20 Sep 2021
Complementing Handcrafted Features with Raw Waveform Using a
  Light-weight Auxiliary Model
Complementing Handcrafted Features with Raw Waveform Using a Light-weight Auxiliary Model
Zhongwei Teng
Quchen Fu
Jules White
Maria E. Powell
Douglas C. Schmidt
28
5
0
06 Sep 2021
Learning De-identified Representations of Prosody from Raw Audio
Learning De-identified Representations of Prosody from Raw Audio
J. Weston
R. Lenain
U. Meepegama
E. Fristed
SSL
29
15
0
17 Jul 2021
A Survey on Neural Speech Synthesis
A Survey on Neural Speech Synthesis
Xu Tan
Tao Qin
Frank Soong
Tie-Yan Liu
AI4TS
18
352
0
29 Jun 2021
UniTTS: Residual Learning of Unified Embedding Space for Speech Style
  Control
UniTTS: Residual Learning of Unified Embedding Space for Speech Style Control
M. Kang
Sungjae Kim
Injung Kim
26
3
0
21 Jun 2021
VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised
  Speech Representation Disentanglement for One-shot Voice Conversion
VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion
Disong Wang
Liqun Deng
Y. Yeung
Xiao Chen
Xunying Liu
Helen Meng
DRL
22
136
0
18 Jun 2021
Review of end-to-end speech synthesis technology based on deep learning
Review of end-to-end speech synthesis technology based on deep learning
Zhaoxi Mu
Xinyu Yang
Yizhuo Dong
AuLLM
ALM
26
24
0
20 Apr 2021
Semi-supervised Learning for Singing Synthesis Timbre
Semi-supervised Learning for Singing Synthesis Timbre
J. Bonada
Merlijn Blaauw
27
4
0
05 Nov 2020
AGAIN-VC: A One-shot Voice Conversion using Activation Guidance and
  Adaptive Instance Normalization
AGAIN-VC: A One-shot Voice Conversion using Activation Guidance and Adaptive Instance Normalization
Yen-Hao Chen
Da-Yi Wu
Tsung-Han Wu
Hung-yi Lee
34
107
0
31 Oct 2020
Graph Information Bottleneck for Subgraph Recognition
Graph Information Bottleneck for Subgraph Recognition
Junchi Yu
Tingyang Xu
Yu Rong
Yatao Bian
Junzhou Huang
Ran He
30
153
0
12 Oct 2020
Contrastive Predictive Coding Supported Factorized Variational
  Autoencoder for Unsupervised Learning of Disentangled Speech Representations
Contrastive Predictive Coding Supported Factorized Variational Autoencoder for Unsupervised Learning of Disentangled Speech Representations
Janek Ebbers
Michael Kuhlmann
Tobias Cord-Landwehr
Reinhold Haeb-Umbach
DRL
CoGe
SSL
31
4
0
26 May 2020
Unconditional Audio Generation with Generative Adversarial Networks and
  Cycle Regularization
Unconditional Audio Generation with Generative Adversarial Networks and Cycle Regularization
Jen-Yu Liu
Yu-Hua Chen
Yin-Cheng Yeh
Yi-Hsuan Yang
GAN
32
35
0
18 May 2020
1