2204.09224
ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers
20 April 2022
Kaizhi Qian
Yang Zhang
Heting Gao
Junrui Ni
Cheng-I Jeff Lai
David D. Cox
M. Hasegawa-Johnson
Shiyu Chang
DRL
Papers citing
"ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers"
50 / 81 papers shown
Title
SingNet: Towards a Large-Scale, Diverse, and In-the-Wild Singing Voice Dataset
Yicheng Gu
Chaoren Wang
J. Zhang
Xueyao Zhang
Zihao Fang
Haorui He
Zhizheng Wu
24
2
0
14 May 2025
Multi-band Frequency Reconstruction for Neural Psychoacoustic Coding
Dianwen Ng
Kun Zhou
Yi-Wen Chao
Zhiwei Xiong
B. Ma
E. Chng
31
0
0
12 May 2025
kNN-SVC: Robust Zero-Shot Singing Voice Conversion with Additive Synthesis and Concatenation Smoothness Optimization
Keren Shao
K. Chen
Matthew Baas
Shlomo Dubnov
20
0
0
08 Apr 2025
AVENet: Disentangling Features by Approximating Average Features for Voice Conversion
Wenyu Wang
Yiquan Zhou
Jihua Zhu
Hongwu Ding
Jiacheng Xu
Shihao Li
DRL
32
0
0
08 Apr 2025
Serenade: A Singing Style Conversion Framework Based On Audio Infilling
Lester Phillip Violeta
Wen-Chin Huang
T. Toda
35
0
0
16 Mar 2025
UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation
Alexander H. Liu
Sang-gil Lee
Chao-Han Huck Yang
Yuan Gong
Yu-Chun Wang
James Glass
Rafael Valle
Bryan Catanzaro
SSL
52
0
0
02 Mar 2025
Enhancing Expressive Voice Conversion with Discrete Pitch-Conditioned Flow Matching Model
Jialong Zuo
Shengpeng Ji
Minghui Fang
Ziyue Jiang
Xize Cheng
...
Wenrui Liu
Guangyan Zhang
Zehai Tu
Yiwen Guo
Zhou Zhao
49
0
0
08 Feb 2025
Analytic Study of Text-Free Speech Synthesis for Raw Audio using a Self-Supervised Learning Model
Joonyong Park
Daisuke Saito
N. Minematsu
67
0
0
04 Dec 2024
LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec
Yiwei Guo
Zhihan Li
Chenpeng Du
Hankun Wang
Xie Chen
Kai Yu
31
1
0
21 Oct 2024
AC-Mix: Self-Supervised Adaptation for Low-Resource Automatic Speech Recognition using Agnostic Contrastive Mixup
Carlos Carvalho
A. Abad
21
0
0
18 Oct 2024
JOOCI: a Framework for Learning Comprehensive Speech Representations
Hemant Yadav
R. Shah
Sunayana Sitaram
23
0
0
14 Oct 2024
Description-based Controllable Text-to-Speech with Cross-Lingual Voice Control
Ryuichi Yamamoto
Yuma Shirahata
Masaya Kawamura
Kentaro Tachibana
DiffM
32
2
0
26 Sep 2024
Enhancing Polyglot Voices by Leveraging Cross-Lingual Fine-Tuning in Any-to-One Voice Conversion
Giuseppe Ruggiero
Matteo Testa
Jurgen Van de Walle
Luigi Di Caro
21
0
0
25 Sep 2024
Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models
Li-Wei Chen
Takuya Higuchi
He Bai
Ahmed Hussen Abdelaziz
Alexander Rudnicky
Shinji Watanabe
Tatiana Likhomanenko
B. Theobald
Zakaria Aldeneh
49
0
0
16 Sep 2024
Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT
Ryota Komatsu
Takahiro Shinozaki
SSL
36
1
0
16 Sep 2024
Stutter-Solver: End-to-end Multi-lingual Dysfluency Detection
Xuanru Zhou
Cheol Jun Cho
Ayati Sharma
Brittany Morin
D. Baquirin
...
Zachary Miller
B. Tee
M. G. Tempini
Jiachen Lian
Gopala Anumanchipalli
34
3
0
15 Sep 2024
Zero-Shot Sing Voice Conversion: built upon clustering-based phoneme representations
Wangjin Zhou
Fengrun Zhang
Yiming Liu
Wenhao Guan
Yi Zhao
He Qu
22
1
0
12 Sep 2024
Estimating the Completeness of Discrete Speech Units
Sung-Lin Yeh
Hao Tang
28
1
0
09 Sep 2024
LAST: Language Model Aware Speech Tokenization
A. Turetzky
Yossi Adi
29
2
0
05 Sep 2024
Privacy versus Emotion Preservation Trade-offs in Emotion-Preserving Speaker Anonymization
Zexin Cai
Henry Li Xinyuan
Ashi Garg
Leibny Paola García-Perera
Kevin Duh
Sanjeev Khudanpur
Nicholas Andrews
Matthew Wiesner
29
1
0
05 Sep 2024
vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders
Yiwei Guo
Zhihan Li
Junjie Li
Chenpeng Du
Hankun Wang
Shuai Wang
Xie Chen
Kai Yu
33
0
0
03 Sep 2024
Progressive Residual Extraction based Pre-training for Speech Representation Learning
Tianrui Wang
Jin Li
Ziyang Ma
Rui Cao
Xie Chen
...
Meng Ge
Xiaobao Wang
Yuguang Wang
Jianwu Dang
Nyima Tashi
SSL
43
0
0
31 Aug 2024
User-Driven Voice Generation and Editing through Latent Space Navigation
Yusheng Tian
Junbin Liu
Tan Lee
DiffM
39
2
0
30 Aug 2024
RAVE for Speech: Efficient Voice Conversion at High Sampling Rates
A. R. Bargum
Simon Lajboschitz
Cumhur Erkut
27
1
0
29 Aug 2024
SSDM: Scalable Speech Dysfluency Modeling
Jiachen Lian
Xuanru Zhou
Z. Ezzes
Jet M J Vonk
Brittany Morin
D. Baquirin
Zachary Miller
M. G. Tempini
Gopala Anumanchipalli
AuLLM
32
1
0
29 Aug 2024
YOLO-Stutter: End-to-end Region-Wise Speech Dysfluency Detection
Xuanru Zhou
Anshul Kashyap
Steve Li
Ayati Sharma
Brittany Morin
...
Z. Ezzes
Zachary Miller
M. G. Tempini
Jiachen Lian
Gopala Krishna Anumanchipalli
24
6
0
27 Aug 2024
Hear Your Face: Face-based voice conversion with F0 estimation
Jaejun Lee
Yoori Oh
Injune Hwang
Kyogu Lee
CVBM
26
1
0
19 Aug 2024
MulliVC: Multi-lingual Voice Conversion With Cycle Consistency
Jiawei Huang
Chen Zhang
Yi Ren
Ziyue Jiang
Zhenhui Ye
Jinglin Liu
Jinzheng He
Xiang Yin
Zhou Zhao
35
2
0
08 Aug 2024
ELP-Adapters: Parameter Efficient Adapter Tuning for Various Speech Processing Tasks
Nakamasa Inoue
Shinta Otake
Takumi Hirose
Masanari Ohi
Rei Kawakami
34
1
0
28 Jul 2024
SaMoye: Zero-shot Singing Voice Conversion Based on Feature Disentanglement and Synthesis
Zihao Wang
Le Ma
Yan Liu
K. Zhang
DRL
31
0
0
10 Jul 2024
Towards the Next Frontier in Speech Representation Learning Using Disentanglement
Varun Krishna
Sriram Ganapathy
SSL
17
1
0
02 Jul 2024
Self-Supervised Embeddings for Detecting Individual Symptoms of Depression
Sri Harsha Dumpala
Katerina Dikaios
Abraham Nunes
Frank Rudzicz
Rudolf Uher
Sageev Oore
SSL
41
1
0
25 Jun 2024
Song Data Cleansing for End-to-End Neural Singer Diarization Using Neural Analysis and Synthesis Framework
Hokuto Munakata
Ryo Terashima
Yusuke Fujita
35
0
0
24 Jun 2024
SingMOS: An extensive Open-Source Singing Voice Dataset for MOS Prediction
Yuxun Tang
Jiatong Shi
Yuning Wu
Qin Jin
29
9
0
16 Jun 2024
LASER: Learning by Aligning Self-supervised Representations of Speech for Improving Content-related Tasks
Amit Meghanani
Thomas Hain
41
1
0
13 Jun 2024
SPA-SVC: Self-supervised Pitch Augmentation for Singing Voice Conversion
Bingsong Bai
Fengping Wang
Yingming Gao
Ya Li
46
0
0
09 Jun 2024
CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection
Yongyi Zang
Jiatong Shi
You Zhang
Ryuichi Yamamoto
Jionghao Han
...
Shengyuan Xu
Wenxiao Zhao
Jing Guo
T. Toda
Zhiyao Duan
26
10
0
04 Jun 2024
Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation
Yimin Deng
Jianzong Wang
Xulong Zhang
Ning Cheng
Jing Xiao
24
0
0
01 May 2024
A Large-Scale Evaluation of Speech Foundation Models
Shu-Wen Yang
Heng-Jui Chang
Zili Huang
Andy T. Liu
Cheng-I Jeff Lai
...
Kushal Lakhotia
Shang-Wen Li
Abdelrahman Mohamed
Shinji Watanabe
Hung-yi Lee
38
19
0
15 Apr 2024
The VoicePrivacy 2024 Challenge Evaluation Plan
N. Tomashenko
Xiaoxiao Miao
Pierre Champion
Sarina Meyer
Xin Wang
Emmanuel Vincent
Michele Panariello
Nicholas W. D. Evans
Junichi Yamagishi
Massimiliano Todisco
36
21
0
03 Apr 2024
Removing Speaker Information from Speech Representation using Variable-Length Soft Pooling
Injune Hwang
Kyogu Lee
21
0
0
01 Apr 2024
SCORE: Self-supervised Correspondence Fine-tuning for Improved Content Representations
Amit Meghanani
Thomas Hain
35
3
0
10 Mar 2024
SingVisio: Visual Analytics of Diffusion Model for Singing Voice Conversion
Liumeng Xue
Chaoren Wang
Mingxuan Wang
Xueyao Zhang
Jun Han
Zhizheng Wu
DiffM
24
5
0
20 Feb 2024
Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features?
Zakaria Aldeneh
Takuya Higuchi
Jee-weon Jung
Skyler Seto
Tatiana Likhomanenko
Stephen Shum
Ahmed Hussen Abdelaziz
Shinji Watanabe
B. Theobald
SSL
34
2
0
01 Feb 2024
DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations without Text Alignment
Hyoung-Seok Oh
Sang-Hoon Lee
Deok-Hyun Cho
Seong-Whan Lee
39
1
0
16 Jan 2024
CoMoSVC: Consistency Model-based Singing Voice Conversion
Yiwen Lu
Zhen Ye
Wei Xue
Xu Tan
Qi-fei Liu
Yi-Ting Guo
22
11
0
03 Jan 2024
Frame-level emotional state alignment method for speech emotion recognition
Qifei Li
Yingming Gao
Cong Wang
Yayue Deng
Jinlong Xue
Yichen Han
Ya Li
28
2
0
27 Dec 2023
Self-Supervised Disentangled Representation Learning for Robust Target Speech Extraction
Zhaoxi Mu
Xinyu Yang
Sining Sun
Qing Yang
SSL
18
8
0
16 Dec 2023
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Xueyao Zhang
Liumeng Xue
Yicheng Gu
Yuancheng Wang
Haorui He
...
Mingxuan Wang
Jun Han
Kai Chen
Haizhou Li
Zhizheng Wu
27
26
0
15 Dec 2023
Low-latency Real-time Voice Conversion on CPU
Konstantine Sadov
Matthew Hutter
Asara Near
VLM
23
1
0
01 Nov 2023