Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2505.17655
Cited By
Audio-to-Audio Emotion Conversion With Pitch And Duration Style Transfer
23 May 2025
Soumya Dutta
Avni Jain
Sriram Ganapathy
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Audio-to-Audio Emotion Conversion With Pitch And Duration Style Transfer"
40 / 40 papers shown
Title
Vevo: Controllable Zero-Shot Voice Imitation with Self-Supervised Disentanglement
Xueyao Zhang
Xiaohui Zhang
Kainan Peng
Zhenyu Tang
Vimal Manohar
...
Yansen Wang
Julian Chan
Yuan Huang
Zhizheng Wu
Mingbo Ma
DiffM
129
6
0
11 Feb 2025
Zero Shot Audio to Audio Emotion Transfer With Speaker Disentanglement
Soumya Dutta
Sriram Ganapathy
43
2
0
09 Jan 2024
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
Ziyang Ma
Zhisheng Zheng
Jiaxin Ye
Jinchao Li
Zhifu Gao
Shiliang Zhang
Xie Chen
MDE
SLR
SSL
45
103
0
23 Dec 2023
Seamless: Multilingual Expressive and Streaming Speech Translation
Seamless Communication
Loïc Barrault
Yu-An Chung
Mariano Coria Meglioli
David Dale
...
Paden Tomasello
Changhan Wang
Jeff Wang
Skyler Wang
Mary Williamson
39
152
0
08 Dec 2023
Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition
Weidong Chen
Xiaofen Xing
Peihao Chen
Xiangmin Xu
VLM
51
38
0
20 Jul 2023
HCAM -- Hierarchical Cross Attention Model for Multi-modal Emotion Recognition
Soumya Dutta
Sriram Ganapathy
62
17
0
14 Apr 2023
Nonparallel Emotional Voice Conversion For Unseen Speaker-Emotion Pairs Using Dual Domain Adversarial Network & Virtual Domain Pairing
Nirmesh J. Shah
M. Singh
Naoya Takahashi
N. Onoe
54
15
0
21 Feb 2023
Speaking Style Conversion in the Waveform Domain Using Discrete Self-Supervised Units
Gallil Maimon
Yossi Adi
56
14
0
19 Dec 2022
Robust Speech Recognition via Large-Scale Weak Supervision
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
OffRL
113
3,515
0
06 Dec 2022
Speech Synthesis with Mixed Emotions
Kun Zhou
Berrak Sisman
R. Rana
B.W.Schuller
Haizhou Li
42
47
0
11 Aug 2022
BigVGAN: A Universal Neural Vocoder with Large-Scale Training
Sang-gil Lee
Ming-Yu Liu
Boris Ginsburg
Bryan Catanzaro
Sung-Hoon Yoon
54
248
0
09 Jun 2022
Emotion Intensity and its Control for Emotional Voice Conversion
Kun Zhou
Berrak Sisman
R. Rana
Björn W. Schuller
Haizhou Li
104
56
0
10 Jan 2022
Textless Speech-to-Speech Translation on Real Data
Ann Lee
Hongyu Gong
Paul-Ambroise Duquenne
Holger Schwenk
Peng-Jen Chen
...
Sravya Popuri
Yossi Adi
J. Pino
Jiatao Gu
Wei-Ning Hsu
43
147
0
15 Dec 2021
Textless Speech Emotion Conversion using Discrete and Decomposed Representations
Felix Kreuk
Adam Polyak
Jade Copet
Eugene Kharitonov
Tu Nguyen
M. Rivière
Wei-Ning Hsu
Abdel-rahman Mohamed
Emmanuel Dupoux
Yossi Adi
50
32
0
14 Nov 2021
A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion
Benjamin van Niekerk
M. Carbonneau
Julian Zaïdi
Matthew Baas
Hugo Seuté
Herman Kamper
DRL
60
115
0
03 Nov 2021
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Micheal Zeng
Xiangzhan Yu
Furu Wei
SSL
178
1,794
0
26 Oct 2021
Cross-speaker emotion disentangling and transfer for end-to-end speech synthesis
Tao Li
Xinsheng Wang
Qicong Xie
Zhichao Wang
Linfu Xie
39
46
0
14 Sep 2021
StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
Yinghao Aaron Li
A. Zare
N. Mesgarani
64
100
0
21 Jul 2021
Direct speech-to-speech translation with discrete units
Ann Lee
Peng-Jen Chen
Changhan Wang
Jiatao Gu
Sravya Popuri
...
Yossi Adi
Qing He
Yun Tang
J. Pino
Wei-Ning Hsu
53
185
0
12 Jul 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
Wei-Ning Hsu
Benjamin Bolte
Yao-Hung Hubert Tsai
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
SSL
129
2,879
0
14 Jun 2021
Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation
Dong Min
Dong Bok Lee
Eunho Yang
Sung Ju Hwang
94
170
0
06 Jun 2021
Emotional Voice Conversion: Theory, Databases and ESD
Kun Zhou
Berrak Sisman
Rui Liu
Haizhou Li
66
172
0
31 May 2021
SUPERB: Speech processing Universal PERformance Benchmark
Shu-Wen Yang
Po-Han Chi
Yung-Sung Chuang
Cheng-I Jeff Lai
Kushal Lakhotia
...
Shuyan Dong
Shang-Wen Li
Shinji Watanabe
Abdel-rahman Mohamed
Hung-yi Lee
SSL
80
910
0
03 May 2021
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations
Adam Polyak
Yossi Adi
Jade Copet
Eugene Kharitonov
Kushal Lakhotia
Wei-Ning Hsu
Abdel-rahman Mohamed
Emmanuel Dupoux
62
315
0
01 Apr 2021
STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech
Keon Lee
Kyumin Park
Daeyoung Kim
35
31
0
17 Mar 2021
VAW-GAN for Disentanglement and Recomposition of Emotional Elements in Speech
Kun Zhou
Berrak Sisman
Haizhou Li
DRL
70
41
0
03 Nov 2020
Seen and Unseen emotional style transfer for voice conversion with a new emotional speech dataset
Kun Zhou
Berrak Sisman
Rui Liu
Haizhou Li
46
190
0
28 Oct 2020
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
121
1,918
0
12 Oct 2020
DiffWave: A Versatile Diffusion Model for Audio Synthesis
Zhifeng Kong
Ming-Yu Liu
Jiaji Huang
Kexin Zhao
Bryan Catanzaro
DiffM
BDL
82
1,429
0
21 Sep 2020
Non-parallel Emotion Conversion using a Deep-Generative Hybrid Network and an Adversarial Pair Discriminator
Ravi Shankar
Jacob Sager
A. Venkataraman
GAN
55
20
0
25 Jul 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski
Henry Zhou
Abdel-rahman Mohamed
Michael Auli
SSL
183
5,734
0
20 Jun 2020
ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification
Brecht Desplanques
Jenthe Thienpondt
Kris Demuynck
61
1,323
0
14 May 2020
Converting Anyone's Emotion: Towards Speaker-Independent Emotional Voice Conversion
Kun Zhou
Berrak Sisman
Mingyang Zhang
Haizhou Li
49
55
0
13 May 2020
Decision-Making with Auto-Encoding Variational Bayes
Romain Lopez
Pierre Boyeau
Nir Yosef
Michael I. Jordan
Jeffrey Regier
BDL
226
10,591
0
17 Feb 2020
x-vectors meet emotions: A study on dependencies between emotion and speaker recognition
R. Pappagari
Tianzi Wang
Jesus Villalba
Nanxin Chen
Najim Dehak
54
108
0
12 Feb 2020
Transforming Spectrum and Prosody for Emotional Voice Conversion with Non-Parallel Training Data
Kun Zhou
Berrak Sisman
Haizhou Li
54
69
0
01 Feb 2020
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
Yuxuan Wang
Daisy Stanton
Yu Zhang
RJ Skerry-Ryan
Eric Battenberg
Joel Shor
Y. Xiao
Fei Ren
Ye Jia
Rif A. Saurous
57
822
0
23 Mar 2018
StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation
Yunjey Choi
Min-Je Choi
M. Kim
Jung-Woo Ha
Sunghun Kim
Jaegul Choo
GAN
101
3,547
0
24 Nov 2017
WaveNet: A Generative Model for Raw Audio
Aaron van den Oord
Sander Dieleman
Heiga Zen
Karen Simonyan
Oriol Vinyals
Alex Graves
Nal Kalchbrenner
A. Senior
Koray Kavukcuoglu
DiffM
311
7,361
0
12 Sep 2016
Domain-Adversarial Training of Neural Networks
Yaroslav Ganin
E. Ustinova
Hana Ajakan
Pascal Germain
Hugo Larochelle
François Laviolette
M. Marchand
Victor Lempitsky
GAN
OOD
347
9,418
0
28 May 2015
1