
Audio-to-Audio Emotion Conversion With Pitch And Duration Style Transfer

23 May 2025
Soumya Dutta
Avni Jain
Sriram Ganapathy
arXiv: 2505.17655

Papers cited by "Audio-to-Audio Emotion Conversion With Pitch And Duration Style Transfer"

40 papers
Vevo: Controllable Zero-Shot Voice Imitation with Self-Supervised Disentanglement
Xueyao Zhang
Xiaohui Zhang
Kainan Peng
Zhenyu Tang
Vimal Manohar
...
Yansen Wang
Julian Chan
Yuan Huang
Zhizheng Wu
Mingbo Ma
DiffM
129
6
0
11 Feb 2025
Zero Shot Audio to Audio Emotion Transfer With Speaker Disentanglement
Soumya Dutta
Sriram Ganapathy
43
2
0
09 Jan 2024
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
Ziyang Ma
Zhisheng Zheng
Jiaxin Ye
Jinchao Li
Zhifu Gao
Shiliang Zhang
Xie Chen
MDE
SLR
SSL
45
103
0
23 Dec 2023
Seamless: Multilingual Expressive and Streaming Speech Translation
Seamless Communication
Loïc Barrault
Yu-An Chung
Mariano Coria Meglioli
David Dale
...
Paden Tomasello
Changhan Wang
Jeff Wang
Skyler Wang
Mary Williamson
39
152
0
08 Dec 2023
Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition
Weidong Chen
Xiaofen Xing
Peihao Chen
Xiangmin Xu
VLM
51
38
0
20 Jul 2023
HCAM -- Hierarchical Cross Attention Model for Multi-modal Emotion Recognition
Soumya Dutta
Sriram Ganapathy
62
17
0
14 Apr 2023
Nonparallel Emotional Voice Conversion For Unseen Speaker-Emotion Pairs Using Dual Domain Adversarial Network & Virtual Domain Pairing
Nirmesh J. Shah
M. Singh
Naoya Takahashi
N. Onoe
54
15
0
21 Feb 2023
Speaking Style Conversion in the Waveform Domain Using Discrete Self-Supervised Units
Gallil Maimon
Yossi Adi
56
14
0
19 Dec 2022
Robust Speech Recognition via Large-Scale Weak Supervision
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
OffRL
113
3,515
0
06 Dec 2022
Speech Synthesis with Mixed Emotions
Kun Zhou
Berrak Sisman
R. Rana
B. W. Schuller
Haizhou Li
42
47
0
11 Aug 2022
BigVGAN: A Universal Neural Vocoder with Large-Scale Training
Sang-gil Lee
Ming-Yu Liu
Boris Ginsburg
Bryan Catanzaro
Sung-Hoon Yoon
54
248
0
09 Jun 2022
Emotion Intensity and its Control for Emotional Voice Conversion
Kun Zhou
Berrak Sisman
R. Rana
Björn W. Schuller
Haizhou Li
104
56
0
10 Jan 2022
Textless Speech-to-Speech Translation on Real Data
Ann Lee
Hongyu Gong
Paul-Ambroise Duquenne
Holger Schwenk
Peng-Jen Chen
...
Sravya Popuri
Yossi Adi
J. Pino
Jiatao Gu
Wei-Ning Hsu
43
147
0
15 Dec 2021
Textless Speech Emotion Conversion using Discrete and Decomposed Representations
Felix Kreuk
Adam Polyak
Jade Copet
Eugene Kharitonov
Tu Nguyen
M. Rivière
Wei-Ning Hsu
Abdel-rahman Mohamed
Emmanuel Dupoux
Yossi Adi
50
32
0
14 Nov 2021
A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion
Benjamin van Niekerk
M. Carbonneau
Julian Zaïdi
Matthew Baas
Hugo Seuté
Herman Kamper
DRL
60
115
0
03 Nov 2021
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
SSL
178
1,794
0
26 Oct 2021
Cross-speaker emotion disentangling and transfer for end-to-end speech synthesis
Tao Li
Xinsheng Wang
Qicong Xie
Zhichao Wang
Linfu Xie
39
46
0
14 Sep 2021
StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
Yinghao Aaron Li
A. Zare
N. Mesgarani
64
100
0
21 Jul 2021
Direct speech-to-speech translation with discrete units
Ann Lee
Peng-Jen Chen
Changhan Wang
Jiatao Gu
Sravya Popuri
...
Yossi Adi
Qing He
Yun Tang
J. Pino
Wei-Ning Hsu
53
185
0
12 Jul 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
Wei-Ning Hsu
Benjamin Bolte
Yao-Hung Hubert Tsai
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
SSL
129
2,879
0
14 Jun 2021
Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation
Dong Min
Dong Bok Lee
Eunho Yang
Sung Ju Hwang
94
170
0
06 Jun 2021
Emotional Voice Conversion: Theory, Databases and ESD
Kun Zhou
Berrak Sisman
Rui Liu
Haizhou Li
66
172
0
31 May 2021
SUPERB: Speech processing Universal PERformance Benchmark
Shu-Wen Yang
Po-Han Chi
Yung-Sung Chuang
Cheng-I Jeff Lai
Kushal Lakhotia
...
Shuyan Dong
Shang-Wen Li
Shinji Watanabe
Abdel-rahman Mohamed
Hung-yi Lee
SSL
80
910
0
03 May 2021
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations
Adam Polyak
Yossi Adi
Jade Copet
Eugene Kharitonov
Kushal Lakhotia
Wei-Ning Hsu
Abdel-rahman Mohamed
Emmanuel Dupoux
62
315
0
01 Apr 2021
STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech
Keon Lee
Kyumin Park
Daeyoung Kim
35
31
0
17 Mar 2021
VAW-GAN for Disentanglement and Recomposition of Emotional Elements in Speech
Kun Zhou
Berrak Sisman
Haizhou Li
DRL
70
41
0
03 Nov 2020
Seen and Unseen emotional style transfer for voice conversion with a new emotional speech dataset
Kun Zhou
Berrak Sisman
Rui Liu
Haizhou Li
46
190
0
28 Oct 2020
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
121
1,918
0
12 Oct 2020
DiffWave: A Versatile Diffusion Model for Audio Synthesis
Zhifeng Kong
Ming-Yu Liu
Jiaji Huang
Kexin Zhao
Bryan Catanzaro
DiffM
BDL
82
1,429
0
21 Sep 2020
Non-parallel Emotion Conversion using a Deep-Generative Hybrid Network and an Adversarial Pair Discriminator
Ravi Shankar
Jacob Sager
A. Venkataraman
GAN
55
20
0
25 Jul 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski
Henry Zhou
Abdel-rahman Mohamed
Michael Auli
SSL
183
5,734
0
20 Jun 2020
ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification
Brecht Desplanques
Jenthe Thienpondt
Kris Demuynck
61
1,323
0
14 May 2020
Converting Anyone's Emotion: Towards Speaker-Independent Emotional Voice Conversion
Kun Zhou
Berrak Sisman
Mingyang Zhang
Haizhou Li
49
55
0
13 May 2020
Decision-Making with Auto-Encoding Variational Bayes
Romain Lopez
Pierre Boyeau
Nir Yosef
Michael I. Jordan
Jeffrey Regier
BDL
226
10,591
0
17 Feb 2020
x-vectors meet emotions: A study on dependencies between emotion and speaker recognition
R. Pappagari
Tianzi Wang
Jesus Villalba
Nanxin Chen
Najim Dehak
54
108
0
12 Feb 2020
Transforming Spectrum and Prosody for Emotional Voice Conversion with Non-Parallel Training Data
Kun Zhou
Berrak Sisman
Haizhou Li
54
69
0
01 Feb 2020
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
Yuxuan Wang
Daisy Stanton
Yu Zhang
RJ Skerry-Ryan
Eric Battenberg
Joel Shor
Y. Xiao
Fei Ren
Ye Jia
Rif A. Saurous
57
822
0
23 Mar 2018
StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation
Yunjey Choi
Min-Je Choi
M. Kim
Jung-Woo Ha
Sunghun Kim
Jaegul Choo
GAN
101
3,547
0
24 Nov 2017
WaveNet: A Generative Model for Raw Audio
Aaron van den Oord
Sander Dieleman
Heiga Zen
Karen Simonyan
Oriol Vinyals
Alex Graves
Nal Kalchbrenner
A. Senior
Koray Kavukcuoglu
DiffM
311
7,361
0
12 Sep 2016
Domain-Adversarial Training of Neural Networks
Yaroslav Ganin
E. Ustinova
Hana Ajakan
Pascal Germain
Hugo Larochelle
François Laviolette
M. Marchand
Victor Lempitsky
GAN
OOD
347
9,418
0
28 May 2015