Title
tPLCnet: Real-time Deep Packet Loss Concealment in the Time Domain Using a Short Temporal Context Nils L. Westhausen B. Meyer 23 7 0 04 Apr 2022
Lip to Speech Synthesis with Visual Context Attentional GAN Minsu Kim Joanna Hong Y. Ro 33 51 0 04 Apr 2022
Into-TTS : Intonation Template Based Prosody Control System Jihwan Lee Joun Yeop Lee Heejin Choi Seongkyu Mun Sangjun Park Jae-Sung Bae Chanwoo Kim 27 4 0 04 Apr 2022
On incorporating social speaker characteristics in synthetic speech S. Rallabandi Sebastian Möller 21 0 0 03 Apr 2022
Residual-guided Personalized Speech Synthesis based on Face Image Jianrong Wang Zixuan Wang Xiaosheng Hu Xuewei Li Qiang Fang Li Liu CVBM 27 16 0 01 Apr 2022
Universal Adaptor: Converting Mel-Spectrograms Between Different Configurations for Speech Synthesis Fan Wang Po-Chun Hsu Da-Rong Liu Hung-yi Lee 18 0 0 01 Apr 2022
Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis Karren D. Yang Dejan Marković Steven Krenn Vasu Agrawal Alexander Richard VGen 22 32 0 31 Mar 2022
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech Guangyan Zhang Kaitao Song Xu Tan Daxin Tan Yuzi Yan ... G. Wang Wei Zhou Tao Qin Tan Lee Sheng Zhao SSL 30 21 0 31 Mar 2022
NeuFA: Neural Network Based End-to-End Forced Alignment with Bidirectional Attention Mechanism Jingbei Li Yi Meng Zhiyong Wu Helen Meng Qiao Tian Yuping Wang Yuxuan Wang 25 21 0 31 Mar 2022
An Overview & Analysis of Sequence-to-Sequence Emotional Voice Conversion Zijiang Yang Xin Jing Andreas Triantafyllopoulos Meishu Song Ilhan Aslan Björn W. Schuller 20 14 0 29 Mar 2022
Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition Junrui Ni Liming Wang Heting Gao Kaizhi Qian Yang Zhang Shiyu Chang M. Hasegawa-Johnson 25 25 0 29 Mar 2022
DRSpeech: Degradation-Robust Text-to-Speech Synthesis with Frame-Level and Utterance-Level Acoustic Representation Learning Takaaki Saeki Kentaro Tachibana Ryuichi Yamamoto 15 10 0 29 Mar 2022
Applying Syntax–Prosody Mapping Hypothesis and Prosodic Well-Formedness Constraints to Neural Sequence-to-Sequence Speech Synthesis Kei Furukawa Takeshi Kishiyama Satoshi Nakamura 11 1 0 29 Mar 2022
vTTS: visual-text to speech Yoshifumi Nakano Takaaki Saeki Shinnosuke Takamichi Katsuhito Sudoh Hiroshi Saruwatari 25 4 0 28 Mar 2022
Attacker Attribution of Audio Deepfakes Nicolas Müller Franziska Dieckmann Jennifer Williams 17 13 0 28 Mar 2022
A Text-to-Speech Pipeline, Evaluation Methodology, and Initial Fine-Tuning Results for Child Speech Synthesis Rishabh Jain Mariam Yiwere Dan Bigioi Peter Corcoran H. Cucu 27 14 0 22 Mar 2022
AutoTTS: End-to-End Text-to-Speech Synthesis through Differentiable Duration Modeling Bac Nguyen Fabien Cardinaux Stefan Uhlich 16 2 0 21 Mar 2022
WeSinger: Data-augmented Singing Voice Synthesis with Auxiliary Losses Zewang Zhang Yibin Zheng Xinhui Li Li Lu 26 16 0 21 Mar 2022
ECAPA-TDNN for Multi-speaker Text-to-speech Synthesis Jinlong Xue Yayue Deng Yichen Han Ya Li Jianqing Sun Jiaen Liang 12 8 0 20 Mar 2022
Improve few-shot voice cloning using multi-modal learning Haitong Zhang Yue Lin 21 8 0 18 Mar 2022
DialogueNeRF: Towards Realistic Avatar Face-to-Face Conversation Video Generation Yichao Yan Zanwei Zhou Zi Wang Chen-Ning Yang Xiaokang Yang CVBM 21 19 0 15 Mar 2022
Language-Agnostic Meta-Learning for Low-Resource Text-to-Speech with Articulatory Features Florian Lux Ngoc Thang Vu 33 29 0 07 Mar 2022
Variational Auto-Encoder based Mandarin Speech Cloning Qingyu Xing Xiaohan Ma 26 0 0 06 Mar 2022
A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS Haohan Guo Hui Lu Xixin Wu Helen Meng 185 7 0 02 Mar 2022
Real time spectrogram inversion on mobile phone Oleg Rybakov Marco Tagliasacchi Yunpeng Li Liyang Jiang Xia Zhang Fadi Biadsy 34 4 0 01 Mar 2022
Revisiting Over-Smoothness in Text to Speech Yi Ren Xu Tan Tao Qin Zhou Zhao Tie-Yan Liu 87 61 0 26 Feb 2022
nnSpeech: Speaker-Guided Conditional Variational Autoencoder for Zero-shot Multi-speaker Text-to-Speech Bo Zhao Xulong Zhang Jianzong Wang Ning Cheng Jing Xiao DiffM 26 22 0 22 Feb 2022
CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing Tao Wang Jiangyan Yi Ruibo Fu J. Tao Zhengqi Wen KELM 27 18 0 21 Feb 2022
A Review on Methods and Applications in Multimodal Deep Learning Summaira Jabeen Xi Li Muhammad Shoib Amin Abdul Jabbar VLM HAI 32 88 0 18 Feb 2022
ADD 2022: the First Audio Deep Synthesis Detection Challenge Jiangyan Yi Ruibo Fu J. Tao Shuai Nie Haoxin Ma ... Le Xu Zhengqi Wen Haizhou Li Zheng Lian Bin Liu 25 176 0 17 Feb 2022
ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech Yi Ren Ming Lei Zhiying Huang Shi-Rui Zhang Qian Chen Zhijie Yan Zhou Zhao 40 41 0 16 Feb 2022
Deep Performer: Score-to-Audio Music Performance Synthesis Hao-Wen Dong Cong Zhou Taylor Berg-Kirkpatrick Julian McAuley 27 17 0 12 Feb 2022
Building Synthetic Speaker Profiles in Text-to-Speech Systems Jie Pu Yi Meng Oguz H. Elibol 15 2 0 07 Feb 2022
Tubes Among Us: Analog Attack on Automatic Speaker Identification Shimaa Ahmed Yash R. Wani Ali Shahin Shamsabadi Mohammad Yaghini Ilia Shumailov Nicolas Papernot Kassem Fawaz AAML 46 4 0 06 Feb 2022
DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs Songxiang Liu Dan Su Dong Yu DiffM 75 65 0 28 Jan 2022
Synthesizing Dysarthric Speech Using Multi-talker TTS for Dysarthric Speech Recognition M. Soleymanpour Michael T. Johnson Rahim Soleymanpour J. Berry 42 28 0 27 Jan 2022
Zero-Shot Long-Form Voice Cloning with Dynamic Convolution Attention Artem Gorodetskii Ivan Ozhiganov 30 2 0 25 Jan 2022
Disentangling Style and Speaker Attributes for TTS Style Transfer Xiaochun An Frank Soong Lei Xie 75 18 0 24 Jan 2022
Cross-Lingual Text-to-Speech Using Multi-Task Learning and Speaker Classifier Joint Training J. Yang Lei He 36 11 0 20 Jan 2022
Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis Yu Wang Xinsheng Wang Pengcheng Zhu Jie Wu Hanzhao Li Heyang Xue Yongmao Zhang Lei Xie Mengxiao Bi 25 97 0 19 Jan 2022
MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis Yinjiao Lei Shan Yang Xinsheng Wang Lei Xie 32 73 0 17 Jan 2022
Emotion Intensity and its Control for Emotional Voice Conversion Kun Zhou Berrak Sisman R. Rana Björn W. Schuller Haizhou Li 70 54 0 10 Jan 2022
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound Rowan Zellers Jiasen Lu Ximing Lu Youngjae Yu Yanpeng Zhao Mohammadreza Salehi Aditya Kusupati Jack Hessel Ali Farhadi Yejin Choi 48 207 0 07 Jan 2022
Audio representations for deep learning in sound synthesis: A review Anastasia Natsiou Seán O'Leary AI4TS 30 18 0 07 Jan 2022
A sinusoidal signal reconstruction method for the inversion of the mel-spectrogram Anastasia Natsiou Seán O'Leary 25 3 0 07 Jan 2022
Multi-speaker Multi-style Text-to-speech Synthesis With Single-speaker Single-style Training Data Scenarios Qicong Xie Tao Li Xinsheng Wang Zhichao Wang Lei Xie Guoqiao Yu Guanglu Wan 37 11 0 23 Dec 2021
Textless Speech-to-Speech Translation on Real Data Ann Lee Hongyu Gong Paul-Ambroise Duquenne Holger Schwenk Peng-Jen Chen ... Sravya Popuri Yossi Adi J. Pino Jiatao Gu Wei-Ning Hsu 31 143 0 15 Dec 2021
VocBench: A Neural Vocoder Benchmark for Speech Synthesis Ehab A. AlBadawy Andrew Gibiansky Qing He Jilong Wu Ming-Ching Chang Siwei Lyu 32 12 0 06 Dec 2021
V2C: Visual Voice Cloning Qi Chen Yuanqing Li Yuankai Qi Jiaqiu Zhou Mingkui Tan Qi Wu VGen 33 24 0 25 Nov 2021
Guided-TTS: A Diffusion Model for Text-to-Speech via Classifier Guidance Heeseung Kim Sungwon Kim Sungroh Yoon DiffM BDL 19 107 0 23 Nov 2021