Title
Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis Alexandra Vioni Myrsini Christidou Nikolaos Ellinas G. Vamvoukakis Panos Kakoulidis Taehoon Kim June Sig Sung Hyoungmin Park Aimilios Chalamandaris Pirros Tsiakoulis 19 11 0 19 Nov 2021
Word-Level Style Control for Expressive, Non-attentive Speech Synthesis Konstantinos Klapsas Nikolaos Ellinas June Sig Sung Hyoungmin Park S. Raptis 30 9 0 19 Nov 2021
Improved Prosodic Clustering for Multispeaker and Speaker-independent Phoneme-level Prosody Control Myrsini Christidou Alexandra Vioni Nikolaos Ellinas G. Vamvoukakis K. Markopoulos Panos Kakoulidis June Sig Sung Hyoungmin Park Aimilios Chalamandaris Pirros Tsiakoulis 21 4 0 19 Nov 2021
More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech Michael Hassid Michelle Tadmor Ramanovich Brendan Shillingford Miaosen Wang Ye Jia Tal Remez DiffM 27 17 0 19 Nov 2021
Rapping-Singing Voice Synthesis based on Phoneme-level Prosody Control K. Markopoulos Nikolaos Ellinas Alexandra Vioni Myrsini Christidou Panos Kakoulidis ... Georgia Maniati June Sig Sung Hyoungmin Park Pirros Tsiakoulis Aimilios Chalamandaris 16 2 0 17 Nov 2021
Cross-lingual Low Resource Speaker Adaptation Using Phonological Features Georgia Maniati Nikolaos Ellinas K. Markopoulos G. Vamvoukakis June Sig Sung Hyoungmin Park Aimilios Chalamandaris Pirros Tsiakoulis 8 14 0 17 Nov 2021
High Quality Streaming Speech Synthesis with Low, Sentence-Length-Independent Latency Nikolaos Ellinas G. Vamvoukakis K. Markopoulos Aimilios Chalamandaris Georgia Maniati Panos Kakoulidis S. Raptis June Sig Sung Hyoungmin Park Pirros Tsiakoulis 22 36 0 17 Nov 2021
Meta-Voice: Fast few-shot style transfer for expressive voice cloning using meta learning Songxiang Liu Dan Su Dong Yu 25 10 0 14 Nov 2021
AC-VC: Non-parallel Low Latency Phonetic Posteriorgrams Based Voice Conversion Damien Ronssin Milos Cernak 28 10 0 12 Nov 2021
Speaker Generation Daisy Stanton Matt Shannon Soroosh Mariooryad RJ Skerry-Ryan Eric Battenberg Tom Bagby David Kao 28 29 0 07 Nov 2021
Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech Sung-Feng Huang Chyi-Jiunn Lin Da-Rong Liu Yi-Chen Chen Hung-yi Lee 22 56 0 07 Nov 2021
Emotional Prosody Control for Speech Generation S. Sivaprasad Saiteja Kosgi Vineet Gandhi 12 17 0 07 Nov 2021
Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity Peter Wu Jiatong Shi Yifan Zhong Shinji Watanabe A. Black 27 8 0 02 Nov 2021
Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units Anurag Katakkar A. Black AuLLM 30 1 0 31 Oct 2021
VRAIN-UPV MLLP's system for the Blizzard Challenge 2021 A. P. D. Martos Albert Sanchis Alfons Juan-Císcar 19 6 0 29 Oct 2021
Beyond $L_p$ clipping: Equalization-based Psychoacoustic Attacks against ASRs H. Abdullah Muhammad Sajidur Rahman Christian Peeters Cassidy Gibson Washington Garcia Vincent Bindschaedler T. Shrimpton Patrick Traynor AAML 19 9 0 25 Oct 2021
FMFCC-A: A Challenging Mandarin Dataset for Synthetic Speech Detection Zhenyu Zhang Yewei Gu Xiaowei Yi Xianfeng Zhao 34 24 0 18 Oct 2021
Neural Dubber: Dubbing for Videos According to Scripts Chenxu Hu Qiao Tian Tingle Li Yuping Wang Yuxuan Wang Hang Zhao DiffM VGen 36 40 0 15 Oct 2021
From Start to Finish: Latency Reduction Strategies for Incremental Speech Synthesis in Simultaneous Speech-to-Speech Translation Danni Liu Changhan Wang Hongyu Gong Xutai Ma Yun Tang J. Pino 27 4 0 15 Oct 2021
ESPnet2-TTS: Extending the Edge of TTS Research Tomoki Hayashi Ryuichi Yamamoto Takenori Yoshimura Peter Wu Jiatong Shi Takaaki Saeki Yooncheol Ju Yusuke Yasuda Shinnosuke Takamichi Shinji Watanabe VLM 55 60 0 15 Oct 2021
Improve Cross-lingual Voice Cloning Using Low-quality Code-switched Data Haitong Zhang Yue Lin 26 0 0 14 Oct 2021
A Melody-Unsupervision Model for Singing Voice Synthesis Soonbeom Choi Juhan Nam 29 14 0 13 Oct 2021
Fine-grained style control in Transformer-based Text-to-speech Synthesis Li-Wei Chen Alexander I. Rudnicky 88 30 0 12 Oct 2021
S3PRL-VC: Open-source Voice Conversion Framework with Self-supervised Speech Representations Wen-Chin Huang Shu-Wen Yang Tomoki Hayashi Hung-yi Lee Shinji Watanabe Tomoki Toda 38 40 0 12 Oct 2021
Adapting TTS models For New Speakers using Transfer Learning Paarth Neekhara Jason Chun Lok Li Boris Ginsburg 38 15 0 12 Oct 2021
LaughNet: synthesizing laughter utterances from waveform silhouettes and a single laughter example Hieu-Thi Luong Junichi Yamagishi 52 9 0 11 Oct 2021
Towards High-fidelity Singing Voice Conversion with Acoustic Reference and Contrastive Predictive Coding Chao Wang Zhonghao Li Benlai Tang Xiang Yin Yuan Wan Yibiao Yu Zejun Ma 29 17 0 10 Oct 2021
PAMA-TTS: Progression-Aware Monotonic Attention for Stable Seq2Seq TTS With Accurate Phoneme Duration Control Yunchao He Jian Luan Yujun Wang 30 1 0 09 Oct 2021
Using multiple reference audios and style embedding constraints for speech synthesis Cheng Gong Longbiao Wang Zhenhua Ling Ju Zhang J. Dang 21 5 0 09 Oct 2021
Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech Pengfei Wu Junjie Pan Chenchang Xu Junhui Zhang Lin Wu Xiang Yin Zejun Ma 18 16 0 08 Oct 2021
KaraSinger: Score-Free Singing Voice Synthesis with VQ-VAE using Mel-spectrograms Chien-Feng Liao Jen-Yu Liu Yi-Hsuan Yang 29 5 0 08 Oct 2021
A study on the efficacy of model pre-training in developing neural text-to-speech system Guangyan Zhang Yichong Leng Daxin Tan Ying Qin Kaitao Song Xu Tan Sheng Zhao Tan Lee 27 2 0 08 Oct 2021
Voice Reenactment with F0 and timing constraints and adversarial learning of conversions F. Bous L. Benaroya Nicolas Obin Axel Roebel 24 2 0 07 Oct 2021
Cloning one's voice using very limited data in the wild Dongyang Dai Yuan-Jui Chen Li Chen Ming Tu Lu Liu Rui Xia Qiao Tian Yuping Wang Yuxuan Wang SyDa 33 9 0 07 Oct 2021
VisualTTS: TTS with Accurate Lip-Speech Synchronization for Automatic Voice Over Junchen Lu Berrak Sisman Rui Liu Mingyang Zhang Haizhou Li DiffM 41 19 0 07 Oct 2021
PortaSpeech: Portable and High-Quality Generative Text-to-Speech Yi Ren Jinglin Liu Zhou Zhao 47 78 0 30 Sep 2021
Nana-HDR: A Non-attentive Non-autoregressive Hybrid Model for TTS Shilu Lin Wenchao Su Li Meng Fenglong Xie Xinhui Li Li Lu 37 4 0 28 Sep 2021
Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network Takaaki Saeki Shinnosuke Takamichi Hiroshi Saruwatari 36 3 0 22 Sep 2021
"Hello, It's Me": Deep Learning-based Speech Synthesis Attacks in the Real World Emily Wenger Max Bronckers Christian Cianfarani Jenna Cryan Angela Sha Haitao Zheng Ben Y. Zhao AAML 45 39 0 20 Sep 2021
On-device neural speech synthesis Sivanand Achanta Albert Antony L. Golipour Jiangchuan Li T. Raitio ... Francesco Rossi Jennifer Shi Jaimin Upadhyay David Winarsky Hepeng Zhang 40 17 0 17 Sep 2021
Cross-speaker emotion disentangling and transfer for end-to-end speech synthesis Tao Li Xinsheng Wang Qicong Xie Zhichao Wang Linfu Xie 34 42 0 14 Sep 2021
Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration Chuanxin Tang Chong Luo Zhiyuan Zhao Dacheng Yin Yucheng Zhao Wenjun Zeng 24 9 0 12 Sep 2021
Referee: Towards reference-free cross-speaker style transfer with low-quality data for expressive speech synthesis Songxiang Liu Shan Yang Dan Su Dong Yu AI4TS 35 10 0 08 Sep 2021
Text-Free Prosody-Aware Generative Spoken Language Modeling Eugene Kharitonov Ann Lee Adam Polyak Yossi Adi Jade Copet ... Tu Nguyen M. Rivière Abdel-rahman Mohamed Emmanuel Dupoux Wei-Ning Hsu 37 117 0 07 Sep 2021
Evaluation of an Audio-Video Multimodal Deepfake Dataset using Unimodal and Multimodal Detectors Hasam Khalid Minhan Kim Shahroz Tariq Simon S. Woo 36 83 0 07 Sep 2021
Neural HMMs are all you need (for high-quality attention-free TTS) Shivam Mehta Éva Székely Jonas Beskow G. Henter 40 18 0 30 Aug 2021
Integrated Speech and Gesture Synthesis Siyang Wang Simon Alexanderson Joakim Gustafson Jonas Beskow G. Henter Éva Székely 37 19 0 25 Aug 2021
Fighting Game Commentator with Pitch and Loudness Adjustment Utilizing Highlight Cues Junjie H. Xu Zhou Fang Qihang Chen Satoru Ohno Pujana Paliyawan 30 4 0 18 Aug 2021
GC-TTS: Few-shot Speaker Adaptation with Geometric Constraints Ji-Hoon Kim Sang-Hoon Lee Ji-Hyun Lee Hong G Jung Seong-Whan Lee 47 6 0 16 Aug 2021
RW-Resnet: A Novel Speech Anti-Spoofing Model Using Raw Waveform Youxuan Ma Zongze Ren Shugong Xu 48 39 0 12 Aug 2021