Title
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers Chengyi Wang Sanyuan Chen Yu-Huan Wu Zi-Hua Zhang Long Zhou ... Huaming Wang Jinyu Li Lei He Sheng Zhao Furu Wei 199 727 0 05 Jan 2023
Analysing Discrete Self Supervised Speech Representation for Spoken Language Modeling Amitay Sicherman Yossi Adi 95 37 0 02 Jan 2023
StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models Yinghao Aaron Li Cong Han N. Mesgarani 85 19 0 29 Dec 2022
Singing Voice Synthesis Based on a Musical Note Position-Aware Attention Mechanism Yukiya Hono Kei Hashimoto Yoshihiko Nankaku K. Tokuda 71 2 0 28 Dec 2022
Source Tracing: Detecting Voice Spoofing Tinglong Zhu Xingming Wang Xiaoyi Qin Ming Li 65 18 0 16 Dec 2022
Text-to-speech synthesis based on latent variable conversion using diffusion probabilistic model and variational autoencoder Yusuke Yasuda Tomoki Toda DiffM 79 8 0 16 Dec 2022
Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language Yusuke Yasuda Tomoki Toda 121 10 0 16 Dec 2022
RWEN-TTS: Relation-aware Word Encoding Network for Natural Text-to-Speech Synthesis Shinhyeok Oh HyeongRae Noh Yoonseok Hong Insoo Oh 80 0 0 15 Dec 2022
Speech and Natural Language Processing Technologies for Pseudo-Pilot Simulator Amrutha Prasad Juan Pablo Zuluaga P. Motlícek Seyyed Saeed Sarfjoo Iuliia Nigmatulina Karel Veselý 61 3 0 14 Dec 2022
Disentangling Prosody Representations with Unsupervised Speech Reconstruction Leyuan Qu Taiha Li C. Weber Theresa Pekarek-Rosin F. Ren S. Wermter 89 10 0 14 Dec 2022
MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset Kailin Liang Bin Liu Yifan Hu Rui Liu F. Bao Guanglai Gao 79 1 0 11 Dec 2022
MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis Rishabh Dabral Muhammad Hamza Mughal Vladislav Golyanik Christian Theobalt DiffM VGen 111 183 0 08 Dec 2022
OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models Jinze Bai Rui Men Han Yang Xuancheng Ren Kai Dang ... Wenhang Ge Jianxin Ma Junyang Lin Jingren Zhou Chang Zhou 88 16 0 08 Dec 2022
GreenEyes: An Air Quality Evaluating Model based on WaveNet Kan Huang Kai Zhang Ming-de Liu 29 2 0 08 Dec 2022
Learning to Dub Movies via Hierarchical Prosody Models Gaoxiang Cong Liang Li Yuankai Qi Zhengjun Zha Qi Wu Wen-yu Wang Bin Jiang Ming-Hsuan Yang Qin Huang 141 27 0 08 Dec 2022
Low-Resource End-to-end Sanskrit TTS using Tacotron2, WaveGlow and Transfer Learning Ankur Debnath Shridevi S Patil Gangotri Nadiger R. Ganesan 69 21 0 07 Dec 2022
Improve Bilingual TTS Using Dynamic Language and Phonology Embedding Fengyu Yang Jian Luan Yujun Wang 50 1 0 07 Dec 2022
Analysis and Utilization of Entrainment on Acoustic and Emotion Features in User-agent Dialogue Daxin Tan Nikos Kargas David McHardy C. Papayiannis Antonio Bonafonte Marek Střelec Jonas Rohnke A. Filandras Trevor Wood 53 0 0 07 Dec 2022
Learning the joint distribution of two sequences using little or no paired data Soroosh Mariooryad Matt Shannon Siyuan Ma Tom Bagby David Kao Daisy Stanton Eric Battenberg RJ Skerry-Ryan 89 2 0 06 Dec 2022
UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis Yinjiao Lei Shan Yang Xinsheng Wang Qicong Xie Jixun Yao Linfu Xie Jane Polak Scowcroft DiffM 80 9 0 03 Dec 2022
SNAC: Speaker-normalized affine coupling layer in flow-based architecture for zero-shot multi-speaker text-to-speech Byoung Jin Choi Myeonghun Jeong Joun Yeop Lee N. Kim 114 13 0 30 Nov 2022
Neural Speech Phase Prediction based on Parallel Estimation Architecture and Anti-Wrapping Losses Yang Ai Zhenhua Ling 64 27 0 29 Nov 2022
Deep Fake Detection, Deterrence and Response: Challenges and Opportunities Amin Azmoodeh Ali Dehghantanha 85 3 0 26 Nov 2022
Contextual Expressive Text-to-Speech Jianhong Tu Zeyu Cui Xiaohuan Zhou Siqi Zheng Kaiqin Hu Ju Fan Chang Zhou 55 3 0 26 Nov 2022
Puffin: pitch-synchronous neural waveform generation for fullband speech on modest devices O. Watts Lovisa Wihlborg Cassia Valentini-Botinhao 73 3 0 25 Nov 2022
Efficient Incremental Text-to-Speech on GPUs Muyang Du Chuan Liu Jiaxing Qi Junjie Lai 60 1 0 25 Nov 2022
Can Knowledge of End-to-End Text-to-Speech Models Improve Neural MIDI-to-Audio Synthesis Systems? Xuan Shi Erica Cooper Xin Wang Junichi Yamagishi Shrikanth Narayanan 73 1 0 25 Nov 2022
Prosody-controllable spontaneous TTS with neural HMMs Harm Lameris Shivam Mehta G. Henter Joakim Gustafson Éva Székely 66 15 0 24 Nov 2022
3D human motion generation from the text via gesture action classification and the autoregressive model Gwantae Kim Youngsuk Ryu Junyeop Lee D. Han Jeongmin Bae Hanseok Ko 44 2 0 18 Nov 2022
Towards Building Text-To-Speech Systems for the Next Billion Users Gokul Karthik Kumar Praveen S. V. Pratyush Kumar Mitesh M. Khapra Karthik Nandakumar 95 22 0 17 Nov 2022
Delivering Speaking Style in Low-resource Voice Conversion with Multi-factor Constraints Zhichao Wang Xinsheng Wang Linfu Xie Yuan-Jui Chen Qiao Tian Yuping Wang 79 5 0 16 Nov 2022
Improving Speech Emotion Recognition with Unsupervised Speaking Style Transfer Leyuan Qu Wei Wang C. Weber F. Ren Taiha Li S. Wermter 53 1 0 16 Nov 2022
General Intelligence Requires Rethinking Exploration Minqi Jiang Tim Rocktaschel Edward Grefenstette LRM 83 20 0 15 Nov 2022
Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing J. Webber Cassia Valentini-Botinhao Evelyn Williams G. Henter Simon King 111 9 0 13 Nov 2022
OverFlow: Putting flows on top of neural transducers for better TTS Shivam Mehta Ambika Kirkland Harm Lameris Jonas Beskow Éva Székely G. Henter AI4TS 107 13 0 13 Nov 2022
MaskedSpeech: Context-aware Speech Synthesis with Masking Strategy Ya-Jie Zhang Wei Song Ya Yue Zhengchen Zhang Youzheng Wu Xiaodong He 73 7 0 11 Nov 2022
GANStrument: Adversarial Instrument Sound Synthesis with Pitch-invariant Instance Conditioning Gaku Narita Junichi Shimizu Taketo Akama GAN 82 11 0 10 Nov 2022
Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder J. Melechovský Ambuj Mehrish Berrak Sisman Dorien Herremans 83 6 0 07 Nov 2022
Deliberation Networks and How to Train Them Qingyun Dou Mark Gales 65 0 0 06 Nov 2022
An Empirical Study on L2 Accents of Cross-lingual Text-to-Speech Systems via Vowel Space Jihwan Lee Jaesung Bae Seongkyu Mun Heejin Choi Joun Yeop Lee Hoon-Young Cho Chanwoo Kim 74 2 0 06 Nov 2022
NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS Dongchao Yang Songxiang Liu Jianwei Yu Helin Wang Chao Weng Yuexian Zou DiffM VLM 89 18 0 04 Nov 2022
Improving Speech Prosody of Audiobook Text-to-Speech Synthesis with Acoustic and Textual Contexts Detai Xin Sharath Adavanne F. Ang Ashish Kulkarni Shinnosuke Takamichi Hiroshi Saruwatari 105 14 0 04 Nov 2022
Predicting phoneme-level prosody latents using AR and flow-based Prior Networks for expressive speech synthesis Konstantinos Klapsas Karolos Nikitaras Nikolaos Ellinas June Sig Sung Inchul Hwang S. Raptis Aimilios Chalamandaris Pirros Tsiakoulis 70 0 0 02 Nov 2022
SpectroMap: Peak detection algorithm for audio fingerprinting A. López-García 43 0 0 02 Nov 2022
Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis Karolos Nikitaras Konstantinos Klapsas Nikolaos Ellinas Georgia Maniati June Sig Sung Inchul Hwang S. Raptis Aimilios Chalamandaris Pirros Tsiakoulis 67 1 0 01 Nov 2022
Generating Multilingual Gender-Ambiguous Text-to-Speech Voices K. Markopoulos Georgia Maniati G. Vamvoukakis Nikolaos Ellinas Georgios Vardaxoglou ... Gunu Jho Inchul Hwang Aimilios Chalamandaris Pirros Tsiakoulis S. Raptis 83 1 0 01 Nov 2022
Waveform Boundary Detection for Partially Spoofed Audio Zexin Cai Weiqing Wang Ming Li 53 28 0 01 Nov 2022
AccentSpeech: Learning Accent from Crowd-sourced Data for Target Speaker TTS with Accents Yongmao Zhang Zhichao Wang Pei-Yin Yang Hongshen Sun Zhisheng Wang Linfu Xie 87 6 0 31 Oct 2022
Cross-lingual Text-To-Speech with Flow-based Voice Conversion for Improved Pronunciation Nikolaos Ellinas G. Vamvoukakis K. Markopoulos Georgia Maniati Panos Kakoulidis June Sig Sung Inchul Hwang S. Raptis Aimilios Chalamandaris Pirros Tsiakoulis 84 2 0 31 Oct 2022
The Importance of Accurate Alignments in End-to-End Speech Synthesis Anusha Prakash H. Murthy 43 0 0 31 Oct 2022