Title
Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis Karren D. Yang Dejan Marković Steven Krenn Vasu Agrawal Alexander Richard VGen 89 33 0 31 Mar 2022
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech Guangyan Zhang Kaitao Song Xu Tan Daxin Tan Yuzi Yan ... G. Wang Wei Zhou Tao Qin Tan Lee Sheng Zhao SSL 95 21 0 31 Mar 2022
SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy Shuai Guo Jiatong Shi Tao Qian Shinji Watanabe Qin Jin 137 13 0 31 Mar 2022
WavThruVec: Latent speech representation as intermediate features for neural speech synthesis Hubert Siuzdak Piotr Dura Pol van Rijn Nori Jacoby AI4TS 140 30 0 31 Mar 2022
JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech D. Lim Sunghee Jung Eesung Kim 95 53 0 31 Mar 2022
NeuFA: Neural Network Based End-to-End Forced Alignment with Bidirectional Attention Mechanism Jingbei Li Yi Meng Zhiyong Wu Helen Meng Qiao Tian Yuping Wang Yuxuan Wang 45 21 0 31 Mar 2022
SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping Yuma Koizumi Heiga Zen Kohei Yatabe Nanxin Chen M. Bacchiani DiffM 103 49 0 31 Mar 2022
Enhancing Zero-Shot Many to Many Voice Conversion with Self-Attention VAE Ziang Long Yunling Zheng Meng Yu Jack Xin DRL 63 5 0 30 Mar 2022
Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition Junrui Ni Liming Wang Heting Gao Kaizhi Qian Yang Zhang Shiyu Chang M. Hasegawa-Johnson 78 25 0 29 Mar 2022
DRSpeech: Degradation-Robust Text-to-Speech Synthesis with Frame-Level and Utterance-Level Acoustic Representation Learning Takaaki Saeki Kentaro Tachibana Ryuichi Yamamoto 60 11 0 29 Mar 2022
Nix-TTS: Lightweight and End-to-End Text-to-Speech via Module-wise Distillation Rendi Chevi Radityo Eko Prasojo Alham Fikri Aji Andros Tjandra S. Sakti VLM 60 4 0 29 Mar 2022
ASR data augmentation in low-resource settings using cross-lingual multi-speaker TTS and cross-lingual voice conversion Edresson Casanova C. Shulby Alexander Korolev Arnaldo Cândido Júnior A. S. Soares S. Aluísio M. Ponti 153 14 0 29 Mar 2022
Transfer Learning Framework for Low-Resource Text-to-Speech using a Large-Scale Unlabeled Speech Corpus Minchan Kim Myeonghun Jeong Byoung Jin Choi Sunghwan Ahn Joun Yeop Lee N. Kim 113 26 0 29 Mar 2022
Applying Syntax–Prosody Mapping Hypothesis and Prosodic Well-Formedness Constraints to Neural Sequence-to-Sequence Speech Synthesis Kei Furukawa Takeshi Kishiyama Satoshi Nakamura 23 1 0 29 Mar 2022
STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly Voice Agent Yuki Saito Yuto Nishimura Shinnosuke Takamichi Kentaro Tachibana Hiroshi Saruwatari 128 12 0 28 Mar 2022
vTTS: visual-text to speech Yoshifumi Nakano Takaaki Saeki Shinnosuke Takamichi Katsuhito Sudoh Hiroshi Saruwatari 61 4 0 28 Mar 2022
Attacker Attribution of Audio Deepfakes Nicolas Müller Franziska Dieckmann Jennifer Williams 60 15 0 28 Mar 2022
Bunched LPCNet2: Efficient Neural Vocoders Covering Devices from Cloud to Edge Sangjun Park Kihyun Choo Joohyung Lee A. Porov Konstantin Osipov June Sig Sung 72 6 0 27 Mar 2022
A Neural Vocoder Based Packet Loss Concealment Algorithm Yaofeng Zhou C. Bao 64 2 0 26 Mar 2022
Disentangleing Content and Fine-grained Prosody Information via Hybrid ASR Bottleneck Features for Voice Conversion Xintao Zhao Feng Liu Changhe Song Zhiyong Wu Shiyin Kang Deyi Tuo Helen Meng 85 21 0 24 Mar 2022
Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis Shunwei Lei Yixuan Zhou Liyang Chen Zhiyong Wu Shiyin Kang Helen Meng 60 12 0 23 Mar 2022
AutoTTS: End-to-End Text-to-Speech Synthesis through Differentiable Duration Modeling Bac Nguyen Fabien Cardinaux Stefan Uhlich 34 2 0 21 Mar 2022
Vocal effort modeling in neural TTS for improving the intelligibility of synthetic speech in noise T. Raitio Petko N. Petkov Jiangchuan Li M. Shifas Andrea Davis Y. Stylianou 48 2 0 20 Mar 2022
ECAPA-TDNN for Multi-speaker Text-to-speech Synthesis Jinlong Xue Yayue Deng Yichen Han Ya Li Jianqing Sun Jiaen Liang 58 8 0 20 Mar 2022
AdaVocoder: Adaptive Vocoder for Custom Voice Xin Yuan Yongbin Feng Mingming Ye Cheng Tuo Minghang Zhang 133 3 0 18 Mar 2022
Improve few-shot voice cloning using multi-modal learning Haitong Zhang Yue Lin 51 8 0 18 Mar 2022
A³T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing Richard He Bai Renjie Zheng Junkun Chen Xintong Li Mingbo Ma Liang Huang 121 53 0 18 Mar 2022
Text-free non-parallel many-to-many voice conversion using normalising flows Thomas Merritt Abdelhamid Ezzerg Piotr Bilinski Magdalena Proszewska Kamil Pokora Roberto Barra-Chicote Daniel Korzekwa 124 15 0 15 Mar 2022
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities Hsiang-Sheng Tsai Heng-Jui Chang Wen-Chin Huang Zili Huang Kushal Lakhotia ... Hsuan-Jui Chen Shang-Wen Li Shinji Watanabe Abdel-rahman Mohamed Hung-yi Lee 93 110 0 14 Mar 2022
Are discrete units necessary for Spoken Language Modeling? Tu Nguyen Benoît Sagot Emmanuel Dupoux 108 26 0 11 Mar 2022
Towards Generalized Models for Task-oriented Dialogue Modeling on Spoken Conversations Ruijie Yan Shuang Peng Haitao Mi Liang Jiang Shihui Yang Yuchi Zhang Jiajun Li Liangrui Peng Yongliang Wang Zujie Wen 44 4 0 08 Mar 2022
Language-Agnostic Meta-Learning for Low-Resource Text-to-Speech with Articulatory Features Florian Lux Ngoc Thang Vu 102 29 0 07 Mar 2022
Variational Auto-Encoder based Mandarin Speech Cloning Qingyu Xing Xiaohan Ma 138 0 0 06 Mar 2022
NeuralDPS: Neural Deterministic Plus Stochastic Model with Multiband Excitation for Noise-Controllable Waveform Generation Tao Wang Ruibo Fu Jiangyan Yi J. Tao Zhengqi Wen 28 2 0 05 Mar 2022
iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform Takuhiro Kaneko Kou Tanaka Hirokazu Kameoka Shogo Seki 89 62 0 04 Mar 2022
Audio Self-supervised Learning: A Survey Shuo Liu Adria Mallol-Ragolta Emilia Parada-Cabeleiro Kun Qian Xingshuo Jing Alexander Kathan Bin Hu Bjoern W. Schuller SSL 104 109 0 02 Mar 2022
A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS Haohan Guo Hui Lu Xixin Wu Helen Meng 358 7 0 02 Mar 2022
Speaker Adaption with Intuitive Prosodic Features for Statistical Parametric Speech Synthesis Pengyu Cheng Zhenhua Ling 79 3 0 02 Mar 2022
Real time spectrogram inversion on mobile phone Oleg Rybakov Marco Tagliasacchi Yunpeng Li Liyang Jiang Xia Zhang Fadi Biadsy 146 4 0 01 Mar 2022
Revisiting Over-Smoothness in Text to Speech Yi Ren Xu Tan Tao Qin Zhou Zhao Tie-Yan Liu 148 64 0 26 Feb 2022
Neural Speech Synthesis on a Shoestring: Improving the Efficiency of LPCNet J. Valin Umut Isik Paris Smaragdis A. Krishnaswamy 62 4 0 22 Feb 2022
Wavebender GAN: An architecture for phonetically meaningful speech manipulation Gustavo Teodoro Döhler Beck Ulme Wennberg Zofia Malisz G. Henter AI4CE 94 8 0 22 Feb 2022
Improving Cross-lingual Speech Synthesis with Triplet Training Scheme Jianhao Ye Hongbin Zhou Zhiba Su Wendi He Kaimeng Ren Lin Li Heng Lu 50 4 0 22 Feb 2022
CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing Tao Wang Jiangyan Yi Ruibo Fu J. Tao Zhengqi Wen KELM 79 20 0 21 Feb 2022
It's Raw! Audio Generation with State-Space Models Karan Goel Albert Gu Chris Donahue Christopher Ré 113 195 0 20 Feb 2022
A Review on Methods and Applications in Multimodal Deep Learning Summaira Jabeen Xi Li Muhammad Shoib Amin Abdul Jabbar VLM HAI 75 103 0 18 Feb 2022
ADD 2022: the First Audio Deep Synthesis Detection Challenge Jiangyan Yi Ruibo Fu J. Tao Shuai Nie Haoxin Ma ... Le Xu Zhengqi Wen Haizhou Li Zheng Lian Bin Liu 79 185 0 17 Feb 2022
Singing-Tacotron: Global duration control attention and dynamic filter for End-to-end singing voice synthesis Tao Wang Ruibo Fu Jiangyan Yi J. Tao Zhengqi Wen 49 7 0 16 Feb 2022
ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech Yi Ren Ming Lei Zhiying Huang Shi-Rui Zhang Qian Chen Zhijie Yan Zhou Zhao 96 43 0 16 Feb 2022
textless-lib: a Library for Textless Spoken Language Processing Eugene Kharitonov Jade Copet Kushal Lakhotia Tu Nguyen Paden Tomasello ... A. Elkahky Wei-Ning Hsu Abdel-rahman Mohamed Emmanuel Dupoux Yossi Adi 129 34 0 15 Feb 2022