One TTS Alignment To Rule Them All

23 August 2021

Papers citing "One TTS Alignment To Rule Them All"

50 / 50 papers shown

Title
Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance Shehzeen Samarah Hussain Paarth Neekhara Xuesong Yang Edresson Casanova Subhankar Ghosh Mikyas T. Desta Roy Fejgin Rafael Valle Jason Chun Lok Li 61 3 0 07 Feb 2025
MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation Haibo Tong Zhaoyang Wang Zhengzhang Chen Haonian Ji Shi Qiu ... Peng Xia Mingyu Ding Rafael Rafailov Chelsea Finn Huaxiu Yao EGVM VGen 104 3 0 03 Feb 2025
Face-StyleSpeech: Enhancing Zero-shot Speech Synthesis from Face Images with Improved Face-to-Speech Mapping Minki Kang Wooseok Han Eunho Yang CVBM 39 0 0 31 Dec 2024
Stable-TTS: Stable Speaker-Adaptive Text-to-Speech Synthesis via Prosody Prompting Wooseok Han Minki Kang Changhun Kim Eunho Yang 43 0 0 31 Dec 2024
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation Ji-Hoon Kim Hong-Sun Yang Yoon-Cheol Ju Il-Hwan Kim Byeong-Yeol Kim Joon Son Chung BDL 54 0 0 31 Dec 2024
SegINR: Segment-wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech Minchan Kim Myeonghun Jeong Joun Yeop Lee Nam Soo Kim 32 0 0 07 Oct 2024
E1 TTS: Simple and Fast Non-Autoregressive TTS Zhijun Liu Shuai Wang Pengcheng Zhu Mengxiao Bi Haizhou Li VLM DiffM 38 3 0 14 Sep 2024
Improving Robustness of Diffusion-Based Zero-Shot Speech Synthesis via Stable Formant Generation C. Han Seokgi Lee Gyuhyeon Nam Gyeongsu Chae DiffM 194 0 0 14 Sep 2024
Enhancing Out-of-Vocabulary Performance of Indian TTS Systems for Practical Applications through Low-Effort Data Strategies Srija Anand Praveena Varadhan Ashwin Sankar Giri Raju Mitesh M. Khapra 45 1 0 18 Jul 2024
Speaker- and Text-Independent Estimation of Articulatory Movements and Phoneme Alignments from Speech Tobias Weise P. Klumpp Kubilay Can Demir Paula Andrea Pérez-Toro Maria Schuster E. Noeth Bjoern Heismann Andreas Maier Seung Hee Yang 39 0 0 03 Jul 2024
VAE-based Phoneme Alignment Using Gradient Annealing and SSL Acoustic Features Tomoki Koriyama 41 0 0 03 Jul 2024
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment Paarth Neekhara Shehzeen Samarah Hussain Subhankar Ghosh Jason Chun Lok Li Rafael Valle Rohan Badlani Boris Ginsburg 58 11 0 25 Jun 2024
An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios Cheng Gong Erica Cooper Xin Wang Chunyu Qiang Mengzhe Geng ... Jianwu Dang Marc Tessier Aidan Pine Korin Richmond Junichi Yamagishi 37 2 0 13 Jun 2024
DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with Paralanguage Kyra Wang Dorien Herremans 40 0 0 13 Jun 2024
Small-E: Small Language Model with Linear Attention for Efficient Speech Synthesis Théodor Lemerle Nicolas Obin Axel Roebel 37 6 0 06 Jun 2024
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations Cheng Gong Xin Wang Erica Cooper Dan Wells Longbiao Wang Jianwu Dang Korin Richmond Junichi Yamagishi 31 21 0 22 Dec 2023
Data Center Audio/Video Intelligence on Device (DAVID) -- An Edge-AI Platform for Smart-Toys Gabriel Cosache Francisco Salgado C. Rotariu George Sterpu Rishabh Jain Peter Corcoran 16 0 0 18 Nov 2023
Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning Rishabh Jain Peter Corcoran 28 0 0 07 Nov 2023
The DeepZen Speech Synthesis System for Blizzard Challenge 2023 C. Veaux R. Maia Spyridoula Papendreou 25 1 0 30 Aug 2023
On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis Siyang Wang G. Henter Joakim Gustafson Éva Székely 50 5 0 11 Jul 2023
Investigating the Utility of Surprisal from Large Language Models for Speech Synthesis Prosody Sofoklis Kakouros J. Šimko M. Vainio Antti Suni 27 5 0 16 Jun 2023
M2-CTTS: End-to-End Multi-scale Multi-modal Conversational Text-to-Speech Synthesis Jinlong Xue Yayue Deng Fengping Wang Ya Li Yingming Gao J. Tao Jianqing Sun Jiaen Liang 21 8 0 03 May 2023
An End-to-End Neural Network for Image-to-Audio Transformation Liu Chen Michael Deisher Munir Georges 26 3 0 10 Mar 2023
A Comparative Study of Self-Supervised Speech Representations in Read and Spontaneous TTS Siyang Wang G. Henter Joakim Gustafson Éva Székely 38 4 0 05 Mar 2023
Automatic Heteronym Resolution Pipeline Using RAD-TTS Aligners Jocelyn Huang Evelina Bakhturina Oktai Tatanov 19 0 0 28 Feb 2023
CrossSpeech: Speaker-independent Acoustic Representation for Cross-lingual Speech Synthesis Ji-Hoon Kim Hongying Yang Yooncheol Ju Il-Hwan Kim Byeong-Yeol Kim 30 8 0 28 Feb 2023
PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS Junhyeok Lee Wonbin Jung Hyunjae Cho Jaeyeon Kim Jaehwan Kim 17 3 0 24 Feb 2023
Multilingual Multiaccented Multispeaker TTS with RADTTS Rohan Badlani Rafael Valle Kevin J. Shih J. F. Santos Francesco Ferroni Bryan Catanzaro 16 6 0 24 Jan 2023
RWEN-TTS: Relation-aware Word Encoding Network for Natural Text-to-Speech Synthesis Shinhyeok Oh HyeongRae Noh Yoonseok Hong Insoo Oh 20 0 0 15 Dec 2022
Towards Building Text-To-Speech Systems for the Next Billion Users Gokul Karthik Kumar V. PraveenS. Pratyush Kumar Mitesh M. Khapra Karthik Nandakumar 43 18 0 17 Nov 2022
Grad-StyleSpeech: Any-speaker Adaptive Text-to-Speech Synthesis with Diffusion Models Minki Kang Dong Min Sung Ju Hwang DiffM 25 48 0 17 Nov 2022
OverFlow: Putting flows on top of neural transducers for better TTS Shivam Mehta Ambika Kirkland Harm Lameris Jonas Beskow Éva Székely G. Henter AI4TS 39 12 0 13 Nov 2022
Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder J. Melechovský Ambuj Mehrish Berrak Sisman Dorien Herremans 28 6 0 07 Nov 2022
Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers Cheng-Ping Hsieh Subhankar Ghosh Boris Ginsburg 41 18 0 01 Nov 2022
The Importance of Accurate Alignments in End-to-End Speech Synthesis Anusha Prakash H. Murthy 31 0 0 31 Oct 2022
FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis Yifan Hu Rui Liu Guanglai Gao Haizhou Li 128 7 0 27 Oct 2022
Low-Resource Multilingual and Zero-Shot Multispeaker TTS Florian Lux Julia Koch Ngoc Thang Vu 38 22 0 21 Oct 2022
Deep Speech Synthesis from Articulatory Representations Peter Wu Shinji Watanabe Louis Goldstein A. Black Gopala K. Anumanchipalli 39 24 0 13 Sep 2022
Text-driven Emotional Style Control and Cross-speaker Style Transfer in Neural TTS Yookyung Shin Younggun Lee Suhee Jo Yeongtae Hwang Taesu Kim 22 14 0 13 Jul 2022
DailyTalk: Spoken Dialogue Dataset for Conversational Text-to-Speech Keon Lee Kyumin Park Daeyoung Kim LM&MA 21 42 0 03 Jul 2022
RetrieverTTS: Modeling Decomposed Factors for Text-Based Speech Insertion Dacheng Yin Chuanxin Tang Yanqing Liu Xiaoqiang Wang Zhiyuan Zhao Yucheng Zhao Zhiwei Xiong Sheng Zhao Chong Luo 26 12 0 28 Jun 2022
Regotron: Regularizing the Tacotron2 architecture via monotonic alignment loss Efthymios Georgiou Kosmas Kritsis Georgios Paraskevopoulos Athanasios Katsamanis Vassilis Katsouros Alexandros Potamianos 23 3 0 28 Apr 2022
Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech Hyungchan Yoon Seyun Um Changwhan Kim Hong-Goo Kang 25 0 0 05 Apr 2022
JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech D. Lim Sunghee Jung Eesung Kim 19 51 0 31 Mar 2022
Language-Agnostic Meta-Learning for Low-Resource Text-to-Speech with Articulatory Features Florian Lux Ngoc Thang Vu 25 29 0 07 Mar 2022
DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs Songxiang Liu Dan Su Dong Yu DiffM 70 65 0 28 Jan 2022
The MSXF TTS System for ICASSP 2022 ADD Challenge Chunyong Yang Pengfei Liu Yanli Chen Hongbin Wang Min Liu 13 0 0 27 Jan 2022
Adapting TTS models For New Speakers using Transfer Learning Paarth Neekhara Jason Chun Lok Li Boris Ginsburg 38 15 0 12 Oct 2021
Phone-to-audio alignment without text: A Semi-supervised Approach Jian Zhu Cong Zhang David Jurgens 37 36 0 08 Oct 2021
EdiTTS: Score-based Editing for Controllable Text-to-Speech Jaesung Tae Hyeongju Kim Taesu Kim DiffM 173 39 0 06 Oct 2021