One TTS Alignment To Rule Them All

23 August 2021
Rohan Badlani
A. Lancucki
Kevin J. Shih
Rafael Valle
Ming-Yu Liu
Bryan Catanzaro

Papers citing "One TTS Alignment To Rule Them All"

50 / 50 papers shown
Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance
Shehzeen Samarah Hussain
Paarth Neekhara
Xuesong Yang
Edresson Casanova
Subhankar Ghosh
Mikyas T. Desta
Roy Fejgin
Rafael Valle
Jason Chun Lok Li
61
3
0
07 Feb 2025
MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation
Haibo Tong
Zhaoyang Wang
Zhengzhang Chen
Haonian Ji
Shi Qiu
...
Peng Xia
Mingyu Ding
Rafael Rafailov
Chelsea Finn
Huaxiu Yao
EGVM
VGen
104
3
0
03 Feb 2025
Face-StyleSpeech: Enhancing Zero-shot Speech Synthesis from Face Images with Improved Face-to-Speech Mapping
Minki Kang
Wooseok Han
Eunho Yang
CVBM
39
0
0
31 Dec 2024
Stable-TTS: Stable Speaker-Adaptive Text-to-Speech Synthesis via Prosody Prompting
Wooseok Han
Minki Kang
Changhun Kim
Eunho Yang
43
0
0
31 Dec 2024
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation
Ji-Hoon Kim
Hong-Sun Yang
Yoon-Cheol Ju
Il-Hwan Kim
Byeong-Yeol Kim
Joon Son Chung
BDL
54
0
0
31 Dec 2024
SegINR: Segment-wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech
Minchan Kim
Myeonghun Jeong
Joun Yeop Lee
Nam Soo Kim
32
0
0
07 Oct 2024
E1 TTS: Simple and Fast Non-Autoregressive TTS
Zhijun Liu
Shuai Wang
Pengcheng Zhu
Mengxiao Bi
Haizhou Li
VLM
DiffM
38
3
0
14 Sep 2024
Improving Robustness of Diffusion-Based Zero-Shot Speech Synthesis via Stable Formant Generation
C. Han
Seokgi Lee
Gyuhyeon Nam
Gyeongsu Chae
DiffM
194
0
0
14 Sep 2024
Enhancing Out-of-Vocabulary Performance of Indian TTS Systems for Practical Applications through Low-Effort Data Strategies
Srija Anand
Praveena Varadhan
Ashwin Sankar
Giri Raju
Mitesh M. Khapra
45
1
0
18 Jul 2024
Speaker- and Text-Independent Estimation of Articulatory Movements and Phoneme Alignments from Speech
Tobias Weise
P. Klumpp
Kubilay Can Demir
Paula Andrea Pérez-Toro
Maria Schuster
E. Noeth
Bjoern Heismann
Andreas Maier
Seung Hee Yang
39
0
0
03 Jul 2024
VAE-based Phoneme Alignment Using Gradient Annealing and SSL Acoustic Features
Tomoki Koriyama
41
0
0
03 Jul 2024
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment
Paarth Neekhara
Shehzeen Samarah Hussain
Subhankar Ghosh
Jason Chun Lok Li
Rafael Valle
Rohan Badlani
Boris Ginsburg
58
11
0
25 Jun 2024
An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios
Cheng Gong
Erica Cooper
Xin Wang
Chunyu Qiang
Mengzhe Geng
...
Jianwu Dang
Marc Tessier
Aidan Pine
Korin Richmond
Junichi Yamagishi
37
2
0
13 Jun 2024
DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with Paralanguage
Kyra Wang
Dorien Herremans
40
0
0
13 Jun 2024
Small-E: Small Language Model with Linear Attention for Efficient Speech Synthesis
Théodor Lemerle
Nicolas Obin
Axel Roebel
37
6
0
06 Jun 2024
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
Cheng Gong
Xin Wang
Erica Cooper
Dan Wells
Longbiao Wang
Jianwu Dang
Korin Richmond
Junichi Yamagishi
31
21
0
22 Dec 2023
Data Center Audio/Video Intelligence on Device (DAVID) -- An Edge-AI Platform for Smart-Toys
Gabriel Cosache
Francisco Salgado
C. Rotariu
George Sterpu
Rishabh Jain
Peter Corcoran
16
0
0
18 Nov 2023
Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning
Rishabh Jain
Peter Corcoran
28
0
0
07 Nov 2023
The DeepZen Speech Synthesis System for Blizzard Challenge 2023
C. Veaux
R. Maia
Spyridoula Papendreou
25
1
0
30 Aug 2023
On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis
Siyang Wang
G. Henter
Joakim Gustafson
Éva Székely
50
5
0
11 Jul 2023
Investigating the Utility of Surprisal from Large Language Models for Speech Synthesis Prosody
Sofoklis Kakouros
J. Šimko
M. Vainio
Antti Suni
27
5
0
16 Jun 2023
M2-CTTS: End-to-End Multi-scale Multi-modal Conversational Text-to-Speech Synthesis
Jinlong Xue
Yayue Deng
Fengping Wang
Ya Li
Yingming Gao
J. Tao
Jianqing Sun
Jiaen Liang
21
8
0
03 May 2023
An End-to-End Neural Network for Image-to-Audio Transformation
Liu Chen
Michael Deisher
Munir Georges
26
3
0
10 Mar 2023
A Comparative Study of Self-Supervised Speech Representations in Read and Spontaneous TTS
Siyang Wang
G. Henter
Joakim Gustafson
Éva Székely
38
4
0
05 Mar 2023
Automatic Heteronym Resolution Pipeline Using RAD-TTS Aligners
Jocelyn Huang
Evelina Bakhturina
Oktai Tatanov
19
0
0
28 Feb 2023
CrossSpeech: Speaker-independent Acoustic Representation for Cross-lingual Speech Synthesis
Ji-Hoon Kim
Hongying Yang
Yooncheol Ju
Il-Hwan Kim
Byeong-Yeol Kim
30
8
0
28 Feb 2023
PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS
Junhyeok Lee
Wonbin Jung
Hyunjae Cho
Jaeyeon Kim
Jaehwan Kim
17
3
0
24 Feb 2023
Multilingual Multiaccented Multispeaker TTS with RADTTS
Rohan Badlani
Rafael Valle
Kevin J. Shih
J. F. Santos
Francesco Ferroni
Bryan Catanzaro
16
6
0
24 Jan 2023
RWEN-TTS: Relation-aware Word Encoding Network for Natural Text-to-Speech Synthesis
Shinhyeok Oh
HyeongRae Noh
Yoonseok Hong
Insoo Oh
20
0
0
15 Dec 2022
Towards Building Text-To-Speech Systems for the Next Billion Users
Gokul Karthik Kumar
V. PraveenS.
Pratyush Kumar
Mitesh M. Khapra
Karthik Nandakumar
43
18
0
17 Nov 2022
Grad-StyleSpeech: Any-speaker Adaptive Text-to-Speech Synthesis with Diffusion Models
Minki Kang
Dong Min
Sung Ju Hwang
DiffM
25
48
0
17 Nov 2022
OverFlow: Putting flows on top of neural transducers for better TTS
Shivam Mehta
Ambika Kirkland
Harm Lameris
Jonas Beskow
Éva Székely
G. Henter
AI4TS
39
12
0
13 Nov 2022
Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder
J. Melechovský
Ambuj Mehrish
Berrak Sisman
Dorien Herremans
28
6
0
07 Nov 2022
Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers
Cheng-Ping Hsieh
Subhankar Ghosh
Boris Ginsburg
41
18
0
01 Nov 2022
The Importance of Accurate Alignments in End-to-End Speech Synthesis
Anusha Prakash
H. Murthy
31
0
0
31 Oct 2022
FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis
Yifan Hu
Rui Liu
Guanglai Gao
Haizhou Li
128
7
0
27 Oct 2022
Low-Resource Multilingual and Zero-Shot Multispeaker TTS
Florian Lux
Julia Koch
Ngoc Thang Vu
38
22
0
21 Oct 2022
Deep Speech Synthesis from Articulatory Representations
Peter Wu
Shinji Watanabe
Louis Goldstein
A. Black
Gopala K. Anumanchipalli
39
24
0
13 Sep 2022
Text-driven Emotional Style Control and Cross-speaker Style Transfer in Neural TTS
Yookyung Shin
Younggun Lee
Suhee Jo
Yeongtae Hwang
Taesu Kim
22
14
0
13 Jul 2022
DailyTalk: Spoken Dialogue Dataset for Conversational Text-to-Speech
Keon Lee
Kyumin Park
Daeyoung Kim
LM&MA
21
42
0
03 Jul 2022
RetrieverTTS: Modeling Decomposed Factors for Text-Based Speech Insertion
Dacheng Yin
Chuanxin Tang
Yanqing Liu
Xiaoqiang Wang
Zhiyuan Zhao
Yucheng Zhao
Zhiwei Xiong
Sheng Zhao
Chong Luo
26
12
0
28 Jun 2022
Regotron: Regularizing the Tacotron2 architecture via monotonic alignment loss
Efthymios Georgiou
Kosmas Kritsis
Georgios Paraskevopoulos
Athanasios Katsamanis
Vassilis Katsouros
Alexandros Potamianos
23
3
0
28 Apr 2022
Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech
Hyungchan Yoon
Seyun Um
Changwhan Kim
Hong-Goo Kang
25
0
0
05 Apr 2022
JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech
D. Lim
Sunghee Jung
Eesung Kim
19
51
0
31 Mar 2022
Language-Agnostic Meta-Learning for Low-Resource Text-to-Speech with Articulatory Features
Florian Lux
Ngoc Thang Vu
25
29
0
07 Mar 2022
DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs
Songxiang Liu
Dan Su
Dong Yu
DiffM
70
65
0
28 Jan 2022
The MSXF TTS System for ICASSP 2022 ADD Challenge
Chunyong Yang
Pengfei Liu
Yanli Chen
Hongbin Wang
Min Liu
13
0
0
27 Jan 2022
Adapting TTS models For New Speakers using Transfer Learning
Paarth Neekhara
Jason Chun Lok Li
Boris Ginsburg
38
15
0
12 Oct 2021
Phone-to-audio alignment without text: A Semi-supervised Approach
Jian Zhu
Cong Zhang
David Jurgens
37
36
0
08 Oct 2021
EdiTTS: Score-based Editing for Controllable Text-to-Speech
Jaesung Tae
Hyeongju Kim
Taesu Kim
DiffM
173
39
0
06 Oct 2021