Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1710.08969
Cited By
Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention
24 October 2017
Hideyuki Tachibana
Katsuya Uenoyama
Shunsuke Aihara
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention"
50 / 57 papers shown
Title
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment
Paarth Neekhara
Shehzeen Samarah Hussain
Subhankar Ghosh
Jason Chun Lok Li
Rafael Valle
Rohan Badlani
Boris Ginsburg
58
11
0
25 Jun 2024
How Should We Extract Discrete Audio Tokens from Self-Supervised Models?
Pooneh Mousavi
J. Duret
Salah Zaiem
Luca Della Libera
Artem Ploujnikov
Cem Subakan
Mirco Ravanelli
42
9
0
15 Jun 2024
A Statistical Analysis of Wasserstein Autoencoders for Intrinsically Low-dimensional Data
Saptarshi Chakraborty
Peter L. Bartlett
44
1
0
24 Feb 2024
AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment
Ruiqi Li
Rongjie Huang
Lichao Zhang
Jinglin Liu
Zhou Zhao
33
4
0
08 May 2023
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining
Takaaki Saeki
Soumi Maiti
Xinjian Li
Shinji Watanabe
Shinnosuke Takamichi
Hiroshi Saruwatari
32
17
0
30 Jan 2023
Singing Voice Synthesis Based on a Musical Note Position-Aware Attention Mechanism
Yukiya Hono
Kei Hashimoto
Yoshihiko Nankaku
K. Tokuda
16
2
0
28 Dec 2022
Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language
Yusuke Yasuda
T. Toda
33
8
0
16 Dec 2022
Deliberation Networks and How to Train Them
Qingyun Dou
Mark Gales
24
0
0
06 Nov 2022
Predicting pairwise preferences between TTS audio stimuli using parallel ratings data and anti-symmetric twin neural networks
Cassia Valentini-Botinhao
M. Ribeiro
O. Watts
Korin Richmond
G. Henter
16
1
0
22 Sep 2022
AutoLV: Automatic Lecture Video Generator
Wen Wang
Yang Song
Sanjay Jha
VGen
21
3
0
19 Sep 2022
A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis
Qibing Bai
Tom Ko
Yu Zhang
27
4
0
03 Aug 2022
Multimodal Emotion Recognition with Modality-Pairwise Unsupervised Contrastive Loss
Riccardo Franceschini
Enrico Fini
Cigdem Beyan
Alessandro Conti
F. Arrigoni
Elisa Ricci
SSL
OffRL
34
16
0
23 Jul 2022
Learning Audio-Text Agreement for Open-vocabulary Keyword Spotting
Hyeon-Kyeong Shin
Hyewon Han
Doyeon Kim
Soo-Whan Chung
Hong-Goo Kang
32
33
0
30 Jun 2022
Adversarial Multi-Task Learning for Disentangling Timbre and Pitch in Singing Voice Synthesis
Tae-Woo Kim
Minguk Kang
Gyeong-Hoon Lee
AAML
34
6
0
23 Jun 2022
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality
Xu Tan
Jiawei Chen
Haohe Liu
Jian Cong
Chen Zhang
...
Lei He
Frank Soong
Tao Qin
Sheng Zhao
Tie-Yan Liu
44
213
0
09 May 2022
Enhancement of Pitch Controllability using Timbre-Preserving Pitch Augmentation in FastPitch
Hanbin Bae
Young-Sun Joo
22
2
0
12 Apr 2022
Karaoker: Alignment-free singing voice synthesis with speech training data
Panos Kakoulidis
Nikolaos Ellinas
G. Vamvoukakis
K. Markopoulos
June Sig Sung
Gunu Jho
Pirros Tsiakoulis
Aimilios Chalamandaris
12
3
0
08 Apr 2022
Neural Dubber: Dubbing for Videos According to Scripts
Chenxu Hu
Qiao Tian
Tingle Li
Yuping Wang
Yuxuan Wang
Hang Zhao
DiffM
VGen
36
39
0
15 Oct 2021
ESPnet2-TTS: Extending the Edge of TTS Research
Tomoki Hayashi
Ryuichi Yamamoto
Takenori Yoshimura
Peter Wu
Jiatong Shi
Takaaki Saeki
Yooncheol Ju
Yusuke Yasuda
Shinnosuke Takamichi
Shinji Watanabe
VLM
50
60
0
15 Oct 2021
A Melody-Unsupervision Model for Singing Voice Synthesis
Soonbeom Choi
Juhan Nam
29
14
0
13 Oct 2021
Improving Time Series Classification Algorithms Using Octave-Convolutional Layers
Samuel Harford
Fazle Karim
H. Darabi
AI4TS
22
1
0
28 Sep 2021
Exploring Teacher-Student Learning Approach for Multi-lingual Speech-to-Intent Classification
Bidisha Sharma
Maulik C. Madhavi
Xuehao Zhou
Haizhou Li
23
2
0
28 Sep 2021
Neural HMMs are all you need (for high-quality attention-free TTS)
Shivam Mehta
Éva Székely
Jonas Beskow
G. Henter
40
18
0
30 Aug 2021
One TTS Alignment To Rule Them All
Rohan Badlani
A. Lancucki
Kevin J. Shih
Rafael Valle
Ming-Yu Liu
Bryan Catanzaro
38
82
0
23 Aug 2021
A Survey on Neural Speech Synthesis
Xu Tan
Tao Qin
Frank Soong
Tie-Yan Liu
AI4TS
18
352
0
29 Jun 2021
UniTTS: Residual Learning of Unified Embedding Space for Speech Style Control
M. Kang
Sungjae Kim
Injung Kim
26
3
0
21 Jun 2021
Speaker disentanglement in video-to-speech conversion
Dan Oneaţă
Adriana Stan
H. Cucu
24
9
0
20 May 2021
Review of end-to-end speech synthesis technology based on deep learning
Zhaoxi Mu
Xinyu Yang
Yizhuo Dong
AuLLM
ALM
26
24
0
20 Apr 2021
A study of latent monotonic attention variants
Albert Zeyer
Ralf Schluter
Hermann Ney
24
5
0
30 Mar 2021
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention
Peng Liu
Yuewen Cao
Songxiang Liu
Na Hu
Guangzhi Li
Chao Weng
Dan Su
42
22
0
12 Feb 2021
Fake Visual Content Detection Using Two-Stream Convolutional Neural Networks
B. Yousaf
Muhammad Usama
Waqas Sultani
Arif Mahmood
Junaid Qadir
17
8
0
03 Jan 2021
A Comprehensive Survey on Deep Music Generation: Multi-level Representations, Algorithms, Evaluations, and Future Directions
Shulei Ji
Jing Luo
Xinyu Yang
MGen
13
125
0
13 Nov 2020
Modulated Fusion using Transformer for Linguistic-Acoustic Emotion Recognition
Jean-Benoit Delbrouck
Noé Tits
Stéphane Dupont
33
20
0
05 Oct 2020
Bunched LPCNet : Vocoder for Low-cost Neural Text-To-Speech Systems
Ravichander Vipperla
Sangjun Park
Kihyun Choo
Samin S. Ishtiaq
Kyoungbo Min
S. Bhattacharya
Abhinav Mehrotra
Alberto Gil C. P. Ramos
Nicholas D. Lane
26
26
0
11 Aug 2020
SpeedySpeech: Efficient Neural Speech Synthesis
Jan Vainer
Ondrej Dusek
18
42
0
09 Aug 2020
An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning
Berrak Sisman
Junichi Yamagishi
Simon King
Haizhou Li
BDL
41
318
0
09 Aug 2020
Pretraining Techniques for Sequence-to-Sequence Voice Conversion
Wen-Chin Huang
Tomoki Hayashi
Yi-Chiao Wu
Hirokazu Kameoka
T. Toda
27
38
0
07 Aug 2020
DeepSinger: Singing Voice Synthesis with Data Mined From the Web
Yi Ren
Xu Tan
Tao Qin
Jian Luan
Zhou Zhao
Tie-Yan Liu
39
73
0
09 Jul 2020
Prosodic Prominence and Boundaries in Sequence-to-Sequence Speech Synthesis
Antti Suni
Sofoklis Kakouros
M. Vainio
J. Šimko
19
17
0
29 Jun 2020
MultiSpeech: Multi-Speaker Text to Speech with Transformer
Mingjian Chen
Xu Tan
Yi Ren
Jin Xu
Hao Sun
Sheng Zhao
Tao Qin
Tie-Yan Liu
21
109
0
08 Jun 2020
Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis
Yusuke Yasuda
Xin Wang
Junichi Yamagishi
AI4TS
19
31
0
20 May 2020
Many-to-Many Voice Transformer Network
Hirokazu Kameoka
Wen-Chin Huang
Kou Tanaka
Takuhiro Kaneko
Nobukatsu Hojo
T. Toda
ViT
30
30
0
18 May 2020
Towards Automatic Face-to-Face Translation
Prajwal K R
Rudrabha Mukhopadhyay
Jerin Philip
Abhishek Jha
Vinay P. Namboodiri
C. V. Jawahar
CVBM
42
172
0
01 Mar 2020
Comparison of Speech Representations for Automatic Quality Estimation in Multi-Speaker Text-to-Speech Synthesis
Jennifer Williams
Joanna Rownicka
P. Oplustil
Simon King
23
25
0
28 Feb 2020
Deep Long Audio Inpainting
Ya-Liang Chang
Kuan-Ying Lee
Po-Yu Wu
Hung-yi Lee
Winston H. Hsu
30
33
0
15 Nov 2019
Teacher-Student Training for Robust Tacotron-based TTS
Rui Liu
Berrak Sisman
Jingdong Li
F. Bao
Guanglai Gao
Haizhou Li
19
38
0
07 Nov 2019
ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit
Tomoki Hayashi
Ryuichi Yamamoto
Katsuki Inoue
Takenori Yoshimura
Shinji Watanabe
T. Toda
K. Takeda
Yu Zhang
Xu Tan
VLM
29
201
0
24 Oct 2019
Sequence-to-sequence Singing Synthesis Using the Feed-forward Transformer
Merlijn Blaauw
J. Bonada
27
55
0
22 Oct 2019
A Comparative Study on Transformer vs RNN in Speech Applications
Shigeki Karita
Nanxin Chen
Tomoki Hayashi
Takaaki Hori
Hirofumi Inaguma
...
Ryuichi Yamamoto
Xiao-fei Wang
Shinji Watanabe
Takenori Yoshimura
Wangyou Zhang
37
716
0
13 Sep 2019
Initial investigation of an encoder-decoder end-to-end TTS framework using marginalization of monotonic hard latent alignments
Yusuke Yasuda
Xin Wang
Junichi Yamagishi
21
8
0
30 Aug 2019
1
2
Next