Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1910.10288
Cited By
Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis
23 October 2019
Eric Battenberg
RJ Skerry-Ryan
Soroosh Mariooryad
Daisy Stanton
David Kao
Matt Shannon
Tom Bagby
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis"
50 / 76 papers shown
Title
Towards Film-Making Production Dialogue, Narration, Monologue Adaptive Moving Dubbing Benchmarks
Chaoyi Wang
Junjie Zheng
Zihao Chen
Shiyu Xia
Chaofan Ding
Xiaohao Zhang
Xi Tao
Xiaoming He
Xinhan Di
AuLLM
147
0
0
30 Apr 2025
DeepDubber-V1: Towards High Quality and Dialogue, Narration, Monologue Adaptive Movie Dubbing Via Multi-Modal Chain-of-Thoughts Reasoning Guidance
Junjie Zheng
Zihao Chen
Chaofan Ding
Xinhan Di
VGen
75
1
0
31 Mar 2025
DeepAudio-V1:Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation
Haomin Zhang
Chang Liu
Junjie Zheng
Zihao Chen
Chaofan Ding
Xinhan Di
DiffM
VGen
88
0
0
28 Mar 2025
EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing
Gaoxiang Cong
Jiadong Pan
Liang-Sheng Li
Yuankai Qi
Yuxin Peng
Anton Van Den Hengel
Jian Yang
Qingming Huang
92
6
0
12 Dec 2024
RELATE: A Modern Processing Platform for Romanian Language
V. Pais
Radu Ion
Andrei-Marius Avram
Maria Mitrofan
D. Tufis
VLM
21
0
0
29 Oct 2024
Diffuse or Confuse: A Diffusion Deepfake Speech Dataset
Anton Firc
K. Malinka
P. Hanáček
DiffM
36
0
0
09 Oct 2024
Acquiring Pronunciation Knowledge from Transcribed Speech Audio via Multi-task Learning
Siqi Sun
Korin Richmond
40
0
0
15 Sep 2024
VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment
Bing Han
Long Zhou
Shujie Liu
Sanyuan Chen
Lingwei Meng
Yanming Qian
Yanqing Liu
Sheng Zhao
Jinyu Li
Furu Wei
49
14
0
12 Jun 2024
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
Leying Zhang
Yao Qian
Long Zhou
Shujie Liu
Dongmei Wang
...
Yanmin Qian
Jinyu Li
Lei He
Sheng Zhao
Michael Zeng
34
1
0
10 Apr 2024
VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing
Philip Anastassiou
Zhenyu Tang
Kainan Peng
Dongya Jia
Jiaxin Li
Ming Tu
Yuping Wang
Yuxuan Wang
Mingbo Ma
42
4
0
10 Apr 2024
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
Mateusz Lajszczak
Guillermo Cámbara
Yang Li
Fatih Beyhan
Arent van Korlaar
...
Bartosz Putrycz
Soledad López Gambino
Kayeon Yoo
Elena Sokolova
Thomas Drugman
LM&MA
38
72
0
12 Feb 2024
Energy-Based Models For Speech Synthesis
Wanli Sun
Zehai Tu
Anton Ragni
DiffM
26
0
0
19 Oct 2023
On granularity of prosodic representations in expressive text-to-speech
Mikolaj Babianski
Kamil Pokora
Raahil Shah
Rafał Sienkiewicz
Daniel Korzekwa
V. Klimkov
21
5
0
26 Jan 2023
UnifySpeech: A Unified Framework for Zero-shot Text-to-Speech and Voice Conversion
Hao Liu
Tao Wang
Ruibo Fu
Jiangyan Yi
Zhengqi Wen
J. Tao
20
3
0
10 Jan 2023
Direct Speech-to-speech Translation without Textual Annotation using Bottleneck Features
Junhui Zhang
Junjie Pan
Xiang Yin
Zejun Ma
27
0
0
12 Dec 2022
Learning to Dub Movies via Hierarchical Prosody Models
Gaoxiang Cong
Liang Li
Yuankai Qi
Zhengjun Zha
Qi Wu
Wen-yu Wang
Bin Jiang
Ming Yang
Qin Huang
75
25
0
08 Dec 2022
Learning the joint distribution of two sequences using little or no paired data
Soroosh Mariooryad
Matt Shannon
Siyuan Ma
Tom Bagby
David Kao
Daisy Stanton
Eric Battenberg
RJ Skerry-Ryan
24
2
0
06 Dec 2022
Controllable speech synthesis by learning discrete phoneme-level prosodic representations
Nikolaos Ellinas
Myrsini Christidou
Alexandra Vioni
June Sig Sung
Aimilios Chalamandaris
Pirros Tsiakoulis
P. Mastorocostas
17
7
0
29 Nov 2022
EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models
Perry Lam
Huayun Zhang
Nancy F. Chen
Berrak Sisman
19
2
0
22 Sep 2022
ParaTTS: Learning Linguistic and Prosodic Cross-sentence Information in Paragraph-based TTS
Liumeng Xue
Frank Soong
Shaofei Zhang
Linfu Xie
27
23
0
14 Sep 2022
Multimodal Speech Emotion Recognition using Cross Attention with Aligned Audio and Text
Yoonhyung Lee
Seunghyun Yoon
Kyomin Jung
8
21
0
26 Jul 2022
Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech
Zhengxi Liu
Qiao Tian
Chenxu Hu
Xudong Liu
Meng-Che Wu
Yuping Wang
Hang Zhao
Yuxuan Wang
30
10
0
13 Jul 2022
R-MelNet: Reduced Mel-Spectral Modeling for Neural TTS
Kyle Kastner
Aaron Courville
35
0
0
30 Jun 2022
Regotron: Regularizing the Tacotron2 architecture via monotonic alignment loss
Efthymios Georgiou
Kosmas Kritsis
Georgios Paraskevopoulos
Athanasios Katsamanis
V. Katsouros
Alexandros Potamianos
18
3
0
28 Apr 2022
Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech
Hyungchan Yoon
Seyun Um
Changwhan Kim
Hong-Goo Kang
25
0
0
05 Apr 2022
Zero-Shot Long-Form Voice Cloning with Dynamic Convolution Attention
Artem Gorodetskii
Ivan Ozhiganov
20
2
0
25 Jan 2022
MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis
Yinjiao Lei
Shan Yang
Xinsheng Wang
Lei Xie
22
73
0
17 Jan 2022
Multi-speaker Multi-style Text-to-speech Synthesis With Single-speaker Single-style Training Data Scenarios
Qicong Xie
Tao Li
Xinsheng Wang
Zhichao Wang
Lei Xie
Guoqiao Yu
Guanglu Wan
27
11
0
23 Dec 2021
V2C: Visual Voice Cloning
Qi Chen
Yuanqing Li
Yuankai Qi
Jiaqiu Zhou
Mingkui Tan
Qi Wu
VGen
33
23
0
25 Nov 2021
Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis
Alexandra Vioni
Myrsini Christidou
Nikolaos Ellinas
G. Vamvoukakis
Panos Kakoulidis
Taehoon Kim
June Sig Sung
Hyoungmin Park
Aimilios Chalamandaris
Pirros Tsiakoulis
11
11
0
19 Nov 2021
High Quality Streaming Speech Synthesis with Low, Sentence-Length-Independent Latency
Nikolaos Ellinas
G. Vamvoukakis
K. Markopoulos
Aimilios Chalamandaris
Georgia Maniati
Panos Kakoulidis
S. Raptis
June Sig Sung
Hyoungmin Park
Pirros Tsiakoulis
11
36
0
17 Nov 2021
Speaker Generation
Daisy Stanton
Matt Shannon
Soroosh Mariooryad
RJ Skerry-Ryan
Eric Battenberg
Tom Bagby
David Kao
17
29
0
07 Nov 2021
VRAIN-UPV MLLP's system for the Blizzard Challenge 2021
A. P. D. Martos
A. Sanchís
Alfons Juan-Císcar
13
6
0
29 Oct 2021
Improving Emotional Speech Synthesis by Using SUS-Constrained VAE and Text Encoder Aggregation
Fengyu Yang
Jian Luan
Yujun Wang
32
5
0
19 Oct 2021
PAMA-TTS: Progression-Aware Monotonic Attention for Stable Seq2Seq TTS With Accurate Phoneme Duration Control
Yunchao He
Jian Luan
Yujun Wang
22
1
0
09 Oct 2021
Nana-HDR: A Non-attentive Non-autoregressive Hybrid Model for TTS
Shilu Lin
Wenchao Su
Li Meng
Fenglong Xie
Xinhui Li
Li Lu
29
4
0
28 Sep 2021
Cross-speaker emotion disentangling and transfer for end-to-end speech synthesis
Tao Li
Xinsheng Wang
Qicong Xie
Zhichao Wang
Linfu Xie
21
42
0
14 Sep 2021
Modeling Concentrated Cross-Attention for Neural Machine Translation with Gaussian Mixture Model
Shaolei Zhang
Yang Feng
10
23
0
11 Sep 2021
Integrated Speech and Gesture Synthesis
Siyang Wang
Simon Alexanderson
Joakim Gustafson
Jonas Beskow
G. Henter
Éva Székely
37
19
0
25 Aug 2021
One TTS Alignment To Rule Them All
Rohan Badlani
A. Lancucki
Kevin J. Shih
Rafael Valle
Ming-Yu Liu
Bryan Catanzaro
30
82
0
23 Aug 2021
Enhancing audio quality for expressive Neural Text-to-Speech
Abdelhamid Ezzerg
Adam Gabry's
Bartosz Putrycz
Daniel Korzekwa
Daniel Sáez-Trigueros
David McHardy
Kamil Pokora
Jakub Lachowicz
Jaime Lorenzo-Trueba
V. Klimkov
12
6
0
13 Aug 2021
AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person
Xinsheng Wang
Qicong Xie
Jihua Zhu
Lei Xie
O. Scharenborg
31
16
0
09 Aug 2021
Translatotron 2: High-quality direct speech-to-speech translation with voice preservation
Ye Jia
Michelle Tadmor Ramanovich
Tal Remez
Roi Pomerantz
26
67
0
19 Jul 2021
A Survey on Neural Speech Synthesis
Xu Tan
Tao Qin
Frank Soong
Tie-Yan Liu
AI4TS
18
352
0
29 Jun 2021
Non-Autoregressive TTS with Explicit Duration Modelling for Low-Resource Highly Expressive Speech
Raahil Shah
Kamil Pokora
Abdelhamid Ezzerg
V. Klimkov
Goeric Huybrechts
Bartosz Putrycz
Daniel Korzekwa
Thomas Merritt
30
25
0
24 Jun 2021
Controllable Context-aware Conversational Speech Synthesis
Jian Cong
Shan Yang
Na Hu
Guangzhi Li
Lei Xie
Dan Su
20
30
0
21 Jun 2021
Review of end-to-end speech synthesis technology based on deep learning
Zhaoxi Mu
Xinyu Yang
Yizhuo Dong
AuLLM
ALM
26
24
0
20 Apr 2021
FastS2S-VC: Streaming Non-Autoregressive Sequence-to-Sequence Voice Conversion
Hirokazu Kameoka
Kou Tanaka
Takuhiro Kaneko
33
21
0
14 Apr 2021
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention
Peng Liu
Yuewen Cao
Songxiang Liu
Na Hu
Guangzhi Li
Chao Weng
Dan Su
42
22
0
12 Feb 2021
Triple M: A Practical Text-to-speech Synthesis System With Multi-guidance Attention And Multi-band Multi-time LPCNet
Shilu Lin
Fenglong Xie
Li Meng
Xinhui Li
Li Lu
11
0
0
30 Jan 2021
1
2
Next