Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis

23 October 2019

Papers citing "Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis"

50 / 76 papers shown

Title
Towards Film-Making Production Dialogue, Narration, Monologue Adaptive Moving Dubbing Benchmarks Chaoyi Wang Junjie Zheng Zihao Chen Shiyu Xia Chaofan Ding Xiaohao Zhang Xi Tao Xiaoming He Xinhan Di AuLLM 147 0 0 30 Apr 2025
DeepDubber-V1: Towards High Quality and Dialogue, Narration, Monologue Adaptive Movie Dubbing Via Multi-Modal Chain-of-Thoughts Reasoning Guidance Junjie Zheng Zihao Chen Chaofan Ding Xinhan Di VGen 75 1 0 31 Mar 2025
DeepAudio-V1:Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation Haomin Zhang Chang Liu Junjie Zheng Zihao Chen Chaofan Ding Xinhan Di DiffM VGen 88 0 0 28 Mar 2025
EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing Gaoxiang Cong Jiadong Pan Liang-Sheng Li Yuankai Qi Yuxin Peng Anton Van Den Hengel Jian Yang Qingming Huang 92 6 0 12 Dec 2024
RELATE: A Modern Processing Platform for Romanian Language V. Pais Radu Ion Andrei-Marius Avram Maria Mitrofan D. Tufis VLM 21 0 0 29 Oct 2024
Diffuse or Confuse: A Diffusion Deepfake Speech Dataset Anton Firc K. Malinka P. Hanáček DiffM 36 0 0 09 Oct 2024
Acquiring Pronunciation Knowledge from Transcribed Speech Audio via Multi-task Learning Siqi Sun Korin Richmond 40 0 0 15 Sep 2024
VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment Bing Han Long Zhou Shujie Liu Sanyuan Chen Lingwei Meng Yanming Qian Yanqing Liu Sheng Zhao Jinyu Li Furu Wei 49 14 0 12 Jun 2024
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations Leying Zhang Yao Qian Long Zhou Shujie Liu Dongmei Wang ... Yanmin Qian Jinyu Li Lei He Sheng Zhao Michael Zeng 34 1 0 10 Apr 2024
VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing Philip Anastassiou Zhenyu Tang Kainan Peng Dongya Jia Jiaxin Li Ming Tu Yuping Wang Yuxuan Wang Mingbo Ma 42 4 0 10 Apr 2024
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data Mateusz Lajszczak Guillermo Cámbara Yang Li Fatih Beyhan Arent van Korlaar ... Bartosz Putrycz Soledad López Gambino Kayeon Yoo Elena Sokolova Thomas Drugman LM&MA 38 72 0 12 Feb 2024
Energy-Based Models For Speech Synthesis Wanli Sun Zehai Tu Anton Ragni DiffM 26 0 0 19 Oct 2023
On granularity of prosodic representations in expressive text-to-speech Mikolaj Babianski Kamil Pokora Raahil Shah Rafał Sienkiewicz Daniel Korzekwa V. Klimkov 21 5 0 26 Jan 2023
UnifySpeech: A Unified Framework for Zero-shot Text-to-Speech and Voice Conversion Hao Liu Tao Wang Ruibo Fu Jiangyan Yi Zhengqi Wen J. Tao 20 3 0 10 Jan 2023
Direct Speech-to-speech Translation without Textual Annotation using Bottleneck Features Junhui Zhang Junjie Pan Xiang Yin Zejun Ma 27 0 0 12 Dec 2022
Learning to Dub Movies via Hierarchical Prosody Models Gaoxiang Cong Liang Li Yuankai Qi Zhengjun Zha Qi Wu Wen-yu Wang Bin Jiang Ming Yang Qin Huang 75 25 0 08 Dec 2022
Learning the joint distribution of two sequences using little or no paired data Soroosh Mariooryad Matt Shannon Siyuan Ma Tom Bagby David Kao Daisy Stanton Eric Battenberg RJ Skerry-Ryan 24 2 0 06 Dec 2022
Controllable speech synthesis by learning discrete phoneme-level prosodic representations Nikolaos Ellinas Myrsini Christidou Alexandra Vioni June Sig Sung Aimilios Chalamandaris Pirros Tsiakoulis P. Mastorocostas 17 7 0 29 Nov 2022
EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models Perry Lam Huayun Zhang Nancy F. Chen Berrak Sisman 19 2 0 22 Sep 2022
ParaTTS: Learning Linguistic and Prosodic Cross-sentence Information in Paragraph-based TTS Liumeng Xue Frank Soong Shaofei Zhang Linfu Xie 27 23 0 14 Sep 2022
Multimodal Speech Emotion Recognition using Cross Attention with Aligned Audio and Text Yoonhyung Lee Seunghyun Yoon Kyomin Jung 8 21 0 26 Jul 2022
Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech Zhengxi Liu Qiao Tian Chenxu Hu Xudong Liu Meng-Che Wu Yuping Wang Hang Zhao Yuxuan Wang 30 10 0 13 Jul 2022
R-MelNet: Reduced Mel-Spectral Modeling for Neural TTS Kyle Kastner Aaron Courville 35 0 0 30 Jun 2022
Regotron: Regularizing the Tacotron2 architecture via monotonic alignment loss Efthymios Georgiou Kosmas Kritsis Georgios Paraskevopoulos Athanasios Katsamanis V. Katsouros Alexandros Potamianos 18 3 0 28 Apr 2022
Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech Hyungchan Yoon Seyun Um Changwhan Kim Hong-Goo Kang 25 0 0 05 Apr 2022
Zero-Shot Long-Form Voice Cloning with Dynamic Convolution Attention Artem Gorodetskii Ivan Ozhiganov 20 2 0 25 Jan 2022
MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis Yinjiao Lei Shan Yang Xinsheng Wang Lei Xie 22 73 0 17 Jan 2022
Multi-speaker Multi-style Text-to-speech Synthesis With Single-speaker Single-style Training Data Scenarios Qicong Xie Tao Li Xinsheng Wang Zhichao Wang Lei Xie Guoqiao Yu Guanglu Wan 27 11 0 23 Dec 2021
V2C: Visual Voice Cloning Qi Chen Yuanqing Li Yuankai Qi Jiaqiu Zhou Mingkui Tan Qi Wu VGen 33 23 0 25 Nov 2021
Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis Alexandra Vioni Myrsini Christidou Nikolaos Ellinas G. Vamvoukakis Panos Kakoulidis Taehoon Kim June Sig Sung Hyoungmin Park Aimilios Chalamandaris Pirros Tsiakoulis 11 11 0 19 Nov 2021
High Quality Streaming Speech Synthesis with Low, Sentence-Length-Independent Latency Nikolaos Ellinas G. Vamvoukakis K. Markopoulos Aimilios Chalamandaris Georgia Maniati Panos Kakoulidis S. Raptis June Sig Sung Hyoungmin Park Pirros Tsiakoulis 11 36 0 17 Nov 2021
Speaker Generation Daisy Stanton Matt Shannon Soroosh Mariooryad RJ Skerry-Ryan Eric Battenberg Tom Bagby David Kao 17 29 0 07 Nov 2021
VRAIN-UPV MLLP's system for the Blizzard Challenge 2021 A. P. D. Martos A. Sanchís Alfons Juan-Císcar 13 6 0 29 Oct 2021
Improving Emotional Speech Synthesis by Using SUS-Constrained VAE and Text Encoder Aggregation Fengyu Yang Jian Luan Yujun Wang 32 5 0 19 Oct 2021
PAMA-TTS: Progression-Aware Monotonic Attention for Stable Seq2Seq TTS With Accurate Phoneme Duration Control Yunchao He Jian Luan Yujun Wang 22 1 0 09 Oct 2021
Nana-HDR: A Non-attentive Non-autoregressive Hybrid Model for TTS Shilu Lin Wenchao Su Li Meng Fenglong Xie Xinhui Li Li Lu 29 4 0 28 Sep 2021
Cross-speaker emotion disentangling and transfer for end-to-end speech synthesis Tao Li Xinsheng Wang Qicong Xie Zhichao Wang Linfu Xie 21 42 0 14 Sep 2021
Modeling Concentrated Cross-Attention for Neural Machine Translation with Gaussian Mixture Model Shaolei Zhang Yang Feng 10 23 0 11 Sep 2021
Integrated Speech and Gesture Synthesis Siyang Wang Simon Alexanderson Joakim Gustafson Jonas Beskow G. Henter Éva Székely 37 19 0 25 Aug 2021
One TTS Alignment To Rule Them All Rohan Badlani A. Lancucki Kevin J. Shih Rafael Valle Ming-Yu Liu Bryan Catanzaro 30 82 0 23 Aug 2021
Enhancing audio quality for expressive Neural Text-to-Speech Abdelhamid Ezzerg Adam Gabry's Bartosz Putrycz Daniel Korzekwa Daniel Sáez-Trigueros David McHardy Kamil Pokora Jakub Lachowicz Jaime Lorenzo-Trueba V. Klimkov 12 6 0 13 Aug 2021
AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person Xinsheng Wang Qicong Xie Jihua Zhu Lei Xie O. Scharenborg 31 16 0 09 Aug 2021
Translatotron 2: High-quality direct speech-to-speech translation with voice preservation Ye Jia Michelle Tadmor Ramanovich Tal Remez Roi Pomerantz 26 67 0 19 Jul 2021
A Survey on Neural Speech Synthesis Xu Tan Tao Qin Frank Soong Tie-Yan Liu AI4TS 18 352 0 29 Jun 2021
Non-Autoregressive TTS with Explicit Duration Modelling for Low-Resource Highly Expressive Speech Raahil Shah Kamil Pokora Abdelhamid Ezzerg V. Klimkov Goeric Huybrechts Bartosz Putrycz Daniel Korzekwa Thomas Merritt 30 25 0 24 Jun 2021
Controllable Context-aware Conversational Speech Synthesis Jian Cong Shan Yang Na Hu Guangzhi Li Lei Xie Dan Su 20 30 0 21 Jun 2021
Review of end-to-end speech synthesis technology based on deep learning Zhaoxi Mu Xinyu Yang Yizhuo Dong AuLLM ALM 26 24 0 20 Apr 2021
FastS2S-VC: Streaming Non-Autoregressive Sequence-to-Sequence Voice Conversion Hirokazu Kameoka Kou Tanaka Takuhiro Kaneko 33 21 0 14 Apr 2021
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention Peng Liu Yuewen Cao Songxiang Liu Na Hu Guangzhi Li Chao Weng Dan Su 42 22 0 12 Feb 2021
Triple M: A Practical Text-to-speech Synthesis System With Multi-guidance Attention And Multi-band Multi-time LPCNet Shilu Lin Fenglong Xie Li Meng Xinhui Li Li Lu 11 0 0 30 Jan 2021