Facetron: A Multi-speaker Face-to-Speech Model based on Cross-modal Latent Representations



26 July 2021


Papers citing "Facetron: A Multi-speaker Face-to-Speech Model based on Cross-modal Latent Representations"

17 / 17 papers shown

Title | Authors | Topic tag | Site metrics (unlabeled) | Date
Supervising 3D Talking Head Avatars with Analysis-by-Audio-Synthesis | Radek Daněček, Carolin Schmitt, Senya Polikovsky, Michael J. Black | - | 98 1 0 | 18 Apr 2025
VCVTS: Multi-speaker Video-to-Speech synthesis via cross-modal knowledge transfer from voice conversion | Disong Wang, Shan Yang, Dan Su, Xunying Liu, Dong Yu, Helen Meng | - | 55 11 0 | 18 Feb 2022
Deep Learning Based Assessment of Synthetic Speech Naturalness | Gabriel Mittag, Sebastian Möller | - | 58 64 0 | 23 Apr 2021
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis | Jungil Kong, Jaehyeon Kim, Jaekyoung Bae | - | 177 1,936 0 | 12 Oct 2020
Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis | Prajwal K R, Rudrabha Mukhopadhyay, Vinay P. Namboodiri, C. V. Jawahar | - | 63 113 0 | 17 May 2020
Vocoder-Based Speech Synthesis from Silent Videos | Daniel Michelsanti, Olga Slizovskaia, G. Haro, Emilia Gómez, Zheng-Hua Tan, Jesper Jensen | - | 69 31 0 | 06 Apr 2020
Emotional speech synthesis with rich and granularized control | Seyun Um, Sangshin Oh, Kyungguen Byun, Inseon Jang, C. Ahn, Hong-Goo Kang | - | 54 90 0 | 05 Nov 2019
Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens | Rafael Valle, Jason Chun Lok Li, R. Prenger, Bryan Catanzaro | - | 72 149 0 | 26 Oct 2019
Lipper: Synthesizing Thy Speech using Multi-View Lipreading | Yaman Kumar Singla, Rohit Jain, Khwaja Mohd. Salik, R. Shah, Yifang Yin, Roger Zimmermann | - | 90 41 0 | 28 Jun 2019
Video-Driven Speech Reconstruction using Generative Adversarial Networks | Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Maja Pantic | GAN | 56 49 0 | 14 Jun 2019
Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning | Wei Ping, Kainan Peng, Andrew Gibiansky, Sercan O. Arik, Ajay Kannan, Sharan Narang, Jonathan Raiman, John Miller | - | 69 308 0 | 20 Oct 2017
Tacotron: Towards End-to-End Speech Synthesis | Yuxuan Wang, RJ Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron J. Weiss, ..., Samy Bengio, Quoc V. Le, Yannis Agiomyrgiannakis, R. Clark, Rif A. Saurous | - | 160 1,826 0 | 29 Mar 2017
Vid2speech: Speech Reconstruction from Silent Video | Ariel Ephrat, Shmuel Peleg | - | 90 123 0 | 02 Jan 2017
Lip Reading Sentences in the Wild | Joon Son Chung, A. Senior, Oriol Vinyals, Andrew Zisserman | - | 261 790 0 | 16 Nov 2016
LipNet: End-to-End Sentence-level Lipreading | Yannis Assael, Brendan Shillingford, Shimon Whiteson, Nando de Freitas | - | 82 397 0 | 05 Nov 2016
Listen, Attend and Spell | William Chan, Navdeep Jaitly, Quoc V. Le, Oriol Vinyals | RALM | 156 2,266 0 | 05 Aug 2015
FaceNet: A Unified Embedding for Face Recognition and Clustering | Florian Schroff, Dmitry Kalenichenko, James Philbin | 3DH | 382 13,145 0 | 12 Mar 2015