arXiv:2107.12003
Facetron: A Multi-speaker Face-to-Speech Model based on Cross-modal Latent Representations

26 July 2021
Seyun Um
Jihyun Kim
Jihyun Lee
Hong-Goo Kang
    CVBM

Papers citing "Facetron: A Multi-speaker Face-to-Speech Model based on Cross-modal Latent Representations"

17 / 17 papers shown
Supervising 3D Talking Head Avatars with Analysis-by-Audio-Synthesis
Radek Daněček
Carolin Schmitt
Senya Polikovsky
Michael J. Black
98
1
0
18 Apr 2025
VCVTS: Multi-speaker Video-to-Speech synthesis via cross-modal knowledge transfer from voice conversion
Disong Wang
Shan Yang
Dan Su
Xunying Liu
Dong Yu
Helen Meng
55
11
0
18 Feb 2022
Deep Learning Based Assessment of Synthetic Speech Naturalness
Gabriel Mittag
Sebastian Möller
58
64
0
23 Apr 2021
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
177
1,936
0
12 Oct 2020
Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis
Prajwal K R
Rudrabha Mukhopadhyay
Vinay P. Namboodiri
C. V. Jawahar
63
113
0
17 May 2020
Vocoder-Based Speech Synthesis from Silent Videos
Daniel Michelsanti
Olga Slizovskaia
G. Haro
Emilia Gómez
Zheng-Hua Tan
Jesper Jensen
69
31
0
06 Apr 2020
Emotional speech synthesis with rich and granularized control
Seyun Um
Sangshin Oh
Kyungguen Byun
Inseon Jang
C. Ahn
Hong-Goo Kang
54
90
0
05 Nov 2019
Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens
Rafael Valle
Jason Chun Lok Li
R. Prenger
Bryan Catanzaro
72
149
0
26 Oct 2019
Lipper: Synthesizing Thy Speech using Multi-View Lipreading
Yaman Kumar Singla
Rohit Jain
Khwaja Mohd. Salik
R. Shah
Yifang Yin
Roger Zimmermann
90
41
0
28 Jun 2019
Video-Driven Speech Reconstruction using Generative Adversarial Networks
Konstantinos Vougioukas
Pingchuan Ma
Stavros Petridis
Maja Pantic
GAN
56
49
0
14 Jun 2019
Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning
Wei Ping
Kainan Peng
Andrew Gibiansky
Sercan O. Arik
Ajay Kannan
Sharan Narang
Jonathan Raiman
John Miller
69
308
0
20 Oct 2017
Tacotron: Towards End-to-End Speech Synthesis
Yuxuan Wang
RJ Skerry-Ryan
Daisy Stanton
Yonghui Wu
Ron J. Weiss
...
Samy Bengio
Quoc V. Le
Yannis Agiomyrgiannakis
R. Clark
Rif A. Saurous
160
1,826
0
29 Mar 2017
Vid2speech: Speech Reconstruction from Silent Video
Ariel Ephrat
Shmuel Peleg
90
123
0
02 Jan 2017
Lip Reading Sentences in the Wild
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
261
790
0
16 Nov 2016
LipNet: End-to-End Sentence-level Lipreading
Yannis Assael
Brendan Shillingford
Shimon Whiteson
Nando de Freitas
82
397
0
05 Nov 2016
Listen, Attend and Spell
William Chan
Navdeep Jaitly
Quoc V. Le
Oriol Vinyals
RALM
156
2,266
0
05 Aug 2015
FaceNet: A Unified Embedding for Face Recognition and Clustering
Florian Schroff
Dmitry Kalenichenko
James Philbin
3DH
382
13,145
0
12 Mar 2015