Faces that Speak: Jointly Synthesising Talking Face and Speech from Text

16 May 2024
Youngjoon Jang
Ji-Hoon Kim
Junseok Ahn
Doyeop Kwak
Hong-Sun Yang
Yooncheol Ju
Il-Hwan Kim
Byeong-Yeol Kim
Joon Son Chung
    CVBM

Papers citing "Faces that Speak: Jointly Synthesising Talking Face and Speech from Text"

42 / 42 papers shown
EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing
Gaoxiang Cong
Jiadong Pan
Liang-Sheng Li
Yuankai Qi
Yuxin Peng
Anton Van Den Hengel
Jian Yang
Qingming Huang
139
6
0
12 Dec 2024
Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis
Zhe Ye
Ziyue Jiang
Yi Ren
Jinglin Liu
Chen Zhang
Xiang Yin
Zejun Ma
Zhou Zhao
72
5
0
06 Jun 2023
UniFLG: Unified Facial Landmark Generator from Text or Speech
Kentaro Mitsui
Yukiya Hono
Kei Sawada
CVBM
30
7
0
28 Feb 2023
StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation
Dong Min
Min-Hwan Song
Eunji Ko
Sung Ju Hwang
VGen
72
12
0
23 Aug 2022
Residual-guided Personalized Speech Synthesis based on Face Image
Jianrong Wang
Zixuan Wang
Xiaosheng Hu
Xuewei Li
Qiang Fang
Li Liu
CVBM
37
17
0
01 Apr 2022
Generative Adversarial Networks
Gilad Cohen
Raja Giryes
GAN
249
30,108
0
01 Mar 2022
Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion
Suzhe Wang
Lincheng Li
Yu-qiong Ding
Changjie Fan
Xin Yu
VGen
82
163
0
20 Jul 2021
Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation
Dong Min
Dong Bok Lee
Eunho Yang
Sung Ju Hwang
96
174
0
06 Jun 2021
Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech
Vadim Popov
Ivan Vovk
Vladimir Gogoryan
Tasnima Sadekova
Mikhail Kudinov
DiffM
92
532
0
13 May 2021
Text2Video: Text-driven Talking-head Video Synthesis with Personalized Phoneme-Pose Dictionary
Sibo Zhang
Jiahong Yuan
Miao Liao
Liangjun Zhang
50
34
0
29 Apr 2021
Motion Representations for Articulated Animation
Aliaksandr Siarohin
Oliver J. Woodford
Jian Ren
Menglei Chai
Sergey Tulyakov
OCL
141
269
0
22 Apr 2021
Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation
Hang Zhou
Yasheng Sun
Wayne Wu
Chen Change Loy
Xiaogang Wang
Ziwei Liu
CVBM
101
366
0
22 Apr 2021
AdaSpeech: Adaptive Text to Speech for Custom Voice
Mingjian Chen
Xu Tan
Bohan Li
Yanqing Liu
Tao Qin
Sheng Zhao
Tie-Yan Liu
VLM
DiffM
75
192
0
01 Mar 2021
One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing
Ting-Chun Wang
Arun Mallya
Xuan Li
3DH
92
481
0
30 Nov 2020
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
166
1,931
0
12 Oct 2020
A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild
Prajwal K R
Rudrabha Mukhopadhyay
Vinay P. Namboodiri
C. V. Jawahar
EGVM
96
777
0
23 Aug 2020
Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation
Elad Richardson
Yuval Alaluf
Or Patashnik
Yotam Nitzan
Yaniv Azar
Stav Shapiro
Daniel Cohen-Or
122
1,108
0
03 Aug 2020
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
Jaehyeon Kim
Sungwon Kim
Jungil Kong
Sungroh Yoon
81
491
0
22 May 2020
Conformer: Convolution-augmented Transformer for Speech Recognition
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
...
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
212
3,130
0
16 May 2020
MakeItTalk: Speaker-Aware Talking-Head Animation
Yang Zhou
Xintong Han
Eli Shechtman
J. Echevarria
E. Kalogerakis
Dingzeyu Li
63
421
0
27 Apr 2020
Neural Head Reenactment with Latent Pose Descriptors
Egor Burkov
I. Pasechnik
Artur Grigorev
Victor Lempitsky
3DH
85
130
0
24 Apr 2020
CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition
Yanhua Huang
Yuhan Wang
Ying Tai
Xiaoming Liu
Pengcheng Shen
Shaoxin Li
Jilin Li
Feiyue Huang
CVBM
53
505
0
01 Apr 2020
Towards Automatic Face-to-Face Translation
Prajwal K R
Rudrabha Mukhopadhyay
Jerin Philip
Abhishek Jha
Vinay P. Namboodiri
C. V. Jawahar
CVBM
89
174
0
01 Mar 2020
First Order Motion Model for Image Animation
Aliaksandr Siarohin
Stéphane Lathuilière
Sergey Tulyakov
Elisa Ricci
N. Sebe
VGen
DiffM
77
925
0
29 Feb 2020
Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose
Ran Yi
Zipeng Ye
Juyong Zhang
Hujun Bao
Yong Liu
CVBM
59
123
0
24 Feb 2020
Everybody's Talkin': Let Me Talk as You Want
Linsen Song
Wayne Wu
Chao Qian
Ran He
Chen Change Loy
DiffM
VGen
73
145
0
15 Jan 2020
Deep Audio-Visual Learning: A Survey
Hao Zhu
Mandi Luo
Rui Wang
A. Zheng
Ran He
61
159
0
14 Jan 2020
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke
Sam Gross
Francisco Massa
Adam Lerer
James Bradbury
...
Sasank Chilamkurthy
Benoit Steiner
Lu Fang
Junjie Bai
Soumith Chintala
ODL
408
42,393
0
03 Dec 2019
Image2StyleGAN++: How to Edit the Embedded Images?
Rameen Abdal
Yipeng Qin
Peter Wonka
77
555
0
26 Nov 2019
Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss
Lele Chen
R. Maddox
Z. Duan
Chenliang Xu
CVBM
68
398
0
09 May 2019
Disentangled Representation Learning for 3D Face Shape
Zi-Hang Jiang
Qianyi Wu
Keyu Chen
Juyong Zhang
DRL
3DH
CoGe
CVBM
42
109
0
26 Feb 2019
Deep Audio-Visual Speech Recognition
Triantafyllos Afouras
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
69
703
0
06 Sep 2018
LRS3-TED: a large-scale dataset for visual speech recognition
Triantafyllos Afouras
Joon Son Chung
Andrew Zisserman
62
441
0
03 Sep 2018
Talking Face Generation by Adversarially Disentangled Audio-Visual Representation
Hang Zhou
Yu Liu
Ziwei Liu
Ping Luo
Xiaogang Wang
CVBM
87
441
0
20 Jul 2018
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
Ye Jia
Yu Zhang
Ron J. Weiss
Quan Wang
Jonathan Shen
...
Zhiwen Chen
Patrick Nguyen
Ruoming Pang
Ignacio López Moreno
Yonghui Wu
251
830
0
12 Jun 2018
Talking Face Generation by Conditional Recurrent Adversarial Network
Yang Song
Jingwen Zhu
Dawei Li
Xiaolong Wang
Hairong Qi
CVBM
120
194
0
13 Apr 2018
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
Richard Y. Zhang
Phillip Isola
Alexei A. Efros
Eli Shechtman
Oliver Wang
EGVM
334
11,784
0
11 Jan 2018
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Jonathan Shen
Ruoming Pang
Ron J. Weiss
M. Schuster
Navdeep Jaitly
...
Yuxuan Wang
RJ Skerry-Ryan
Rif A. Saurous
Yannis Agiomyrgiannakis
Yonghui Wu
77
2,697
0
16 Dec 2017
ObamaNet: Photo-realistic lip-sync from text
Rithesh Kumar
Jose M. R. Sotelo
Kundan Kumar
A. D. Brébisson
Yoshua Bengio
49
120
0
06 Dec 2017
You said that?
Joon Son Chung
A. Jamaludin
Andrew Zisserman
CVBM
72
259
0
08 May 2017
How far are we from solving the 2D & 3D Face Alignment problem? (and a dataset of 230,000 3D facial landmarks)
Adrian Bulat
Georgios Tzimiropoulos
3DH
CVBM
3DV
106
1,475
0
21 Mar 2017
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan
Andrew Zisserman
FAtt
MDE
1.6K
100,330
0
04 Sep 2014