Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2405.10272
Cited By
Faces that Speak: Jointly Synthesising Talking Face and Speech from Text
16 May 2024
Youngjoon Jang
Ji-Hoon Kim
Junseok Ahn
Doyeop Kwak
Hong-Sun Yang
Yooncheol Ju
Il-Hwan Kim
Byeong-Yeol Kim
Joon Son Chung
CVBM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Faces that Speak: Jointly Synthesising Talking Face and Speech from Text"
42 / 42 papers shown
Title
EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing
Gaoxiang Cong
Jiadong Pan
Liang-Sheng Li
Yuankai Qi
Yuxin Peng
Anton Van Den Hengel
Jian Yang
Qingming Huang
139
6
0
12 Dec 2024
Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis
Zhe Ye
Ziyue Jiang
Yi Ren
Jinglin Liu
Chen Zhang
Xiang Yin
Zejun Ma
Zhou Zhao
72
5
0
06 Jun 2023
UniFLG: Unified Facial Landmark Generator from Text or Speech
Kentaro Mitsui
Yukiya Hono
Kei Sawada
CVBM
30
7
0
28 Feb 2023
StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation
Dong Min
Min-Hwan Song
Eunji Ko
Sung Ju Hwang
VGen
72
12
0
23 Aug 2022
Residual-guided Personalized Speech Synthesis based on Face Image
Jianrong Wang
Zixuan Wang
Xiaosheng Hu
Xuewei Li
Qiang Fang
Li Liu
CVBM
37
17
0
01 Apr 2022
Generative Adversarial Networks
Gilad Cohen
Raja Giryes
GAN
249
30,108
0
01 Mar 2022
Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion
Suzhe Wang
Lincheng Li
Yu-qiong Ding
Changjie Fan
Xin Yu
VGen
82
163
0
20 Jul 2021
Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation
Dong Min
Dong Bok Lee
Eunho Yang
Sung Ju Hwang
96
174
0
06 Jun 2021
Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech
Vadim Popov
Ivan Vovk
Vladimir Gogoryan
Tasnima Sadekova
Mikhail Kudinov
DiffM
92
532
0
13 May 2021
Text2Video: Text-driven Talking-head Video Synthesis with Personalized Phoneme-Pose Dictionary
Sibo Zhang
Jiahong Yuan
Miao Liao
Liangjun Zhang
50
34
0
29 Apr 2021
Motion Representations for Articulated Animation
Aliaksandr Siarohin
Oliver J. Woodford
Jian Ren
Menglei Chai
Sergey Tulyakov
OCL
141
269
0
22 Apr 2021
Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation
Hang Zhou
Yasheng Sun
Wayne Wu
Chen Change Loy
Xiaogang Wang
Ziwei Liu
CVBM
101
366
0
22 Apr 2021
AdaSpeech: Adaptive Text to Speech for Custom Voice
Mingjian Chen
Xu Tan
Bohan Li
Yanqing Liu
Tao Qin
Sheng Zhao
Tie-Yan Liu
VLM
DiffM
75
192
0
01 Mar 2021
One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing
Ting-Chun Wang
Arun Mallya
Xuan Li
3DH
92
481
0
30 Nov 2020
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
166
1,931
0
12 Oct 2020
A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild
Prajwal K R
Rudrabha Mukhopadhyay
Vinay P. Namboodiri
C. V. Jawahar
EGVM
96
777
0
23 Aug 2020
Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation
Elad Richardson
Yuval Alaluf
Or Patashnik
Yotam Nitzan
Yaniv Azar
Stav Shapiro
Daniel Cohen-Or
122
1,108
0
03 Aug 2020
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
Jaehyeon Kim
Sungwon Kim
Jungil Kong
Sungroh Yoon
81
491
0
22 May 2020
Conformer: Convolution-augmented Transformer for Speech Recognition
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
...
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
212
3,130
0
16 May 2020
MakeItTalk: Speaker-Aware Talking-Head Animation
Yang Zhou
Xintong Han
Eli Shechtman
J. Echevarria
E. Kalogerakis
Dingzeyu Li
63
421
0
27 Apr 2020
Neural Head Reenactment with Latent Pose Descriptors
Egor Burkov
I. Pasechnik
Artur Grigorev
Victor Lempitsky
3DH
85
130
0
24 Apr 2020
CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition
Yanhua Huang
Yuhan Wang
Ying Tai
Xiaoming Liu
Pengcheng Shen
Shaoxin Li
Jilin Li
Feiyue Huang
CVBM
53
505
0
01 Apr 2020
Towards Automatic Face-to-Face Translation
Prajwal K R
Rudrabha Mukhopadhyay
Jerin Philip
Abhishek Jha
Vinay P. Namboodiri
C. V. Jawahar
CVBM
89
174
0
01 Mar 2020
First Order Motion Model for Image Animation
Aliaksandr Siarohin
Stéphane Lathuilière
Sergey Tulyakov
Elisa Ricci
N. Sebe
VGen
DiffM
77
925
0
29 Feb 2020
Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose
Ran Yi
Zipeng Ye
Juyong Zhang
Hujun Bao
Yong Liu
CVBM
59
123
0
24 Feb 2020
Everybody's Talkin': Let Me Talk as You Want
Linsen Song
Wayne Wu
Chao Qian
Ran He
Chen Change Loy
DiffM
VGen
73
145
0
15 Jan 2020
Deep Audio-Visual Learning: A Survey
Hao Zhu
Mandi Luo
Rui Wang
A. Zheng
Ran He
61
159
0
14 Jan 2020
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke
Sam Gross
Francisco Massa
Adam Lerer
James Bradbury
...
Sasank Chilamkurthy
Benoit Steiner
Lu Fang
Junjie Bai
Soumith Chintala
ODL
408
42,393
0
03 Dec 2019
Image2StyleGAN++: How to Edit the Embedded Images?
Rameen Abdal
Yipeng Qin
Peter Wonka
77
555
0
26 Nov 2019
Hierarchical Cross-Modal Talking Face Generationwith Dynamic Pixel-Wise Loss
Lele Chen
R. Maddox
Z. Duan
Chenliang Xu
CVBM
68
398
0
09 May 2019
Disentangled Representation Learning for 3D Face Shape
Zi-Hang Jiang
Qianyi Wu
Keyu Chen
Juyong Zhang
DRL
3DH
CoGe
CVBM
42
109
0
26 Feb 2019
Deep Audio-Visual Speech Recognition
Triantafyllos Afouras
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
69
703
0
06 Sep 2018
LRS3-TED: a large-scale dataset for visual speech recognition
Triantafyllos Afouras
Joon Son Chung
Andrew Zisserman
62
441
0
03 Sep 2018
Talking Face Generation by Adversarially Disentangled Audio-Visual Representation
Hang Zhou
Yu Liu
Ziwei Liu
Ping Luo
Xiaogang Wang
CVBM
87
441
0
20 Jul 2018
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
Ye Jia
Yu Zhang
Ron J. Weiss
Quan Wang
Jonathan Shen
...
Zhiwen Chen
Patrick Nguyen
Ruoming Pang
Ignacio López Moreno
Yonghui Wu
251
830
0
12 Jun 2018
Talking Face Generation by Conditional Recurrent Adversarial Network
Yang Song
Jingwen Zhu
Dawei Li
Xiaolong Wang
Hairong Qi
CVBM
120
194
0
13 Apr 2018
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
Richard Y. Zhang
Phillip Isola
Alexei A. Efros
Eli Shechtman
Oliver Wang
EGVM
334
11,784
0
11 Jan 2018
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Jonathan Shen
Ruoming Pang
Ron J. Weiss
M. Schuster
Navdeep Jaitly
...
Yuxuan Wang
RJ Skerry-Ryan
Rif A. Saurous
Yannis Agiomyrgiannakis
Yonghui Wu
77
2,697
0
16 Dec 2017
ObamaNet: Photo-realistic lip-sync from text
Rithesh Kumar
Jose M. R. Sotelo
Kundan Kumar
A. D. Brébisson
Yoshua Bengio
49
120
0
06 Dec 2017
You said that?
Joon Son Chung
A. Jamaludin
Andrew Zisserman
CVBM
72
259
0
08 May 2017
How far are we from solving the 2D & 3D Face Alignment problem? (and a dataset of 230,000 3D facial landmarks)
Adrian Bulat
Georgios Tzimiropoulos
3DH
CVBM
3DV
106
1,475
0
21 Mar 2017
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan
Andrew Zisserman
FAtt
MDE
1.6K
100,330
0
04 Sep 2014
1