End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks

27 April 2021
Rodrigo Mira
Konstantinos Vougioukas
Pingchuan Ma
Stavros Petridis
Björn W. Schuller
M. Pantic
arXiv: 2104.13332

Papers citing "End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks"

AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation
J. Choi
Ji-Hoon Kim
Kim Sung-Bin
Tae-Hyun Oh
Joon Son Chung
29 Apr 2025
From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech
Ji-Hoon Kim
Jeongsoo Choi
Jaehun Kim
Chaeyoung Jung
Joon Son Chung
21 Mar 2025
DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding
J. Choi
Joanna Hong
Y. Ro
15 Aug 2023
Audio-visual video-to-speech synthesis with synthesized input audio
Triantafyllos Kefalas
Yannis Panagakis
M. Pantic
31 Jul 2023
RobustL2S: Speaker-Specific Lip-to-Speech Synthesis exploiting Self-Supervised Representations
Neha Sahipjohn
Neil Shah
Vishal Tambrahalli
Vineet Gandhi
03 Jul 2023
Large-scale unsupervised audio pre-training for video-to-speech synthesis
Triantafyllos Kefalas
Yannis Panagakis
M. Pantic
27 Jun 2023
LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading
Yochai Yemini
Aviv Shamsian
Lior Bracha
Sharon Gannot
Ethan Fetaya
05 Jun 2023
Adaptation of Tongue Ultrasound-Based Silent Speech Interfaces Using Spatial Transformer Networks
L. Tóth
Amin Honarmandi Shandiz
G. Gosztolya
T. Csapó
30 May 2023
Zero-shot personalized lip-to-speech synthesis with face image based voice control
Zheng-Yan Sheng
Yang Ai
Zhenhua Ling
09 May 2023
On the Audio-visual Synchronization for Lip-to-Speech Synthesis
Zhe Niu
Brian Mak
01 Mar 2023
Lip-to-Speech Synthesis in the Wild with Multi-task Learning
Minsu Kim
Joanna Hong
Y. Ro
17 Feb 2023
LA-VocE: Low-SNR Audio-visual Speech Enhancement using Neural Vocoders
Rodrigo Mira
Buye Xu
Jacob Donley
Anurag Kumar
Stavros Petridis
V. Ithapu
M. Pantic
20 Nov 2022
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei
Di Hu
Yapeng Tian
Xuelong Li
20 Aug 2022
Speaker-adaptive Lip Reading with User-dependent Padding
Minsu Kim
Hyunjun Kim
Y. Ro
09 Aug 2022
FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis
Yongqiang Wang
Zhou Zhao
08 Jul 2022
Show Me Your Face, And I'll Tell You How You Speak
Christen Millerdurai
L. A. Khaliq
Timon Ulrich
28 Jun 2022
VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection
Joanna Hong
Minsu Kim
Y. Ro
15 Jun 2022
SVTS: Scalable Video-to-Speech Synthesis
Rodrigo Mira
A. Haliassos
Stavros Petridis
Björn W. Schuller
M. Pantic
04 May 2022
Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
Luís Vilaça
Yi Yu
Paula Viana
28 Feb 2022
Visual Speech Recognition for Multiple Languages in the Wild
Pingchuan Ma
Stavros Petridis
M. Pantic
26 Feb 2022
VCVTS: Multi-speaker Video-to-Speech synthesis via cross-modal knowledge transfer from voice conversion
Disong Wang
Shan Yang
Dan Su
Xunying Liu
Dong Yu
Helen Meng
18 Feb 2022
More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech
Michael Hassid
Michelle Tadmor Ramanovich
Brendan Shillingford
Miaosen Wang
Ye Jia
Tal Remez
19 Nov 2021
VisualTTS: TTS with Accurate Lip-Speech Synchronization for Automatic Voice Over
Junchen Lu
Berrak Sisman
Rui Liu
Mingyang Zhang
Haizhou Li
07 Oct 2021
Adaptation of Tacotron2-based Text-To-Speech for Articulatory-to-Acoustic Mapping using Ultrasound Tongue Imaging
Csaba Zainkó
L. Tóth
Amin Honarmandi Shandiz
G. Gosztolya
Alexandra Markó
Géza Németh
Tamás Gábor Csapó
26 Jul 2021
Multi-task self-supervised learning for Robust Speech Recognition
Mirco Ravanelli
Jianyuan Zhong
Santiago Pascual
P. Swietojanski
João Monteiro
J. Trmal
Yoshua Bengio
25 Jan 2020