End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks

27 April 2021

Rodrigo Mira

Konstantinos Vougioukas

Björn W. Schuller

Papers citing "End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks"

Title | Authors | Tags |  | Date
AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation | J. Choi, Ji-Hoon Kim, Kim Sung-Bin, Tae-Hyun Oh, Joon Son Chung | DiffM | 49 0 0 | 29 Apr 2025
From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech | Ji-Hoon Kim, Jeongsoo Choi, Jaehun Kim, Chaeyoung Jung, Joon Son Chung | CVBM | 56 1 0 | 21 Mar 2025
DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding | J. Choi, Joanna Hong, Y. Ro | DiffM | 29 19 0 | 15 Aug 2023
Audio-visual video-to-speech synthesis with synthesized input audio | Triantafyllos Kefalas, Yannis Panagakis, M. Pantic | VGen DiffM | 38 1 0 | 31 Jul 2023
RobustL2S: Speaker-Specific Lip-to-Speech Synthesis exploiting Self-Supervised Representations | Neha Sahipjohn, Neil Shah, Vishal Tambrahalli, Vineet Gandhi |  | 24 2 0 | 03 Jul 2023
Large-scale unsupervised audio pre-training for video-to-speech synthesis | Triantafyllos Kefalas, Yannis Panagakis, M. Pantic | VGen | 40 3 0 | 27 Jun 2023
LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading | Yochai Yemini, Aviv Shamsian, Lior Bracha, Sharon Gannot, Ethan Fetaya | DiffM | 33 11 0 | 05 Jun 2023
Adaptation of Tongue Ultrasound-Based Silent Speech Interfaces Using Spatial Transformer Networks | L. Tóth, Amin Honarmandi Shandiz, G. Gosztolya, T. Csapó |  | 24 3 0 | 30 May 2023
Zero-shot personalized lip-to-speech synthesis with face image based voice control | Zheng-Yan Sheng, Yang Ai, Zhenhua Ling | CVBM | 27 5 0 | 09 May 2023
On the Audio-visual Synchronization for Lip-to-Speech Synthesis | Zhe Niu, Brian Mak |  | 22 3 0 | 01 Mar 2023
Lip-to-Speech Synthesis in the Wild with Multi-task Learning | Minsu Kim, Joanna Hong, Y. Ro |  | 22 21 0 | 17 Feb 2023
LA-VocE: Low-SNR Audio-visual Speech Enhancement using Neural Vocoders | Rodrigo Mira, Buye Xu, Jacob Donley, Anurag Kumar, Stavros Petridis, V. Ithapu, M. Pantic |  | 28 13 0 | 20 Nov 2022
Learning in Audio-visual Context: A Review, Analysis, and New Perspective | Yake Wei, Di Hu, Yapeng Tian, Xuelong Li |  | 46 55 0 | 20 Aug 2022
Speaker-adaptive Lip Reading with User-dependent Padding | Minsu Kim, Hyunjun Kim, Y. Ro |  | 25 20 0 | 09 Aug 2022
FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis | Yongqiang Wang, Zhou Zhao |  | 19 10 0 | 08 Jul 2022
Show Me Your Face, And I'll Tell You How You Speak | Christen Millerdurai, L. A. Khaliq, Timon Ulrich | CVBM | 68 0 0 | 28 Jun 2022
VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection | Joanna Hong, Minsu Kim, Y. Ro | CVBM DiffM | 36 8 0 | 15 Jun 2022
SVTS: Scalable Video-to-Speech Synthesis | Rodrigo Mira, A. Haliassos, Stavros Petridis, Björn W. Schuller, M. Pantic |  | 22 32 0 | 04 May 2022
Recent Advances and Challenges in Deep Audio-Visual Correlation Learning | Luís Vilacca, Yi Yu, Paula Viana |  | 38 5 0 | 28 Feb 2022
Visual Speech Recognition for Multiple Languages in the Wild | Pingchuan Ma, Stavros Petridis, M. Pantic | VLM | 130 145 0 | 26 Feb 2022
VCVTS: Multi-speaker Video-to-Speech synthesis via cross-modal knowledge transfer from voice conversion | Disong Wang, Shan Yang, Dan Su, Xunying Liu, Dong Yu, Helen Meng |  | 15 11 0 | 18 Feb 2022
More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech | Michael Hassid, Michelle Tadmor Ramanovich, Brendan Shillingford, Miaosen Wang, Ye Jia, Tal Remez | DiffM | 22 16 0 | 19 Nov 2021
VisualTTS: TTS with Accurate Lip-Speech Synchronization for Automatic Voice Over | Junchen Lu, Berrak Sisman, Rui Liu, Mingyang Zhang, Haizhou Li | DiffM | 36 19 0 | 07 Oct 2021
Adaptation of Tacotron2-based Text-To-Speech for Articulatory-to-Acoustic Mapping using Ultrasound Tongue Imaging | Csaba Zainkó, L. Tóth, Amin Honarmandi Shandiz, G. Gosztolya, Alexandra Markó, Géza Németh, Tamás Gábor Csapó |  | 39 4 0 | 26 Jul 2021
Multi-task self-supervised learning for Robust Speech Recognition | Mirco Ravanelli, Jianyuan Zhong, Santiago Pascual, P. Swietojanski, João Monteiro, J. Trmal, Yoshua Bengio | SSL | 189 288 0 | 25 Jan 2020