FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis

8 July 2022

Zhou Zhao

Papers cited by "FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis"

Lip to Speech Synthesis with Visual Context Attentional GAN. Minsu Kim, Joanna Hong, Y. Ro. 04 Apr 2022.
Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-Person Video. Dmitriy Serdyuk, Otavio Braga, Olivier Siohan. 25 Jan 2022.
A Survey on Neural Speech Synthesis. Xu Tan, Tao Qin, Frank Soong, Tie-Yan Liu. 29 Jun 2021.
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech. Jaehyeon Kim, Jungil Kong, Juhee Son. 11 Jun 2021.
End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks. Rodrigo Mira, Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Björn W. Schuller, Maja Pantic. 27 Apr 2021.
CvT: Introducing Convolutions to Vision Transformers. Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang. 29 Mar 2021.
ViViT: A Video Vision Transformer. Anurag Arnab, Mostafa Dehghani, G. Heigold, Chen Sun, Mario Lucic, Cordelia Schmid. 29 Mar 2021.
An Image is Worth 16x16 Words, What is a Video Worth? Gilad Sharir, Asaf Noy, Lihi Zelnik-Manor. 25 Mar 2021.
Incorporating Convolution Designs into Visual Transformers. Kun Yuan, Shaopeng Guo, Ziwei Liu, Aojun Zhou, F. Yu, Wei Wu. 22 Mar 2021.
Is Space-Time Attention All You Need for Video Understanding? Gedas Bertasius, Heng Wang, Lorenzo Torresani. 09 Feb 2021.
Video Transformer Network. Daniel Neimark, Omri Bar, Maya Zohar, Dotan Asselmann. 01 Feb 2021.
Speech Prediction in Silent Videos using Variational Autoencoders. Ravindra Yadav, Ashish Sardana, Vinay P. Namboodiri, R. Hegde. 14 Nov 2020.
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis. Ron J. Weiss, RJ Skerry-Ryan, Eric Battenberg, Soroosh Mariooryad, Diederik P. Kingma. 06 Nov 2020.
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, ..., Matthias Minderer, G. Heigold, Sylvain Gelly, Jakob Uszkoreit, N. Houlsby. 22 Oct 2020.
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis. Jungil Kong, Jaehyeon Kim, Jaekyoung Bae. 12 Oct 2020.
Rethinking Attention with Performers. K. Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, ..., Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy J. Colwell, Adrian Weller. 30 Sep 2020.
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech. Yi Ren, Chenxu Hu, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu. 08 Jun 2020.
End-to-End Adversarial Text-to-Speech. Jeff Donahue, Sander Dieleman, Mikolaj Binkowski, Erich Elsen, Karen Simonyan. 05 Jun 2020.
Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis. Prajwal K R, Rudrabha Mukhopadhyay, Vinay P. Namboodiri, C. V. Jawahar. 17 May 2020.
Vocoder-Based Speech Synthesis from Silent Videos. Daniel Michelsanti, Olga Slizovskaia, G. Haro, Emilia Gómez, Zheng-Hua Tan, Jesper Jensen. 06 Apr 2020.
Video-Driven Speech Reconstruction using Generative Adversarial Networks. Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Maja Pantic. 14 Jun 2019.
Non-Autoregressive Neural Machine Translation with Enhanced Decoder Input. Junliang Guo, Xu Tan, Di He, Tao Qin, Linli Xu, Tie-Yan Liu. 23 Dec 2018.
Glow: Generative Flow with Invertible 1x1 Convolutions. Diederik P. Kingma, Prafulla Dhariwal. 09 Jul 2018.
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. Jonathan Shen, Ruoming Pang, Ron J. Weiss, M. Schuster, Navdeep Jaitly, ..., Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, Yonghui Wu. 16 Dec 2017.
Non-Autoregressive Neural Machine Translation. Jiatao Gu, James Bradbury, Caiming Xiong, Victor O.K. Li, R. Socher. 07 Nov 2017.
Lip2AudSpec: Speech reconstruction from silent lip movements video. Hassan Akbari, Himani Arora, Liangliang Cao, N. Mesgarani. 26 Oct 2017.
Improved Speech Reconstruction from Silent Video. Ariel Ephrat, Tavi Halperin, Shmuel Peleg. 01 Aug 2017.
Attention Is All You Need. Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin. 12 Jun 2017.
Vid2speech: Speech Reconstruction from Silent Video. Ariel Ephrat, Shmuel Peleg. 02 Jan 2017.
Sequence to Sequence Learning with Neural Networks. Ilya Sutskever, Oriol Vinyals, Quoc V. Le. 10 Sep 2014.
On the Properties of Neural Machine Translation: Encoder-Decoder Approaches. Kyunghyun Cho, B. V. Merrienboer, Dzmitry Bahdanau, Yoshua Bengio. 03 Sep 2014.