ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2207.03800
  4. Cited By
FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech
  Synthesis

FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis

8 July 2022
Yongqiang Wang
Zhou Zhao
ArXivPDFHTML

Papers citing "FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis"

31 / 31 papers shown
Title
Lip to Speech Synthesis with Visual Context Attentional GAN
Lip to Speech Synthesis with Visual Context Attentional GAN
Minsu Kim
Joanna Hong
Y. Ro
80
54
0
04 Apr 2022
Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition
  for Single and Multi-Person Video
Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-Person Video
Dmitriy Serdyuk
Otavio Braga
Olivier Siohan
ViT
103
41
0
25 Jan 2022
A Survey on Neural Speech Synthesis
A Survey on Neural Speech Synthesis
Xu Tan
Tao Qin
Frank Soong
Tie-Yan Liu
AI4TS
43
355
0
29 Jun 2021
Conditional Variational Autoencoder with Adversarial Learning for
  End-to-End Text-to-Speech
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Jaehyeon Kim
Jungil Kong
Juhee Son
DRL
101
866
0
11 Jun 2021
End-to-End Video-To-Speech Synthesis using Generative Adversarial
  Networks
End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks
Rodrigo Mira
Konstantinos Vougioukas
Pingchuan Ma
Stavros Petridis
Björn W. Schuller
Maja Pantic
58
46
0
27 Apr 2021
CvT: Introducing Convolutions to Vision Transformers
CvT: Introducing Convolutions to Vision Transformers
Haiping Wu
Bin Xiao
Noel Codella
Mengchen Liu
Xiyang Dai
Lu Yuan
Lei Zhang
ViT
111
1,891
0
29 Mar 2021
ViViT: A Video Vision Transformer
ViViT: A Video Vision Transformer
Anurag Arnab
Mostafa Dehghani
G. Heigold
Chen Sun
Mario Lucic
Cordelia Schmid
ViT
137
2,119
0
29 Mar 2021
An Image is Worth 16x16 Words, What is a Video Worth?
An Image is Worth 16x16 Words, What is a Video Worth?
Gilad Sharir
Asaf Noy
Lihi Zelnik-Manor
ViT
55
124
0
25 Mar 2021
Incorporating Convolution Designs into Visual Transformers
Incorporating Convolution Designs into Visual Transformers
Kun Yuan
Shaopeng Guo
Ziwei Liu
Aojun Zhou
F. Yu
Wei Wu
ViT
83
472
0
22 Mar 2021
Is Space-Time Attention All You Need for Video Understanding?
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
324
2,016
0
09 Feb 2021
Video Transformer Network
Video Transformer Network
Daniel Neimark
Omri Bar
Maya Zohar
Dotan Asselmann
ViT
244
430
0
01 Feb 2021
Speech Prediction in Silent Videos using Variational Autoencoders
Speech Prediction in Silent Videos using Variational Autoencoders
Ravindra Yadav
Ashish Sardana
Vinay P. Namboodiri
R. Hegde
VGen
DRL
33
23
0
14 Nov 2020
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis
Ron J. Weiss
RJ Skerry-Ryan
Eric Battenberg
Soroosh Mariooryad
Diederik P. Kingma
43
99
0
06 Nov 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at
  Scale
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
368
40,217
0
22 Oct 2020
HiFi-GAN: Generative Adversarial Networks for Efficient and High
  Fidelity Speech Synthesis
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
103
1,918
0
12 Oct 2020
Rethinking Attention with Performers
Rethinking Attention with Performers
K. Choromanski
Valerii Likhosherstov
David Dohan
Xingyou Song
Andreea Gane
...
Afroz Mohiuddin
Lukasz Kaiser
David Belanger
Lucy J. Colwell
Adrian Weller
144
1,548
0
30 Sep 2020
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
Yi Ren
Chenxu Hu
Xu Tan
Tao Qin
Sheng Zhao
Zhou Zhao
Tie-Yan Liu
88
1,382
0
08 Jun 2020
End-to-End Adversarial Text-to-Speech
End-to-End Adversarial Text-to-Speech
Jeff Donahue
Sander Dieleman
Mikolaj Binkowski
Erich Elsen
Karen Simonyan
48
186
0
05 Jun 2020
Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis
Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis
Prajwal K R
Rudrabha Mukhopadhyay
Vinay P. Namboodiri
C. V. Jawahar
46
113
0
17 May 2020
Vocoder-Based Speech Synthesis from Silent Videos
Vocoder-Based Speech Synthesis from Silent Videos
Daniel Michelsanti
Olga Slizovskaia
G. Haro
Emilia Gómez
Zheng-Hua Tan
Jesper Jensen
50
31
0
06 Apr 2020
Video-Driven Speech Reconstruction using Generative Adversarial Networks
Video-Driven Speech Reconstruction using Generative Adversarial Networks
Konstantinos Vougioukas
Pingchuan Ma
Stavros Petridis
Maja Pantic
GAN
49
49
0
14 Jun 2019
Non-Autoregressive Neural Machine Translation with Enhanced Decoder
  Input
Non-Autoregressive Neural Machine Translation with Enhanced Decoder Input
Junliang Guo
Xu Tan
Di He
Tao Qin
Linli Xu
Tie-Yan Liu
33
125
0
23 Dec 2018
Glow: Generative Flow with Invertible 1x1 Convolutions
Glow: Generative Flow with Invertible 1x1 Convolutions
Diederik P. Kingma
Prafulla Dhariwal
BDL
DRL
210
3,110
0
09 Jul 2018
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram
  Predictions
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Jonathan Shen
Ruoming Pang
Ron J. Weiss
M. Schuster
Navdeep Jaitly
...
Yuxuan Wang
RJ Skerry-Ryan
Rif A. Saurous
Yannis Agiomyrgiannakis
Yonghui Wu
66
2,684
0
16 Dec 2017
Non-Autoregressive Neural Machine Translation
Non-Autoregressive Neural Machine Translation
Jiatao Gu
James Bradbury
Caiming Xiong
Victor O.K. Li
R. Socher
84
793
0
07 Nov 2017
Lip2AudSpec: Speech reconstruction from silent lip movements video
Lip2AudSpec: Speech reconstruction from silent lip movements video
Hassan Akbari
Himani Arora
Liangliang Cao
N. Mesgarani
41
87
0
26 Oct 2017
Improved Speech Reconstruction from Silent Video
Improved Speech Reconstruction from Silent Video
Ariel Ephrat
Tavi Halperin
Shmuel Peleg
51
89
0
01 Aug 2017
Attention Is All You Need
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
443
129,831
0
12 Jun 2017
Vid2speech: Speech Reconstruction from Silent Video
Vid2speech: Speech Reconstruction from Silent Video
Ariel Ephrat
Shmuel Peleg
67
123
0
02 Jan 2017
Sequence to Sequence Learning with Neural Networks
Sequence to Sequence Learning with Neural Networks
Ilya Sutskever
Oriol Vinyals
Quoc V. Le
AIMat
280
20,491
0
10 Sep 2014
On the Properties of Neural Machine Translation: Encoder-Decoder
  Approaches
On the Properties of Neural Machine Translation: Encoder-Decoder Approaches
Kyunghyun Cho
B. V. Merrienboer
Dzmitry Bahdanau
Yoshua Bengio
AI4CE
AIMat
157
6,760
0
03 Sep 2014
1