2203.17263
Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis
31 March 2022
Karren D. Yang
Dejan Marković
Steven Krenn
Vasu Agrawal
Alexander Richard
VGen
ArXiv (abs)
PDF
HTML
Github (107★)
Papers citing
"Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis"
SoundStream: An End-to-End Neural Audio Codec
Neil Zeghidour
Alejandro Luebs
Ahmed Omran
Jan Skoglund
Marco Tagliasacchi
AI4TS
110
791
0
07 Jul 2021
MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement
Alexander Richard
Michael Zollhoefer
Yandong Wen
Fernando de la Torre
Yaser Sheikh
CVBM
66
200
0
16 Apr 2021
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations
Adam Polyak
Yossi Adi
Jade Copet
Eugene Kharitonov
Kushal Lakhotia
Wei-Ning Hsu
Abdel-rahman Mohamed
Emmanuel Dupoux
80
318
0
01 Apr 2021
Generative Speech Coding with Predictive Variance Regularization
W. Kleijn
Andrew Storus
Michael Chinen
Tom Denton
Felicia S. C. Lim
Alejandro Luebs
Jan Skoglund
Hengchin Yeh
45
68
0
18 Feb 2021
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
Ruohan Gao
Kristen Grauman
CVBM
224
202
0
08 Jan 2021
Taming Transformers for High-Resolution Image Synthesis
Patrick Esser
Robin Rombach
Björn Ommer
ViT
129
2,962
0
17 Dec 2020
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
177
1,936
0
12 Oct 2020
Real Time Speech Enhancement in the Waveform Domain
Alexandre Défossez
Gabriel Synnaeve
Yossi Adi
76
462
0
23 Jun 2020
HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks
Jiaqi Su
Zeyu Jin
Adam Finkelstein
67
139
0
10 Jun 2020
Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis
Prajwal K R
Rudrabha Mukhopadhyay
Vinay P. Namboodiri
C. V. Jawahar
63
113
0
17 May 2020
FaceFilter: Audio-visual speech separation using still images
Soo-Whan Chung
Soyeon Choe
Joon Son Chung
Hong-Goo Kang
CVBM
109
66
0
14 May 2020
Music Gesture for Visual Sound Separation
Chuang Gan
Deng Huang
Hang Zhao
J. Tenenbaum
Antonio Torralba
88
204
0
20 Apr 2020
Voice Separation with an Unknown Number of Multiple Speakers
Eliya Nachmani
Yossi Adi
Lior Wolf
61
175
0
29 Feb 2020
Low Bit-Rate Speech Coding with VQ-VAE and a WaveNet Decoder
Cristina Garbacea
Aaron van den Oord
Yazhe Li
Felicia S. C. Lim
Alejandro Luebs
Oriol Vinyals
Thomas C. Walters
60
121
0
14 Oct 2019
MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis
Kundan Kumar
Rithesh Kumar
T. Boissière
L. Gestin
Wei Zhen Teoh
Jose M. R. Sotelo
A. D. Brébisson
Yoshua Bengio
Aaron Courville
GAN
159
953
0
08 Oct 2019
Recursive Visual Sound Separation Using Minus-Plus Net
Xudong Xu
Bo Dai
Dahua Lin
70
91
0
30 Aug 2019
My lips are concealed: Audio-visual speech enhancement through obstructions
Triantafyllos Afouras
Joon Son Chung
Andrew Zisserman
65
91
0
11 Jul 2019
Lipper: Synthesizing Thy Speech using Multi-View Lipreading
Yaman Kumar Singla
Rohit Jain
Khwaja Mohd. Salik
R. Shah
Yifang Yin
Roger Zimmermann
88
41
0
28 Jun 2019
AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss
Kaizhi Qian
Yang Zhang
Shiyu Chang
Xuesong Yang
M. Hasegawa-Johnson
81
465
0
14 May 2019
The Sound of Motions
Hang Zhao
Chuang Gan
Wei-Chiu Ma
Antonio Torralba
83
254
0
11 Apr 2019
WaveGlow: A Flow-based Generative Network for Speech Synthesis
R. Prenger
Rafael Valle
Bryan Catanzaro
151
1,032
0
31 Oct 2018
Sample Efficient Adaptive Text-to-Speech
Yutian Chen
Yannis Assael
Brendan Shillingford
David Budden
Scott E. Reed
...
Ben Laurie
Çağlar Gülçehre
Aaron van den Oord
Oriol Vinyals
Nando de Freitas
79
149
0
27 Sep 2018
Deep Appearance Models for Face Rendering
Stephen Lombardi
Jason M. Saragih
Tomas Simon
Yaser Sheikh
CVBM
3DH
67
282
0
01 Aug 2018
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
353
2,279
0
14 Jun 2018
Wave-U-Net: A Multi-Scale Neural Network for End-to-End Audio Source Separation
Daniel Stoller
Sebastian Ewert
S. Dixon
AI4TS
132
595
0
08 Jun 2018
The Conversation: Deep Audio-Visual Speech Enhancement
Triantafyllos Afouras
Joon Son Chung
Andrew Zisserman
79
360
0
11 Apr 2018
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
Andrew Owens
Alexei A. Efros
SSL
98
752
0
10 Apr 2018
The Sound of Pixels
Hang Zhao
Chuang Gan
Andrew Rouditchenko
Carl Vondrick
Josh H. McDermott
Antonio Torralba
VLM
102
536
0
09 Apr 2018
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Jonathan Shen
Ruoming Pang
Ron J. Weiss
M. Schuster
Navdeep Jaitly
...
Yuxuan Wang
RJ Skerry-Ryan
Rif A. Saurous
Yannis Agiomyrgiannakis
Yonghui Wu
79
2,698
0
16 Dec 2017
Neural Discrete Representation Learning
Aaron van den Oord
Oriol Vinyals
Koray Kavukcuoglu
BDL
SSL
OCL
226
5,019
0
02 Nov 2017
End-to-End Optimized Speech Coding with Deep Neural Networks
Srihari Kankanahalli
MQ
51
68
0
25 Oct 2017
S³FD: Single Shot Scale-invariant Face Detector
Shifeng Zhang
Xiangyu Zhu
Zhen Lei
Hailin Shi
Xiaobo Wang
Stan Z. Li
CVBM
74
605
0
17 Aug 2017
Improved Speech Reconstruction from Silent Video
Ariel Ephrat
Tavi Halperin
Shmuel Peleg
71
89
0
01 Aug 2017
Tacotron: Towards End-to-End Speech Synthesis
Yuxuan Wang
RJ Skerry-Ryan
Daisy Stanton
Yonghui Wu
Ron J. Weiss
...
Samy Bengio
Quoc V. Le
Yannis Agiomyrgiannakis
R. Clark
Rif A. Saurous
160
1,825
0
29 Mar 2017
SEGAN: Speech Enhancement Generative Adversarial Network
Santiago Pascual
Antonio Bonafonte
Joan Serrà
GAN
78
1,146
0
28 Mar 2017
Vid2speech: Speech Reconstruction from Silent Video
Ariel Ephrat
Shmuel Peleg
90
123
0
02 Jan 2017
Categorical Reparameterization with Gumbel-Softmax
Eric Jang
S. Gu
Ben Poole
BDL
334
5,364
0
03 Nov 2016
WaveNet: A Generative Model for Raw Audio
Aaron van den Oord
Sander Dieleman
Heiga Zen
Karen Simonyan
Oriol Vinyals
Alex Graves
Nal Kalchbrenner
A. Senior
Koray Kavukcuoglu
DiffM
406
7,399
0
12 Sep 2016
Permutation Invariant Training of Deep Models for Speaker-Independent Multi-talker Speech Separation
Dong Yu
Morten Kolbæk
Zheng-Hua Tan
Jesper Jensen
98
856
0
01 Jul 2016
Speech Enhancement In Multiple-Noise Conditions using Deep Neural Networks
Anurag Kumar
D. Florêncio
56
122
0
09 May 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
2.2K
194,020
0
10 Dec 2015
Under-determined reverberant audio source separation using a full-rank spatial covariance model
Ngoc Q. K. Duong
Emmanuel Vincent
Remi Gribonval
114
453
0
01 Dec 2009