Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis

31 March 2022

Papers citing "Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis"

SoundStream: An End-to-End Neural Audio Codec Neil Zeghidour Alejandro Luebs Ahmed Omran Jan Skoglund Marco Tagliasacchi AI4TS 110 791 0 07 Jul 2021
MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement Alexander Richard Michael Zollhoefer Yandong Wen Fernando de la Torre Yaser Sheikh CVBM 66 200 0 16 Apr 2021
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations Adam Polyak Yossi Adi Jade Copet Eugene Kharitonov Kushal Lakhotia Wei-Ning Hsu Abdel-rahman Mohamed Emmanuel Dupoux 80 318 0 01 Apr 2021
Generative Speech Coding with Predictive Variance Regularization W. Kleijn Andrew Storus Michael Chinen Tom Denton Felicia S. C. Lim Alejandro Luebs Jan Skoglund Hengchin Yeh 45 68 0 18 Feb 2021
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency Ruohan Gao Kristen Grauman CVBM 224 202 0 08 Jan 2021
Taming Transformers for High-Resolution Image Synthesis Patrick Esser Robin Rombach Bjorn Ommer ViT 129 2,962 0 17 Dec 2020
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis Jungil Kong Jaehyeon Kim Jaekyoung Bae 177 1,936 0 12 Oct 2020
Real Time Speech Enhancement in the Waveform Domain Alexandre Défossez Gabriel Synnaeve Yossi Adi 76 462 0 23 Jun 2020
HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks Jiaqi Su Zeyu Jin Adam Finkelstein 67 139 0 10 Jun 2020
Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis Prajwal K R Rudrabha Mukhopadhyay Vinay P. Namboodiri C. V. Jawahar 63 113 0 17 May 2020
FaceFilter: Audio-visual speech separation using still images Soo-Whan Chung Soyeon Choe Joon Son Chung Hong-Goo Kang CVBM 109 66 0 14 May 2020
Music Gesture for Visual Sound Separation Chuang Gan Deng Huang Hang Zhao J. Tenenbaum Antonio Torralba 88 204 0 20 Apr 2020
Voice Separation with an Unknown Number of Multiple Speakers Eliya Nachmani Yossi Adi Lior Wolf 61 175 0 29 Feb 2020
Low Bit-Rate Speech Coding with VQ-VAE and a WaveNet Decoder Cristina Garbacea Aaron van den Oord Yazhe Li Felicia S. C. Lim Alejandro Luebs Oriol Vinyals Thomas C. Walters 60 121 0 14 Oct 2019
MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis Kundan Kumar Rithesh Kumar T. Boissière L. Gestin Wei Zhen Teoh Jose M. R. Sotelo A. D. Brébisson Yoshua Bengio Aaron Courville GAN 159 953 0 08 Oct 2019
Recursive Visual Sound Separation Using Minus-Plus Net Xudong Xu Bo Dai Dahua Lin 70 91 0 30 Aug 2019
My lips are concealed: Audio-visual speech enhancement through obstructions Triantafyllos Afouras Joon Son Chung Andrew Zisserman 65 91 0 11 Jul 2019
Lipper: Synthesizing Thy Speech using Multi-View Lipreading Yaman Kumar Singla Rohit Jain Khwaja Mohd. Salik R. Shah Yifang Yin Roger Zimmermann 88 41 0 28 Jun 2019
AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss Kaizhi Qian Yang Zhang Shiyu Chang Xuesong Yang M. Hasegawa-Johnson 81 465 0 14 May 2019
The Sound of Motions Hang Zhao Chuang Gan Wei-Chiu Ma Antonio Torralba 83 254 0 11 Apr 2019
WaveGlow: A Flow-based Generative Network for Speech Synthesis R. Prenger Rafael Valle Bryan Catanzaro 151 1,032 0 31 Oct 2018
Sample Efficient Adaptive Text-to-Speech Yutian Chen Yannis Assael Brendan Shillingford David Budden Scott E. Reed ... Ben Laurie Çağlar Gülçehre Aaron van den Oord Oriol Vinyals Nando de Freitas 79 149 0 27 Sep 2018
Deep Appearance Models for Face Rendering Stephen Lombardi Jason M. Saragih Tomas Simon Yaser Sheikh CVBM 3DH 67 282 0 01 Aug 2018
VoxCeleb2: Deep Speaker Recognition Joon Son Chung Arsha Nagrani Andrew Zisserman 353 2,279 0 14 Jun 2018
Wave-U-Net: A Multi-Scale Neural Network for End-to-End Audio Source Separation Daniel Stoller Sebastian Ewert S. Dixon AI4TS 132 595 0 08 Jun 2018
The Conversation: Deep Audio-Visual Speech Enhancement Triantafyllos Afouras Joon Son Chung Andrew Zisserman 79 360 0 11 Apr 2018
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features Andrew Owens Alexei A. Efros SSL 98 752 0 10 Apr 2018
The Sound of Pixels Hang Zhao Chuang Gan Andrew Rouditchenko Carl Vondrick Josh H. McDermott Antonio Torralba VLM 102 536 0 09 Apr 2018
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions Jonathan Shen Ruoming Pang Ron J. Weiss M. Schuster Navdeep Jaitly ... Yuxuan Wang RJ Skerry-Ryan Rif A. Saurous Yannis Agiomyrgiannakis Yonghui Wu 79 2,698 0 16 Dec 2017
Neural Discrete Representation Learning Aaron van den Oord Oriol Vinyals Koray Kavukcuoglu BDL SSL OCL 226 5,019 0 02 Nov 2017
End-to-End Optimized Speech Coding with Deep Neural Networks Srihari Kankanahalli MQ 51 68 0 25 Oct 2017
S$^3$FD: Single Shot Scale-invariant Face Detector Shifeng Zhang Xiangyu Zhu Zhen Lei Hailin Shi Xiaobo Wang Stan Z. Li CVBM 74 605 0 17 Aug 2017
Improved Speech Reconstruction from Silent Video Ariel Ephrat Tavi Halperin Shmuel Peleg 71 89 0 01 Aug 2017
Tacotron: Towards End-to-End Speech Synthesis Yuxuan Wang RJ Skerry-Ryan Daisy Stanton Yonghui Wu Ron J. Weiss ... Samy Bengio Quoc V. Le Yannis Agiomyrgiannakis R. Clark Rif A. Saurous 160 1,825 0 29 Mar 2017
SEGAN: Speech Enhancement Generative Adversarial Network Santiago Pascual Antonio Bonafonte Joan Serrà GAN 78 1,146 0 28 Mar 2017
Vid2speech: Speech Reconstruction from Silent Video Ariel Ephrat Shmuel Peleg 90 123 0 02 Jan 2017
Categorical Reparameterization with Gumbel-Softmax Eric Jang S. Gu Ben Poole BDL 334 5,364 0 03 Nov 2016
WaveNet: A Generative Model for Raw Audio Aaron van den Oord Sander Dieleman Heiga Zen Karen Simonyan Oriol Vinyals Alex Graves Nal Kalchbrenner A. Senior Koray Kavukcuoglu DiffM 406 7,399 0 12 Sep 2016
Permutation Invariant Training of Deep Models for Speaker-Independent Multi-talker Speech Separation Dong Yu Morten Kolbæk Zheng-Hua Tan Jesper Jensen 98 856 0 01 Jul 2016
Speech Enhancement In Multiple-Noise Conditions using Deep Neural Networks Anurag Kumar D. Florêncio 56 122 0 09 May 2016
Deep Residual Learning for Image Recognition Kaiming He Xiangyu Zhang Shaoqing Ren Jian Sun MedIm 2.2K 194,020 0 10 Dec 2015
Under-determined reverberant audio source separation using a full-rank spatial covariance model Ngoc Q. K. Duong Emmanuel Vincent Remi Gribonval 114 453 0 01 Dec 2009