AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

10 January 2025

Xavier Alameda-Pineda

Papers citing "AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder"

U-DREAM: Unsupervised Dereverberation guided by a Reverberation Model. Louis Bahrman, Mathieu Fontaine, Gaël Richard. 17 Jul 2025
RPRA-ADD: Forgery Trace Enhancement-Driven Audio Deepfake Detection. Ruibo Fu, Xiaopeng Wang, Zhengqi Wen, Jianhua Tao, Yuankun Xie, ..., Chunyu Qiang, Zhengqi Wen, Cunhang Fan, Chenxing Li, Guanjun Li. 31 May 2025

Papers cited by "AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder"

A vector quantized masked autoencoder for audiovisual speech emotion recognition. Samir Sadok, Simon Leglaive, Renaud Séguier. Computer Vision and Image Understanding (CVIU), 2023. 05 May 2023
High Fidelity Neural Audio Compression. Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi. 24 Oct 2022
Brouhaha: multi-task training for voice activity detection, speech-to-noise ratio, and C50 room acoustics estimation. Marvin Lavechin, Marianne Métais, Hadrien Titeux, Alodie Boissonnet, Jade Copet, M. Rivière, Elika Bergelson, Alejandrina Cristià, Emmanuel Dupoux, H. Bredin. Automatic Speech Recognition & Understanding (ASRU), 2022. 24 Oct 2022
Speech Quality Assessment through MOS using Non-Matching References. Pranay Manocha, Anurag Kumar. Interspeech, 2022. 24 Jun 2022
Masked Autoencoders Are Scalable Vision Learners. Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross B. Girshick. Computer Vision and Pattern Recognition (CVPR), 2021. 11 Nov 2021
A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion. Benjamin van Niekerk, M. Carbonneau, Julian Zaïdi, Matthew Baas, Hugo Seuté, Herman Kamper. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021. 03 Nov 2021
DNSMOS P.835: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors. Chandan K. A. Reddy, Vishak Gopal, Ross Cutler. 05 Oct 2021
SoundStream: An End-to-End Neural Audio Codec. Neil Zeghidour, Alejandro Luebs, Ahmed Omran, Jan Skoglund, Marco Tagliasacchi. 07 Jul 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units. Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdel-rahman Mohamed. IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2021. 14 Jun 2021
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations. Adam Polyak, Yossi Adi, Jade Copet, Eugene Kharitonov, Kushal Lakhotia, Wei-Ning Hsu, Abdel-rahman Mohamed, Emmanuel Dupoux. Interspeech, 2021. 01 Apr 2021
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, ..., Matthias Minderer, G. Heigold, Sylvain Gelly, Jakob Uszkoreit, N. Houlsby. 22 Oct 2020
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis. Jungil Kong, Jaehyeon Kim, Jaekyoung Bae. 12 Oct 2020
DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement. Yanxin Hu, Yun Liu, Shubo Lv, Mengtao Xing, Shimin Zhang, Yihui Fu, Jian Wu, Bihong Zhang, Lei Xie. Interspeech, 2020. 01 Aug 2020
Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation. Jing-jing Chen, Qi-rong Mao, Dong Liu. Interspeech, 2020. 28 Jul 2020
ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification. Brecht Desplanques, Jenthe Thienpondt, Kris Demuynck. 14 May 2020
WHAM!: Extending Speech Separation to Noisy Environments. Gordon Wichern, J. Antognini, Michael Flynn, Licheng Richard Zhu, E. McQuinn, Dwight Crow, Ethan Manilow, Jonathan Le Roux. Interspeech, 2019. 02 Jul 2019
Generating Diverse High-Fidelity Images with VQ-VAE-2. Ali Razavi, Aaron van den Oord, Oriol Vinyals. Neural Information Processing Systems (NeurIPS), 2019. 02 Jun 2019
Phase-aware Speech Enhancement with Deep Complex U-Net. Hyeong-Seok Choi, Jang-Hyun Kim, Jaesung Huh, A. Kim, Jung-Woo Ha, Kyogu Lee. 07 Mar 2019
SDR - half-baked or well done? Jonathan Le Roux, Scott Wisdom, Hakan Erdogan, J. Hershey. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018. 06 Nov 2018
Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation. Yi Luo, N. Mesgarani. 20 Sep 2018
CREPE: A Convolutional Representation for Pitch Estimation. Jong Wook Kim, Justin Salamon, P. Li, J. P. Bello. 17 Feb 2018
Neural Discrete Representation Learning. Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu. 02 Nov 2017
Generalized End-to-End Loss for Speaker Verification. Li Wan, Quan Wang, Alan Papir, Ignacio López Moreno. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2017. 28 Oct 2017
Attention Is All You Need. Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin. Neural Information Processing Systems (NeurIPS), 2017. 12 Jun 2017