AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
10 January 2025
Samir Sadok
Simon Leglaive
Laurent Girin
Gaël Richard
Xavier Alameda-Pineda

Papers citing "AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder"

26 / 26 papers shown
U-DREAM: Unsupervised Dereverberation guided by a Reverberation Model
Louis Bahrman
Mathieu Fontaine
Gaël Richard
81
0
0
17 Jul 2025
RPRA-ADD: Forgery Trace Enhancement-Driven Audio Deepfake Detection
Ruibo Fu
Xiaopeng Wang
Zhengqi Wen
Jianhua Tao
Yuankun Xie
...
Chunyu Qiang
Zhengqi Wen
Cunhang Fan
Chenxing Li
Guanjun Li
178
1
0
31 May 2025
A vector quantized masked autoencoder for audiovisual speech emotion recognition
Computer Vision and Image Understanding (CVIU), 2023
Samir Sadok
Simon Leglaive
Renaud Séguier
SSL
363
10
0
05 May 2023
High Fidelity Neural Audio Compression
Alexandre Défossez
Jade Copet
Gabriel Synnaeve
Yossi Adi
196
948
0
24 Oct 2022
Brouhaha: multi-task training for voice activity detection, speech-to-noise ratio, and C50 room acoustics estimation
Automatic Speech Recognition & Understanding (ASRU), 2022
Marvin Lavechin
Marianne Métais
Hadrien Titeux
Alodie Boissonnet
Jade Copet
M. Rivière
Elika Bergelson
Alejandrina Cristià
Emmanuel Dupoux
H. Bredin
251
39
0
24 Oct 2022
Speech Quality Assessment through MOS using Non-Matching References
Interspeech (Interspeech), 2022
Pranay Manocha
Anurag Kumar
261
36
0
24 Jun 2022
Masked Autoencoders Are Scalable Vision Learners
Computer Vision and Pattern Recognition (CVPR), 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT TPM
1.5K
9,694
0
11 Nov 2021
A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Benjamin van Niekerk
M. Carbonneau
Julian Zaïdi
Matthew Baas
Hugo Seuté
Herman Kamper
DRL
288
152
0
03 Nov 2021
DNSMOS P.835: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors
Chandan K. A. Reddy
Vishak Gopal
Ross Cutler
408
317
0
05 Oct 2021
SoundStream: An End-to-End Neural Audio Codec
Neil Zeghidour
Alejandro Luebs
Ahmed Omran
Jan Skoglund
Marco Tagliasacchi
AI4TS
205
1,068
0
07 Jul 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2021
Wei-Ning Hsu
Benjamin Bolte
Yifan Hao
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
SSL
460
3,851
0
14 Jun 2021
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations
Interspeech (Interspeech), 2021
Adam Polyak
Yossi Adi
Jade Copet
Eugene Kharitonov
Kushal Lakhotia
Wei-Ning Hsu
Abdel-rahman Mohamed
Emmanuel Dupoux
336
361
0
01 Apr 2021
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
1.2K
52,903
0
22 Oct 2020
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
405
2,361
0
12 Oct 2020
DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement
Interspeech (Interspeech), 2020
Yanxin Hu
Yun Liu
Shubo Lv
Mengtao Xing
Shimin Zhang
Yihui Fu
Jian Wu
Bihong Zhang
Lei Xie
409
704
0
01 Aug 2020
Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation
Interspeech (Interspeech), 2020
Jing-jing Chen
Qi-rong Mao
Dong Liu
294
324
0
28 Jul 2020
ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification
Brecht Desplanques
Jenthe Thienpondt
Kris Demuynck
216
1,652
0
14 May 2020
WHAM!: Extending Speech Separation to Noisy Environments
Interspeech (Interspeech), 2019
Gordon Wichern
J. Antognini
Michael Flynn
Licheng Richard Zhu
E. McQuinn
Dwight Crow
Ethan Manilow
Jonathan Le Roux
178
426
0
02 Jul 2019
Generating Diverse High-Fidelity Images with VQ-VAE-2
Neural Information Processing Systems (NeurIPS), 2019
Ali Razavi
Aaron van den Oord
Oriol Vinyals
DRL BDL
471
2,083
0
02 Jun 2019
Phase-aware Speech Enhancement with Deep Complex U-Net
Hyeong-Seok Choi
Jang-Hyun Kim
Jaesung Huh
A. Kim
Jung-Woo Ha
Kyogu Lee
212
369
0
07 Mar 2019
SDR - half-baked or well done?
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018
Jonathan Le Roux
Scott Wisdom
Hakan Erdogan
J. Hershey
292
1,448
0
06 Nov 2018
Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation
Yi Luo
N. Mesgarani
421
2,000
0
20 Sep 2018
CREPE: A Convolutional Representation for Pitch Estimation
Jong Wook Kim
Justin Salamon
P. Li
J. P. Bello
200
430
0
17 Feb 2018
Neural Discrete Representation Learning
Aaron van den Oord
Oriol Vinyals
Koray Kavukcuoglu
BDL SSL OCL
539
6,157
0
02 Nov 2017
Generalized End-to-End Loss for Speaker Verification
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2017
Li Wan
Quan Wang
Alan Papir
Ignacio López Moreno
VLM
341
1,008
0
28 Oct 2017
Attention Is All You Need
Neural Information Processing Systems (NeurIPS), 2017
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
2.3K
156,353
0
12 Jun 2017