VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation

2 January 2021

Papers citing "VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation"

50 / 311 papers shown

SQuId: Measuring Speech Naturalness in Many Languages Thibault Sellam Ankur Bapna Joshua Camp Diana Mackinnon Ankur P. Parikh Jason Riesa 83 18 0 12 Oct 2022
Direct Speech Translation for Automatic Subtitling Sara Papi Marco Gaido Alina Karakanta Mauro Cettolo Matteo Negri Marco Turchi 102 11 0 27 Sep 2022
Bangla-Wave: Improving Bangla Automatic Speech Recognition Utilizing N-gram Language Models Mohammed Rakib Md. Ismail Hossain Nabeel Mohammed Fuad Rahman VLM 85 7 0 13 Sep 2022
Are disentangled representations all you need to build speaker anonymization systems? Pierre Champion D. Jouvet Anthony Larcher 113 20 0 22 Aug 2022
TEVR: Improving Speech Recognition by Token Entropy Variance Reduction Hajo N. Krabbenhöft Erhardt Barth 72 3 0 25 Jun 2022
BYOL-S: Learning Self-supervised Speech Representations by Bootstrapping Gasser Elbanna Neil Scheidwasser M. Kegler P. Beckmann Karl El Hajal Milos Cernak SSL 92 23 0 24 Jun 2022
Transformer-based Automatic Speech Recognition of Formal and Colloquial Czech in MALACH Project Jan Lehecka J. Psutka Josef Psutka 51 4 0 15 Jun 2022
Exploring Capabilities of Monolingual Audio Transformers using Large Datasets in Automatic Speech Recognition of Czech Jan Lehecka J. Svec A. Pražák J. Psutka 52 13 0 15 Jun 2022
The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task Ziqiang Zhang Junyi Ao Long Zhou Shujie Liu Furu Wei Jinyu Li 36 9 0 12 Jun 2022
Toward a realistic model of speech processing in the brain with self-supervised learning Juliette Millet Charlotte Caucheteux Pierre Orhan Yves Boubenec Alexandre Gramfort Ewan Dunbar Christophe Pallier J. King 112 99 0 03 Jun 2022
Predicting non-native speech perception using the Perceptual Assimilation Model and state-of-the-art acoustic models Juliette Millet I. Chitoran Ewan Dunbar 59 6 0 31 May 2022
Do self-supervised speech models develop human-like perception biases? Juliette Millet Ewan Dunbar SSL 68 23 0 31 May 2022
FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech Alexis Conneau Min Ma Simran Khanuja Yu Zhang Vera Axelrod Siddharth Dalmia Jason Riesa Clara E. Rivera Ankur Bapna VLM 155 332 0 25 May 2022
T-Modules: Translation Modules for Zero-Shot Cross-Modal Machine Translation Paul-Ambroise Duquenne Hongyu Gong Benoît Sagot Holger Schwenk 85 20 0 24 May 2022
Self-Supervised Speech Representation Learning: A Review Abdel-rahman Mohamed Hung-yi Lee Lasse Borgholt Jakob Drachmann Havtorn Joakim Edin ... Shang-Wen Li Karen Livescu Lars Maaløe Tara N. Sainath Shinji Watanabe SSL AI4TS 285 368 0 21 May 2022
SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation Sameer Khurana Antoine Laurent James R. Glass 65 37 0 17 May 2022
Hearing voices at the National Library -- a speech corpus and acoustic model for the Swedish language Martin Malmsten Chris Haffenden Love Borjeson 66 10 0 06 May 2022
Efficient yet Competitive Speech Translation: FBK@IWSLT2022 Marco Gaido Sara Papi Dennis Fucci G. Fiameni Matteo Negri Marco Turchi 69 20 0 05 May 2022
Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages Felix Wu Kwangyoun Kim Shinji Watanabe Kyu Jeong Han Ryan T. McDonald Kilian Q. Weinberger Yoav Artzi SyDa 105 39 0 02 May 2022
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition? Sanyuan Chen Yu Wu Chengyi Wang Shujie Liu Zhuo Chen ... Gang Liu Jinyu Li Jian Wu Xiangzhan Yu Furu Wei SSL 95 42 0 27 Apr 2022
LibriS2S: A German-English Speech-to-Speech Translation Corpus Pedro Jeuris Jan Niehues AuLLM 29 3 0 22 Apr 2022
ASR in German: A Detailed Error Analysis John M. Wirth René Peinl 55 6 0 12 Apr 2022
The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance Lin Zhang Xin Wang Erica Cooper Nicholas W. D. Evans Junichi Yamagishi 109 60 0 11 Apr 2022
MAESTRO: Matched Speech Text Representations through Modality Matching Zhehuai Chen Yu Zhang Andrew Rosenberg Bhuvana Ramabhadran Pedro J. Moreno Ankur Bapna Heiga Zen 94 108 0 07 Apr 2022
Speech Pre-training with Acoustic Piece Shuo Ren Shujie Liu Yu Wu Long Zhou Furu Wei SSL 65 17 0 07 Apr 2022
Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation Sravya Popuri Peng-Jen Chen Changhan Wang J. Pino Yossi Adi Jiatao Gu Wei-Ning Hsu Ann Lee 142 58 0 06 Apr 2022
End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation Xuankai Chang Takashi Maekaku Yuya Fujita Shinji Watanabe VLM 111 46 0 01 Apr 2022
ASR data augmentation in low-resource settings using cross-lingual multi-speaker TTS and cross-lingual voice conversion Edresson Casanova C. Shulby Alexander Korolev Arnaldo Cândido Júnior A. S. Soares S. Aluísio M. Ponti 117 14 0 29 Mar 2022
Finnish Parliament ASR corpus - Analysis, benchmarks and statistics A. Virkkunen Aku Rouhe Nhan Phan M. Kurimo 95 4 0 28 Mar 2022
Leveraging unsupervised and weakly-supervised data to improve direct speech-to-speech translation Ye Jia Yifan Ding Ankur Bapna Colin Cherry Yu Zhang Alexis Conneau Nobuyuki Morioka 94 21 0 24 Mar 2022
Lahjoita puhetta -- a large-scale corpus of spoken Finnish with some benchmarks Anssi Moisio Dejan Porjazovski Aku Rouhe Yaroslav Getman A. Virkkunen Tamás Grósz Krister Lindén M. Kurimo 88 23 0 24 Mar 2022
The VoicePrivacy 2022 Challenge Evaluation Plan N. Tomashenko Xin Wang Xiaoxiao Miao Hubert Nourtel Pierre Champion Massimiliano Todisco Emmanuel Vincent Nicholas W. D. Evans Junichi Yamagishi J. Bonastre 117 63 0 23 Mar 2022
XTREME-S: Evaluating Cross-lingual Speech Representations Alexis Conneau Ankur Bapna Yu Zhang Min Ma Patrick von Platen ... Orhan Firat Michael Auli Sebastian Ruder Jason Riesa Melvin Johnson VLM AILaw ELM 155 22 0 21 Mar 2022
Dawn of the transformer era in speech emotion recognition: closing the valence gap Johannes Wagner Andreas Triantafyllopoulos H. Wierstorf Maximilian Schmitt Felix Burkhardt F. Eyben Björn W. Schuller 96 306 0 14 Mar 2022
Building and curating conversational corpora for diversity-aware language science and technology Andreas Liesenfeld Mark Dingemanse 50 4 0 07 Mar 2022
HEAR: Holistic Evaluation of Audio Representations Joseph P. Turian Jordie Shier H. Khan Bhiksha Raj Björn W. Schuller ... P. Esling Pranay Manocha Shinji Watanabe Zeyu Jin Yonatan Bisk 135 108 0 06 Mar 2022
Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation Hemlata Tak Massimiliano Todisco Xin Wang Jee-weon Jung Junichi Yamagishi Nicholas W. D. Evans 129 168 0 24 Feb 2022
mSLAM: Massively multilingual joint pre-training for speech and text Ankur Bapna Colin Cherry Yu Zhang Ye Jia Melvin Johnson Yong Cheng Simran Khanuja Jason Riesa Alexis Conneau VLM 73 114 0 03 Feb 2022
BEA-Base: A Benchmark for ASR of Spontaneous Hungarian P. Mihajlik A. Balog T. E. Gráczi A. Kohári Balázs Tarján K. Mády 44 8 0 01 Feb 2022
CVSS Corpus and Massively Multilingual Speech-to-Speech Translation Ye Jia Michelle Tadmor Ramanovich Quan Wang Heiga Zen SLR 94 70 0 11 Jan 2022
Textless Speech-to-Speech Translation on Real Data Ann Lee Hongyu Gong Paul-Ambroise Duquenne Holger Schwenk Peng-Jen Chen ... Sravya Popuri Yossi Adi J. Pino Jiatao Gu Wei-Ning Hsu 94 150 0 15 Dec 2021
On the Use of External Data for Spoken Named Entity Recognition Ankita Pasad Felix Wu Suwon Shon Karen Livescu Kyu Jeong Han 95 16 0 14 Dec 2021
Human-Machine Interaction Speech Corpus from the ROBIN project V. Pais Radu Ion Andrei-Marius Avram Elena Irimia V. Mititelu Maria Mitrofan 58 6 0 22 Nov 2021
SLUE: New Benchmark Tasks for Spoken Language Understanding Evaluation on Natural Speech Suwon Shon Ankita Pasad Felix Wu Pablo Brusco Yoav Artzi Karen Livescu Kyu Jeong Han AuLLM ELM 106 76 0 19 Nov 2021
XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale Arun Babu Changhan Wang Andros Tjandra Kushal Lakhotia Qiantong Xu ... Yatharth Saraf J. Pino Alexei Baevski Alexis Conneau Michael Auli SSL 114 712 0 17 Nov 2021
Towards Building ASR Systems for the Next Billion Users Tahir Javed Sumanth Doddapaneni A. Raman Kaushal Bhogale Gowtham Ramesh Anoop Kunchukuttan Pratyush Kumar Mitesh M. Khapra 84 55 0 06 Nov 2021
Pseudo-Labeling for Massively Multilingual Speech Recognition Loren Lugosch Tatiana Likhomanenko Gabriel Synnaeve R. Collobert VLM 77 30 0 30 Oct 2021
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing Sanyuan Chen Chengyi Wang Zhengyang Chen Yu-Huan Wu Shujie Liu ... Yao Qian Jian Wu Michael Zeng Xiangzhan Yu Furu Wei SSL 294 1,911 0 26 Oct 2021
ASR4REAL: An extended benchmark for speech models M. Rivière Jade Copet Gabriel Synnaeve AuLLM 78 15 0 16 Oct 2021
Scribosermo: Fast Speech-to-Text models for German and other Languages Daniel Bermuth Alexander Poeppel Wolfgang Reif 54 9 0 15 Oct 2021