ResearchTrend.AI
VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation

2 January 2021
Changhan Wang
M. Rivière
Ann Lee
Anne Wu
Chaitanya Talnikar
Daniel Haziza
Mary Williamson
J. Pino
Emmanuel Dupoux
SSL
ArXiv (abs) | PDF | HTML | GitHub (536★)

Papers citing "VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation"

50 / 311 papers shown
Augmenting Polish Automatic Speech Recognition System With Synthetic Data
Łukasz Bondaruk
Jakub Kubiak
Mateusz Czyżnikiewicz
64
0
0
30 Oct 2024
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
Qinglin Zhang
Luyao Cheng
Chong Deng
Qian Chen
Wen Wang
...
Jiaqing Liu
Hai Yu
Chaohong Tan
Zhihao Du
Shiliang Zhang
SyDa, BDL, AuLLM, VLM
143
20
0
23 Oct 2024
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning
Yifan Peng
Krishna Puvvada
Zhehuai Chen
Piotr Żelasko
He Huang
Kunal Dhawan
Ke Hu
Shinji Watanabe
Jagadeesh Balam
Boris Ginsburg
166
5
0
23 Oct 2024
DENOASR: Debiasing ASRs through Selective Denoising
Anand Rai
S. Jaiswal
Shubham Prakash
Bendi Pragnya Sree
Animesh Mukherjee
79
0
0
22 Oct 2024
End-to-End Integration of Speech Emotion Recognition with Voice Activity Detection using Self-Supervised Learning Features
Natsuo Yamashita
Masaaki Yamamoto
Yohei Kawaguchi
75
0
0
17 Oct 2024
Sound Check: Auditing Audio Datasets
William Agnew
Julia Barnett
Annie Chu
Rachel Hong
Michael Feffer
Robin Netzorg
Harry H. Jiang
Ezra Awumey
Sauvik Das
120
1
0
17 Oct 2024
Investigation of Speaker Representation for Target-Speaker Speech Processing
Takanori Ashihara
Takafumi Moriya
Shota Horiguchi
Junyi Peng
Tsubasa Ochiai
Marc Delcroix
Kohei Matsuura
Hiroshi Sato
55
1
0
15 Oct 2024
Code Drift: Towards Idempotent Neural Audio Codecs
P. O'Reilly
Prem Seetharaman
Jiaqi Su
Zeyu Jin
Bryan Pardo
440
1
0
14 Oct 2024
Efficiently Identifying Low-Quality Language Subsets in Multilingual Datasets: A Case Study on a Large-Scale Multilingual Audio Dataset
Farhan Samir
Emily P. Ahn
Shreya Prakash
Márton Soskuthy
Vered Shwartz
Jian Zhu
66
1
0
05 Oct 2024
HAINAN: Fast and Accurate Transducer for Hybrid-Autoregressive ASR
Hainan Xu
Travis M. Bartley
Vladimir Bataev
Boris Ginsburg
426
0
0
03 Oct 2024
MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages
Marco Gaido
Sara Papi
L. Bentivogli
Alessio Brutti
Mauro Cettolo
R. Gretter
M. Matassoni
Mohamed Nabih
Matteo Negri
77
6
0
01 Oct 2024
Recent Advances in Speech Language Models: A Survey
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
AuLLM
195
25
0
01 Oct 2024
Enhancing Polyglot Voices by Leveraging Cross-Lingual Fine-Tuning in Any-to-One Voice Conversion
Giuseppe Ruggiero
Matteo Testa
Jurgen Van de Walle
Luigi Di Caro
61
1
0
25 Sep 2024
Whisper in Medusa's Ear: Multi-head Efficient Decoding for Transformer-based ASR
Yael Segal-Feldman
Aviv Shamsian
Aviv Navon
Gill Hetz
Joseph Keshet
47
2
0
24 Sep 2024
Semi-supervised Learning For Robust Speech Evaluation
Huayun Zhang
Jeremy H. M. Wong
Geyu Lin
Nancy F. Chen
61
0
0
23 Sep 2024
Exploring the Impact of Data Quantity on ASR in Extremely Low-resource Languages
Yao-Fei Cheng
Li-Wei Chen
Hung-Shin Lee
Hsin-Min Wang
61
1
0
13 Sep 2024
WhisperNER: Unified Open Named Entity and Speech Recognition
Gil Ayache
Menachem Pirchi
Aviv Navon
Aviv Shamsian
Gill Hetz
Joseph Keshet
57
1
0
12 Sep 2024
The Faetar Benchmark: Speech Recognition in a Very Under-Resourced Language
Michael Ong
Sean Robertson
Leo Peckham
Alba Jorquera Jimenez de Aberasturi
Paula Arkhangorodsky
Robin Huo
Aman Sakhardande
Mark Hallap
Naomi Nagy
Ewan Dunbar
CVBM
175
0
0
12 Sep 2024
Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation
Nithin Rao Koluguri
Travis M. Bartley
Hainan Xu
Oleksii Hrinchuk
Jagadeesh Balam
Boris Ginsburg
Georg Kucsko
85
3
0
09 Sep 2024
NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks
He Huang
Taejin Park
Kunal Dhawan
Ivan Medennikov
Krishna Puvvada
Nithin Rao Koluguri
Weiqing Wang
Jagadeesh Balam
Boris Ginsburg
SSL, AI4TS
79
1
0
23 Aug 2024
ELP-Adapters: Parameter Efficient Adapter Tuning for Various Speech Processing Tasks
Nakamasa Inoue
Shinta Otake
Takumi Hirose
Masanari Ohi
Rei Kawakami
74
2
0
28 Jul 2024
Zero-Shot vs. Few-Shot Multi-Speaker TTS Using Pre-trained Czech SpeechT5 Model
Jan Lehecka
Z. Hanzlícek
J. Matousek
Daniel Tihelka
66
0
0
24 Jul 2024
Framework for Curating Speech Datasets and Evaluating ASR Systems: A Case Study for Polish
Mohamed Allam
73
0
0
18 Jul 2024
Vibravox: A Dataset of French Speech Captured with Body-conduction Audio Sensors
J. Hauret
Malo Olivier
Thomas Joubaud
C. Langrenne
Sarah Poirée
V. Zimpfer
Éric Bavu
185
5
0
16 Jul 2024
Performance Analysis of Speech Encoders for Low-Resource SLU and ASR in Tunisian Dialect
Salima Mdhaffar
Haroun Elleuch
Fethi Bougares
Yannick Esteve
120
1
0
05 Jul 2024
On the Effectiveness of Acoustic BPE in Decoder-Only TTS
Bohan Li
Feiyu Shen
Yiwei Guo
Shuai Wang
Xie Chen
Kai Yu
102
2
0
04 Jul 2024
Finetuning End-to-End Models for Estonian Conversational Spoken Language Translation
Tiia Sildam
Andra Velve
Tanel Alumäe
102
0
0
04 Jul 2024
Investigating Decoder-only Large Language Models for Speech-to-text Translation
Chao-Wei Huang
Hui Lu
Hongyu Gong
Hirofumi Inaguma
Ilia Kulikov
Ruslan Mavlyutov
Sravya Popuri
AuLLM, LRM
98
8
0
03 Jul 2024
Towards Robust Speech Representation Learning for Thousands of Languages
William Chen
Wangyou Zhang
Yifan Peng
Xinjian Li
Jinchuan Tian
Jiatong Shi
Xuankai Chang
Soumi Maiti
Karen Livescu
Shinji Watanabe
ELM
137
19
0
30 Jun 2024
WavRx: a Disease-Agnostic, Generalizable, and Privacy-Preserving Speech Health Diagnostic Model
Yi Zhu
Tiago H. Falk
MedIm
80
1
0
26 Jun 2024
MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research
Song Li
Yongbin You
Xuezhi Wang
Zhengkun Tian
Ke Ding
Guanglu Wan
43
3
0
26 Jun 2024
Leveraging Parameter-Efficient Transfer Learning for Multi-Lingual Text-to-Speech Adaptation
Yingting Li
Ambuj Mehrish
Bryan Chew
Bo Cheng
Soujanya Poria
66
0
0
25 Jun 2024
Blending LLMs into Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024
Sai Koneru
Thai-Binh Nguyen
Ngoc-Quan Pham
Danni Liu
Zhaolin Li
Alexander Waibel
Jan Niehues
OffRL
68
4
0
24 Jun 2024
One-Class Learning with Adaptive Centroid Shift for Audio Deepfake Detection
Hyun Myung Kim
Kangwook Jang
Hoirin Kim
74
7
0
24 Jun 2024
Speech Analysis of Language Varieties in Italy
Moreno La Quatra
Alkis Koudounas
Elena Baralis
Sabato Marco Siniscalchi
103
3
0
22 Jun 2024
Outlier Reduction with Gated Attention for Improved Post-training Quantization in Large Sequence-to-sequence Speech Foundation Models
Dominik Wagner
Ilja Baumann
Korbinian Riedhammer
Tobias Bocklet
MQ
58
1
0
16 Jun 2024
On the Evaluation of Speech Foundation Models for Spoken Language Understanding
Siddhant Arora
Ankita Pasad
Chung-Ming Chien
Jionghao Han
Roshan S. Sharma
...
William Chen
Suwon Shon
Hung-yi Lee
Karen Livescu
Shinji Watanabe
ELM
87
6
0
14 Jun 2024
On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models
Jinchuan Tian
Yifan Peng
William Chen
Kwanghee Choi
Karen Livescu
Shinji Watanabe
64
7
0
13 Jun 2024
INTERSPEECH 2009 Emotion Challenge Revisited: Benchmarking 15 Years of Progress in Speech Emotion Recognition
Andreas Triantafyllopoulos
A. Batliner
Simon Rampp
M. Milling
Björn Schuller
VLM
65
1
0
10 Jun 2024
mHuBERT-147: A Compact Multilingual HuBERT Model
Marcely Zanon Boito
Vivek Iyer
Nikolaos Lagos
Laurent Besacier
Ioan Calapodescu
VLM
143
20
0
10 Jun 2024
MaLa-ASR: Multimedia-Assisted LLM-Based ASR
Guanrou Yang
Ziyang Ma
Fan Yu
Zhifu Gao
Shiliang Zhang
Xie Chen
AuLLM
97
5
0
09 Jun 2024
Exploring the Benefits of Tokenization of Discrete Acoustic Units
Avihu Dekel
Raul Fernandez
78
2
0
08 Jun 2024
LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR
Zheshu Song
Jianheng Zhuo
Yifan Yang
Ziyang Ma
Shixiong Zhang
Xie Chen
77
11
0
07 Jun 2024
LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition
Sreyan Ghosh
Sonal Kumar
Ashish Seth
Purva Chiniya
Utkarsh Tyagi
R. Duraiswami
Dinesh Manocha
99
0
0
06 Jun 2024
Keyword-Guided Adaptation of Automatic Speech Recognition
Aviv Shamsian
Aviv Navon
Neta Glazer
Gill Hetz
Joseph Keshet
97
1
0
04 Jun 2024
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation
Chenyang Le
Yao Qian
Dongmei Wang
Long Zhou
Shujie Liu
...
Midia Yousefi
Yanmin Qian
Jinyu Li
Sheng Zhao
Michael Zeng
86
3
0
28 May 2024
Federating Dynamic Models using Early-Exit Architectures for Automatic Speech Recognition on Heterogeneous Clients
Mohamed Nabih Ali
Alessio Brutti
Daniele Falavigna
70
0
0
27 May 2024
Multilingual Prosody Transfer: Comparing Supervised & Transfer Learning
Arnav Goel
Medha Hira
Anubha Gupta
57
1
0
23 May 2024
CrossVoice: Crosslingual Prosody Preserving Cascade-S2ST using Transfer Learning
Medha Hira
Arnav Goel
Anubha Gupta
56
1
0
23 May 2024
DiffNorm: Self-Supervised Normalization for Non-autoregressive Speech-to-speech Translation
Weiting Tan
Jingyu Zhang
Lingfeng Shen
Daniel Khashabi
Philipp Koehn
84
0
0
22 May 2024