Common Voice: A Massively-Multilingual Speech Corpus

arXiv: 1912.06670
13 December 2019
Rosana Ardila
Megan Branson
Kelly Davis
Michael Henretty
M. Kohler
Josh Meyer
Reuben Morais
Lindsay Saunders
Francis M. Tyers
Gregor Weber
    VLM

Papers citing "Common Voice: A Massively-Multilingual Speech Corpus"

50 / 316 papers shown
WER We Stand: Benchmarking Urdu ASR Models
Samee Arif
Aamina Jamal Khan
Mustafa Abbas
Agha Ali Raza
Awais Athar
26
3
0
17 Sep 2024
ASR Error Correction using Large Language Models
Rao Ma
Mengjie Qian
Mark Gales
Kate Knill
KELM
46
1
0
14 Sep 2024
Improving Robustness of Diffusion-Based Zero-Shot Speech Synthesis via Stable Formant Generation
C. Han
Seokgi Lee
Gyuhyeon Nam
Gyeongsu Chae
DiffM
215
0
0
14 Sep 2024
Keyword-Aware ASR Error Augmentation for Robust Dialogue State Tracking
Jihyun Lee
Solee Im
Wonjun Lee
Gary Geunbae Lee
36
0
0
10 Sep 2024
Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation
Nithin Rao Koluguri
Travis M. Bartley
Hainan Xu
Oleksii Hrinchuk
Jagadeesh Balam
Boris Ginsburg
Georg Kucsko
44
3
0
09 Sep 2024
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Shengpeng Ji
Ziyue Jiang
Xize Cheng
Yifu Chen
Minghui Fang
...
Rongjie Huang
Yidi Jiang
Qian Chen
Zhou Zhao
VLM
60
36
0
29 Aug 2024
Enhancing Large Language Model-based Speech Recognition by Contextualization for Rare and Ambiguous Words
Kento Nozawa
Takashi Masuko
Toru Taniguchi
43
1
0
15 Aug 2024
Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond
Beomseok Lee
Ioan Calapodescu
Marco Gaido
Matteo Negri
Laurent Besacier
AuLLM
39
4
0
07 Aug 2024
Audio-visual training for improved grounding in video-text LLMs
Shivprasad Sagare
Hemachandran S
Kinshuk Sarabhai
Prashant Ullegaddi
SA Rajeshkumar
32
0
0
21 Jul 2024
Vibravox: A Dataset of French Speech Captured with Body-conduction Audio Sensors
J. Hauret
Malo Olivier
Thomas Joubaud
C. Langrenne
Sarah Poirée
V. Zimpfer
Éric Bavu
85
1
0
16 Jul 2024
Multitaper mel-spectrograms for keyword spotting
Douglas Baptista de Souza
Khaled Jamal Bakri
Fernanda Ferreira
Juliana Inacio
18
1
0
05 Jul 2024
Cross-Lingual Transfer Learning for Speech Translation
Rao Ma
Yassir Fathullah
Mengjie Qian
Siyuan Tang
Mark Gales
Kate Knill
28
1
0
01 Jul 2024
Blending LLMs into Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024
Sai Koneru
Thai-Binh Nguyen
Ngoc-Quan Pham
Danni Liu
Zhaolin Li
Alexander Waibel
Jan Niehues
OffRL
44
3
0
24 Jun 2024
AudioBench: A Universal Benchmark for Audio Large Language Models
Bin Wang
Xunlong Zou
Geyu Lin
Shri Kiran Srinivasan
Zhuohan Liu
Wenyu Zhang
Zhengyuan Liu
AiTi Aw
Nancy F. Chen
AuLLM
ELM
LM&MA
92
23
0
23 Jun 2024
Large Language Models for Dysfluency Detection in Stuttered Speech
Dominik Wagner
Sebastian P. Bayerl
Ilja Baumann
Korbinian Riedhammer
Elmar Nöth
Tobias Bocklet
50
4
0
16 Jun 2024
Outlier Reduction with Gated Attention for Improved Post-training Quantization in Large Sequence-to-sequence Speech Foundation Models
Dominik Wagner
Ilja Baumann
Korbinian Riedhammer
Tobias Bocklet
MQ
43
1
0
16 Jun 2024
How Should We Extract Discrete Audio Tokens from Self-Supervised Models?
Pooneh Mousavi
J. Duret
Salah Zaiem
Luca Della Libera
Artem Ploujnikov
Cem Subakan
Mirco Ravanelli
42
10
0
15 Jun 2024
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages
Holy Lovenia
Rahmad Mahendra
Salsabil Maulana Akbar
Lester James V. Miranda
Jennifer Santoso
...
Genta Indra Winata
Ruochen Zhang
Fajri Koto
Zheng-Xin Yong
Samuel Cahyawijaya
95
9
0
14 Jun 2024
ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets
Jiatong Shi
Shih-Heng Wang
William Chen
Martijn Bartelds
Vanya Bannihatti Kumar
...
Xuankai Chang
Dan Jurafsky
Karen Livescu
Hung-yi Lee
Shinji Watanabe
AuLLM
77
5
0
12 Jun 2024
Self-Supervised Speech Representations are More Phonetic than Semantic
Kwanghee Choi
Ankita Pasad
Tomohiko Nakamura
Satoru Fukayama
Karen Livescu
Shinji Watanabe
39
14
0
12 Jun 2024
PolySpeech: Exploring Unified Multitask Speech Models for Competitiveness with Single-task Models
Runyan Yang
Huibao Yang
Xiqing Zhang
Tiantian Ye
Ying Liu
Yingying Gao
Shilei Zhang
Chao Deng
Junlan Feng
36
0
0
12 Jun 2024
Speech Emotion Recognition with ASR Transcripts: A Comprehensive Study on Word Error Rate and Fusion Techniques
Yuanchao Li
Peter Bell
Catherine Lai
46
9
0
12 Jun 2024
Reading Miscue Detection in Primary School through Automatic Speech Recognition
Lingyun Gao
Cristian Tejedor-García
H. Strik
C. Cucchiarini
37
0
0
11 Jun 2024
AudioMarkBench: Benchmarking Robustness of Audio Watermarking
Hongbin Liu
Moyang Guo
Zhengyuan Jiang
Lun Wang
Neil Zhenqiang Gong
39
6
0
11 Jun 2024
LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR
Zheshu Song
Jianheng Zhuo
Yifan Yang
Ziyang Ma
Shixiong Zhang
Xie Chen
36
9
0
07 Jun 2024
Whistle: Data-Efficient Multilingual and Crosslingual Speech Recognition via Weakly Phonetic Supervision
Saierdaer Yusuyin
Te Ma
Hao Huang
Wenbo Zhao
Zhijian Ou
52
2
0
04 Jun 2024
1st Place Solution to Odyssey Emotion Recognition Challenge Task1: Tackling Class Imbalance Problem
Mingjie Chen
Hezhao Zhang
Yuanchao Li
Jiachen Luo
Wen Wu
...
Lin Wang
P. Woodland
Xie Chen
Huy P Phan
Thomas Hain
30
0
0
30 May 2024
Crossmodal ASR Error Correction with Discrete Speech Units
Yuanchao Li
Pinzhen Chen
Peter Bell
Catherine Lai
36
7
0
26 May 2024
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
Yunxin Li
Shenyuan Jiang
Baotian Hu
Longyue Wang
Wanqi Zhong
Wenhan Luo
Lin Ma
Min-Ling Zhang
MoE
46
30
0
18 May 2024
Listen Again and Choose the Right Answer: A New Paradigm for Automatic Speech Recognition with Large Language Models
Yuchen Hu
Chen Chen
Chengwei Qin
Qiushi Zhu
Eng Siong Chng
Ruizhe Li
AuLLM
KELM
54
5
0
16 May 2024
Sonos Voice Control Bias Assessment Dataset: A Methodology for Demographic Bias Assessment in Voice Assistants
Chloe Sekkat
Fanny Leroy
Salima Mdhaffar
Blake Perry Smith
Yannick Esteve
Joseph Dureau
A. Coucke
32
1
0
14 May 2024
SpeechVerse: A Large-scale Generalizable Audio Language Model
Nilaksh Das
Saket Dingliwal
S. Ronanki
Rohit Paturi
David Huang
...
Monica Sunkara
S. Srinivasan
Kyu J. Han
Katrin Kirchhoff
41
38
0
14 May 2024
Muting Whisper: A Universal Acoustic Adversarial Attack on Speech Foundation Models
Vyas Raina
Rao Ma
Charles G McGhee
Kate Knill
Mark Gales
AAML
33
5
0
09 May 2024
Combining X-Vectors and Bayesian Batch Active Learning: Two-Stage Active Learning Pipeline for Speech Recognition
O. Kundacina
V. Vincan
D. Mišković
BDL
104
0
0
03 May 2024
TextAge: A Curated and Diverse Text Dataset for Age Classification
Shravan Cheekati
Mridul Gupta
Vibha Raghu
P. Raj
22
0
0
02 May 2024
MAD Speech: Measures of Acoustic Diversity of Speech
Matthieu Futeral
A. Agostinelli
Marco Tagliasacchi
Neil Zeghidour
Eugene Kharitonov
56
1
0
16 Apr 2024
A Large-Scale Evaluation of Speech Foundation Models
Shu-Wen Yang
Heng-Jui Chang
Zili Huang
Andy T. Liu
Cheng-I Jeff Lai
...
Kushal Lakhotia
Shang-Wen Li
Abdelrahman Mohamed
Shinji Watanabe
Hung-yi Lee
38
20
0
15 Apr 2024
VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing
Philip Anastassiou
Zhenyu Tang
Kainan Peng
Dongya Jia
Jiaxin Li
Ming Tu
Yuping Wang
Yuxuan Wang
Mingbo Ma
42
4
0
10 Apr 2024
VietMed: A Dataset and Benchmark for Automatic Speech Recognition of Vietnamese in the Medical Domain
Khai Le-Duc
LM&MA
44
9
0
08 Apr 2024
The VoicePrivacy 2024 Challenge Evaluation Plan
N. Tomashenko
Xiaoxiao Miao
Pierre Champion
Sarina Meyer
Xin Wang
Emmanuel Vincent
Michele Panariello
Nicholas W. D. Evans
Junichi Yamagishi
Massimiliano Todisco
41
23
0
03 Apr 2024
PhoWhisper: Automatic Speech Recognition for Vietnamese
Thanh-Thien Le
L. T. Nguyen
Dat Quoc Nguyen
37
3
0
27 Mar 2024
Language and Speech Technology for Central Kurdish Varieties
Sina Ahmadi
Daban Q. Jaff
Md Mahfuz Ibn Alam
Antonios Anastasopoulos
39
2
0
04 Mar 2024
High-Fidelity Neural Phonetic Posteriorgrams
Cameron Churchwell
Max Morrison
Bryan Pardo
40
5
0
27 Feb 2024
Direct Punjabi to English speech translation using discrete units
Prabhjot Kaur
L. A. M. Bush
Weisong Shi
34
0
0
25 Feb 2024
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
Yifan Peng
Yui Sudo
Muhammad Shakeel
Shinji Watanabe
VLM
46
17
0
20 Feb 2024
A Comprehensive Review of Machine Learning Advances on Data Change: A Cross-Field Perspective
Jeng-Lin Li
Chih-Fan Hsu
Ming-Ching Chang
Wei-Chao Chen
OOD
51
2
0
20 Feb 2024
Materiality and Risk in the Age of Pervasive AI Sensors
Matthew P. Stewart
Emanuel Moss
Pete Warden
Brian Plancher
Susan Kennedy
Mona Sloane
Vijay Janapa Reddi
19
2
0
17 Feb 2024
A Comprehensive Study of the Current State-of-the-Art in Nepali Automatic Speech Recognition Systems
Rupak Raj Ghimire
B. Bal
Prakash Poudyal
19
0
0
05 Feb 2024
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
Yifan Peng
Jinchuan Tian
William Chen
Siddhant Arora
Brian Yan
...
Kwanghee Choi
Jiatong Shi
Xuankai Chang
Jee-weon Jung
Shinji Watanabe
VLM
OSLM
34
40
0
30 Jan 2024
Acoustic characterization of speech rhythm: going beyond metrics with recurrent neural networks
François Deloche
Laurent Bonnasse-Gahot
Judit Gervain
26
0
0
22 Jan 2024