ResearchTrend.AI

Common Voice: A Massively-Multilingual Speech Corpus

13 December 2019
Rosana Ardila
Megan Branson
Kelly Davis
Michael Henretty
M. Kohler
Josh Meyer
Reuben Morais
Lindsay Saunders
Francis M. Tyers
Gregor Weber
    VLM
arXiv:1912.06670

Papers citing "Common Voice: A Massively-Multilingual Speech Corpus"

50 / 304 papers shown
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder
Bowen Zhang
Congchao Guo
Geng Yang
Hang Yu
Haozhe Zhang
...
Yichen Xiao
Yiying Zhou
Yujie Zhang
Yuan Lu
Yucen He
26
0
0
12 May 2025
Using Information Theory to Characterize Prosodic Typology: The Case of Tone, Pitch-Accent and Stress-Accent
E. Wilcox
Cui Ding
Giovanni Acampa
Tiago Pimentel
Alex Warstadt
Tamar I. Regev
36
0
0
12 May 2025
Teochew-Wild: The First In-the-wild Teochew Dataset with Orthographic Annotations
Linrong Pan
Chenglong Jiang
Gaoze Hou
Ying Gao
48
0
0
08 May 2025
Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment
Xueyao Zhang
Yufei Wang
Chaoren Wang
Zehan Li
Zhuo Chen
Zhizheng Wu
158
0
0
07 May 2025
Miipher-2: A Universal Speech Restoration Model for Million-Hour Scale Data Restoration
Shigeki Karita
Yuma Koizumi
Heiga Zen
Haruko Ishikawa
Robin Scheibler
M. Bacchiani
VLM
199
1
0
07 May 2025
BERSting at the Screams: A Benchmark for Distanced, Emotional and Shouted Speech Recognition
Paige Tuttosi
Mantaj Dhillon
Luna Sang
Shane Eastwood
Poorvi Bhatia
Quang Minh Dinh
Avni Kapoor
Yewon Jin
Angelica Lim
34
0
0
30 Apr 2025
Kimi-Audio Technical Report
KimiTeam
Ding Ding
Zeqian Ju
Yichong Leng
Shixuan Liu
...
Zhiyong Yang
Aoxiong Yin
Ruibin Yuan
Wenjie Qu
Zaida Zhou
AuLLM
VLM
110
5
0
25 Apr 2025
Dysarthria Normalization via Local Lie Group Transformations for Robust ASR
Mikhail Osipov
45
1
0
16 Apr 2025
Advancing Arabic Speech Recognition Through Large-Scale Weakly Supervised Learning
Mahmoud Salhab
Marwan Elghitany
Shameed Sait
Syed Sibghat Ullah
Mohammad Abusheikh
Hasan Abusheikh
49
0
0
16 Apr 2025
SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning
Prabhat Pandey
R. Swaminathan
K V Vijay Girish
Arunasish Sen
Jian Xie
Grant P. Strimel
Andreas Schwarz
191
0
0
12 Apr 2025
Spatial Audio Processing with Large Language Model on Wearable Devices
Ayushi Mishra
Yang Bai
Priyadarshan Narayanasamy
Nakul Garg
Nirupam Roy
30
0
0
11 Apr 2025
SpeechDialogueFactory: Generating High-Quality Speech Dialogue Data to Accelerate Your Speech-LLM Development
Minghan Wang
Ye Bai
Yanjie Wang
Thuy-Trang Vu
Ehsan Shareghi
Gholamreza Haffari
52
0
0
31 Mar 2025
An Exhaustive Evaluation of TTS- and VC-based Data Augmentation for ASR
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
64
0
0
11 Mar 2025
Training and Inference Efficiency of Encoder-Decoder Speech Models
Piotr Żelasko
Kunal Dhawan
Daniel Galvez
Krishna C. Puvvada
Ankita Pasad
Nithin Rao Koluguri
Ke Hu
Vitaly Lavrukhin
Jagadeesh Balam
Boris Ginsburg
48
0
0
07 Mar 2025
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
Abdelrahman Abouelenin
Atabak Ashfaq
Adam Atkinson
Hany Awadalla
Nguyen Bach
...
Ishmam Zabir
Yunan Zhang
Li Zhang
Wenjie Qu
Xiren Zhou
MoE
SyDa
78
32
0
03 Mar 2025
Investigating and Enhancing Vision-Audio Capability in Omnimodal Large Language Models
Rui Hu
Delai Qiu
Shuyu Wei
J.N. Zhang
Yining Wang
Shengping Liu
Jitao Sang
AuLLM
VLM
59
0
0
27 Feb 2025
Nexus-O: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision
Che Liu
Yingji Zhang
D. Zhang
Weijie Zhang
Chenggong Gong
...
André Freitas
Qifan Wang
Z. Xu
Rongjuncheng Zhang
Yong Dai
AuLLM
76
0
0
26 Feb 2025
Less is More for Synthetic Speech Detection in the Wild
Ashi Garg
Zexin Cai
Henry Li Xinyuan
Leibny Paola García-Perera
Kevin Duh
Sanjeev Khudanpur
Matthew Wiesner
Nicholas Andrews
74
0
0
17 Feb 2025
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation
Haorui He
Zengqiang Shang
Chaoren Wang
Xuyuan Li
Yicheng Gu
...
Peiyang Shi
Yansen Wang
Kai Chen
Pengyuan Zhang
Zhikai Wu
AuLLM
66
4
0
28 Jan 2025
HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding
Jiaxing Zhao
Q. Yang
Yixing Peng
Detao Bai
Shimin Yao
...
Xiang Chen
Shenghao Fu
Weixuan Chen
Xihan Wei
Liefeng Bo
VGen
AuLLM
52
5
0
28 Jan 2025
FlanEC: Exploring Flan-T5 for Post-ASR Error Correction
Moreno La Quatra
Valerio Mario Salerno
Yu Tsao
Sabato Marco Siniscalchi
99
0
0
22 Jan 2025
BLR-MoE: Boosted Language-Routing Mixture of Experts for Domain-Robust Multilingual E2E ASR
Guodong Ma
Wenxuan Wang
Lifeng Zhou
Yuting Yang
Yuke Li
Binbin Du
MoE
79
0
0
22 Jan 2025
Joint Automatic Speech Recognition And Structure Learning For Better Speech Understanding
Jiaxi Hu
Zuchao Li
Mengjia Shen
Haojun Ai
Sheng Li
Jun Zhang
41
0
0
20 Jan 2025
GEC-RAG: Improving Generative Error Correction via Retrieval-Augmented Generation for Automatic Speech Recognition Systems
Amin Robatian
Mohammad Hajipour
Mohammad Reza Peyghan
Fatemeh Rajabi
Sajjad Amini
Shahrokh Ghaemmaghami
Iman Gholampour
46
0
0
18 Jan 2025
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
Junyi Ao
Yuancheng Wang
Xiaohai Tian
Dekun Chen
Jingyang Zhang
Lu Lu
Yansen Wang
Haizhou Li
Zhikai Wu
AuLLM
90
19
0
17 Jan 2025
Benchmarking Rotary Position Embeddings for Automatic Speech Recognition
Shucong Zhang
Titouan Parcollet
Rogier van Dalen
Sourav Bhattacharya
51
0
0
10 Jan 2025
LUPET: Incorporating Hierarchical Information Path into Multilingual ASR
Wei Liu
Jingyong Hou
Dong Yang
Muyong Cao
Tan Lee
80
1
0
10 Jan 2025
AccentBox: Towards High-Fidelity Zero-Shot Accent Generation
Jinzuomu Zhong
Korin Richmond
Zhiba Su
Siqi Sun
63
6
0
10 Jan 2025
The Faetar Benchmark: Speech Recognition in a Very Under-Resourced Language
Michael Ong
Sean Robertson
Leo Peckham
Alba Jorquera Jimenez de Aberasturi
Paula Arkhangorodsky
Robin Huo
Aman Sakhardande
Mark Hallap
Naomi Nagy
Ewan Dunbar
CVBM
47
0
0
08 Jan 2025
Methods to Increase the Amount of Data for Speech Recognition for Low Resource Languages
Alexan Ayrapetyan
Sofia Kostandian
Ara Yeroyan
Mher Yerznkanyan
Nikolay Karpov
Nune Tadevosyan
Vitaly Lavrukhin
Boris Ginsburg
66
0
0
08 Jan 2025
Fotheidil: an Automatic Transcription System for the Irish Language
Liam Lonergan
Ibon Saratxaga
John Sloan
Oscar Maharog
Mengjie Qian
Neasa Ní Chiaráin
Christer Gobl
A. N. Chasaide
33
0
0
03 Jan 2025
Performance evaluation of SLAM-ASR: The Good, the Bad, the Ugly, and the Way Forward
Shashi Kumar
Iuliia Thorbecke
Sergio Burdisso
Esaú Villatoro-Tello
Marcelo Errecalde
Kadri Hacioğlu
Pradeep Rangappa
P. Motlíček
A. Ganapathiraju
Andreas Stolcke
55
1
0
06 Nov 2024
SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation
Dennis Fucci
Marco Gaido
Beatrice Savoldi
Matteo Negri
Mauro Cettolo
L. Bentivogli
57
1
0
03 Nov 2024
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages
Amir Hossein Kargaran
François Yvon
Hinrich Schütze
VLM
44
5
0
31 Oct 2024
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning
Yifan Peng
Krishna C. Puvvada
Zhehuai Chen
Piotr Żelasko
He Huang
Kunal Dhawan
Ke Hu
Shinji Watanabe
Jagadeesh Balam
Boris Ginsburg
64
2
0
23 Oct 2024
A Framework for Adapting Human-Robot Interaction to Diverse User Groups
Theresa Pekarek-Rosin
Vanessa Hassouna
Xiaowen Sun
Luca Krohm
Henri-Leon Kordt
Michael Beetz
Stefan Wermter
28
0
0
15 Oct 2024
Code Drift: Towards Idempotent Neural Audio Codecs
P. O'Reilly
Prem Seetharaman
Jiaqi Su
Zeyu Jin
Bryan Pardo
176
0
0
14 Oct 2024
The First VoicePrivacy Attacker Challenge Evaluation Plan
N. Tomashenko
Xiaoxiao Miao
Emmanuel Vincent
Junichi Yamagishi
131
2
0
09 Oct 2024
Learn from Real: Reality Defender's Submission to ASVspoof5 Challenge
Yi Zhu
C. Goel
Surya Koppisetti
Trang Tran
Ankur Kumar
Gaurav Bharaj
AAML
28
0
0
09 Oct 2024
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
Yushen Chen
Zhikang Niu
Ziyang Ma
Keqi Deng
Chunhui Wang
Jian Zhao
Kai Yu
Xie Chen
35
52
0
09 Oct 2024
Variable Bitrate Residual Vector Quantization for Audio Coding
Yunkee Chae
Woosung Choi
Yuhta Takida
Junghyun Koo
Yukara Ikemiya
...
K. Cheuk
Marco A. Martínez-Ramírez
Kyogu Lee
Wei-Hsiang Liao
Yuki Mitsufuji
91
0
0
08 Oct 2024
HAINAN: Fast and Accurate Transducer for Hybrid-Autoregressive ASR
Hainan Xu
Travis M. Bartley
Vladimir Bataev
Boris Ginsburg
205
0
0
03 Oct 2024
Recent Advances in Speech Language Models: A Survey
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
AuLLM
61
17
0
01 Oct 2024
MIO: A Foundation Model on Multimodal Tokens
Zekun Wang
King Zhu
Chunpu Xu
Wangchunshu Zhou
Jiaheng Liu
...
Yuanxing Zhang
Ge Zhang
Ke Xu
Jie Fu
Wenhao Huang
MLLM
AuLLM
60
11
0
26 Sep 2024
Weighted Cross-entropy for Low-Resource Languages in Multilingual Speech Recognition
Andrés Piñeiro-Martín
C. García-Mateo
Laura Docío-Fernández
María del Carmen López-Pérez
Georg Rehm
32
3
0
25 Sep 2024
LM-assisted keyword biasing with Aho-Corasick algorithm for Transducer-based ASR
Iuliia Thorbecke
Juan Zuluaga-Gomez
Esaú Villatoro-Tello
Andres Carofilis
Shashi Kumar
P. Motlíček
Karthik Pandia
A. Ganapathiraju
37
0
0
20 Sep 2024
AutoMode-ASR: Learning to Select ASR Systems for Better Quality and Cost
Ahmet Gündüz
Yunsu Kim
Kamer Ali Yuksel
Mohamed Al-Badrashiny
Thiago Castro Ferreira
Hassan Sawaf
38
0
0
19 Sep 2024
ASR Benchmarking: Need for a More Representative Conversational Dataset
Gaurav Maheshwari
Dmitry Ivanov
Théo Johannet
Kevin El Haddad
30
0
0
18 Sep 2024
WER We Stand: Benchmarking Urdu ASR Models
Samee Arif
Aamina Jamal Khan
Mustafa Abbas
Agha Ali Raza
Awais Athar
26
3
0
17 Sep 2024
ASR Error Correction using Large Language Models
Rao Ma
Mengjie Qian
Mark Gales
Kate Knill
KELM
46
1
0
14 Sep 2024