ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2101.00390
  4. Cited By
VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation
  Learning, Semi-Supervised Learning and Interpretation
v1v2 (latest)

VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation

2 January 2021
Changhan Wang
M. Rivière
Ann Lee
Anne Wu
Chaitanya Talnikar
Daniel Haziza
Mary Williamson
J. Pino
Emmanuel Dupoux
    SSL
ArXiv (abs)PDFHTMLGithub (536★)

Papers citing "VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation"

50 / 311 papers shown
Title
Instituto de Telecomunicações at IWSLT 2025: Aligning Small-Scale Speech and Language Models for Speech-to-Text Learning
Instituto de Telecomunicações at IWSLT 2025: Aligning Small-Scale Speech and Language Models for Speech-to-Text Learning
Giuseppe Attanasio
Sonal Sannigrahi
Ben Peters
André F. T. Martins
AuLLM
23
0
0
20 Jun 2025
Improved Intelligibility of Dysarthric Speech using Conditional Flow Matching
Improved Intelligibility of Dysarthric Speech using Conditional Flow Matching
Shoutrik Das
Nishant Singh
Arjun Gangwar
S. Umesh
12
0
0
19 Jun 2025
Watermarking Autoregressive Image Generation
Watermarking Autoregressive Image Generation
Nikola Jovanović
Ismail Labiad
Tomáš Souček
Martin Vechev
Pierre Fernandez
WIGM
31
0
0
19 Jun 2025
Factorized RVQ-GAN For Disentangled Speech Tokenization
Factorized RVQ-GAN For Disentangled Speech Tokenization
Sameer Khurana
Dominik Klement
Antoine Laurent
Dominik Bobos
Juraj Novosad
...
Ryo Aihara
Chiori Hori
François Germain
Gordon Wichern
Jonathan Le Roux
22
0
0
18 Jun 2025
CMU's IWSLT 2025 Simultaneous Speech Translation System
CMU's IWSLT 2025 Simultaneous Speech Translation System
Siqi Ouyang
Xi Xu
Lei Li
VLM
20
0
0
16 Jun 2025
Dissecting the Segmentation Model of End-to-End Diarization with Vector Clustering
Dissecting the Segmentation Model of End-to-End Diarization with Vector Clustering
Alexis Plaquet
Naohiro Tawara
Marc Delcroix
Shota Horiguchi
Atsushi Ando
S. Araki
H. Bredin
44
0
0
13 Jun 2025
What do self-supervised speech models know about Dutch? Analyzing advantages of language-specific pre-training
What do self-supervised speech models know about Dutch? Analyzing advantages of language-specific pre-training
Marianne de Heer Kloots
Hosein Mohebbi
Charlotte Pouw
Gaofei Shen
Willem H. Zuidema
Martijn Bentum
SSL
52
0
0
01 Jun 2025
XMAD-Bench: Cross-Domain Multilingual Audio Deepfake Benchmark
XMAD-Bench: Cross-Domain Multilingual Audio Deepfake Benchmark
Ioan-Paul Ciobanu
Andrei Iulian Hiji
Nicolae-Cătălin Ristea
Paul Irofti
Cristian Rusu
Radu Tudor Ionescu
23
0
0
31 May 2025
Speech-to-Text Translation with Phoneme-Augmented CoT: Enhancing Cross-Lingual Transfer in Low-Resource Scenarios
Speech-to-Text Translation with Phoneme-Augmented CoT: Enhancing Cross-Lingual Transfer in Low-Resource Scenarios
Gerard I. Gállego
Oriol Pareras
Martí Cortada Garcia
Lucas Takanori
Javier Hernando
LRM
22
0
0
30 May 2025
The Warmup Dilemma: How Learning Rate Strategies Impact Speech-to-Text Model Convergence
The Warmup Dilemma: How Learning Rate Strategies Impact Speech-to-Text Model Convergence
Marco Gaido
Sara Papi
L. Bentivogli
Alessio Brutti
Mauro Cettolo
R. Gretter
M. Matassoni
Mohamed Nabih
Matteo Negri
42
0
0
29 May 2025
FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian
FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian
Sara Papi
Marco Gaido
L. Bentivogli
Alessio Brutti
Mauro Cettolo
R. Gretter
M. Matassoni
Mohamed Nabih
Matteo Negri
33
0
0
28 May 2025
Dub-S2ST: Textless Speech-to-Speech Translation for Seamless Dubbing
Dub-S2ST: Textless Speech-to-Speech Translation for Seamless Dubbing
Jeongsoo Choi
Jaehun Kim
Joon Son Chung
27
0
0
27 May 2025
Loquacious Set: 25,000 Hours of Transcribed and Diverse English Speech Recognition Data for Research and Commercial Use
Loquacious Set: 25,000 Hours of Transcribed and Diverse English Speech Recognition Data for Research and Commercial Use
Titouan Parcollet
Yuan Tseng
Shucong Zhang
Rogier van Dalen
33
1
0
27 May 2025
Exploring Generative Error Correction for Dysarthric Speech Recognition
Exploring Generative Error Correction for Dysarthric Speech Recognition
Moreno La Quatra
Alkis Koudounas
Valerio Mario Salerno
Sabato Marco Siniscalchi
44
0
0
26 May 2025
TEDI: Trustworthy and Ethical Dataset Indicators to Analyze and Compare Dataset Documentation
TEDI: Trustworthy and Ethical Dataset Indicators to Analyze and Compare Dataset Documentation
Wiebke Hutiri
Mircea Cimpoi
M. Scheuerman
Victoria Matthews
Alice Xiang
167
0
0
23 May 2025
EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion
EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion
Advait Joglekar
Divyanshu Singh
Rooshil Rohit Bhatia
S. Umesh
100
0
0
22 May 2025
From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition
From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition
Tianduo Wang
Lu Xu
Wei Lu
Shanbo Cheng
45
0
0
22 May 2025
Selective Invocation for Multilingual ASR: A Cost-effective Approach Adapting to Speech Recognition Difficulty
Selective Invocation for Multilingual ASR: A Cost-effective Approach Adapting to Speech Recognition Difficulty
Hongfei Xue
Yufeng Tang
Jun Zhang
Xuelong Geng
Lei Xie
57
0
0
22 May 2025
Leveraging Unit Language Guidance to Advance Speech Modeling in Textless Speech-to-Speech Translation
Leveraging Unit Language Guidance to Advance Speech Modeling in Textless Speech-to-Speech Translation
Yuhao Zhang
Xiangnan Ma
Kaiqi Kou
Peizhuo Liu
Weiqiao Shan
Benyou Wang
Tong Xiao
Yuxin Huang
Zhengtao Yu
Jingbo Zhu
VLM
23
0
0
21 May 2025
Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits
Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits
Tiantian Feng
Jihwan Lee
Anfeng Xu
Yoonjeong Lee
Thanathai Lertpetchpun
...
Thomas Thebaud
Laureano Moro-Velazquez
D. Byrd
Najim Dehak
Shrikanth Narayanan
91
1
0
20 May 2025
Granary: Speech Recognition and Translation Dataset in 25 European Languages
Granary: Speech Recognition and Translation Dataset in 25 European Languages
Nithin Rao Koluguri
Monica Sekoyan
George Zelenfroynd
Sasha Meister
Shuoyang Ding
...
Yifan Peng
Sara Papi
Marco Gaido
Alessio Brutti
Boris Ginsburg
58
0
0
19 May 2025
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
Zuwei Long
Yunhang Shen
Chaoyou Fu
Heting Gao
Lijiang Li
...
Jinlong Peng
Haoyu Cao
Ke Li
Rongrong Ji
Xing Sun
78
2
0
06 May 2025
BERSting at the Screams: A Benchmark for Distanced, Emotional and Shouted Speech Recognition
BERSting at the Screams: A Benchmark for Distanced, Emotional and Shouted Speech Recognition
Paige Tuttosi
Mantaj Dhillon
Luna Sang
Shane Eastwood
Poorvi Bhatia
Quang Minh Dinh
Avni Kapoor
Yewon Jin
Angelica Lim
75
2
0
30 Apr 2025
Kimi-Audio Technical Report
Kimi-Audio Technical Report
KimiTeam
Ding Ding
Zeqian Ju
Yichong Leng
Shixuan Liu
...
Zhiyong Yang
Aoxiong Yin
Ruibin Yuan
Yanzhe Zhang
Zaida Zhou
AuLLMVLM
183
13
0
25 Apr 2025
SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning
SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning
Prabhat Pandey
Rupak Vignesh Swaminathan
K V Vijay Girish
Arunasish Sen
Jian Xie
Grant P. Strimel
Andreas Schwarz
455
2
0
12 Apr 2025
Scaling Analysis of Interleaved Speech-Text Language Models
Scaling Analysis of Interleaved Speech-Text Language Models
Gallil Maimon
Michael Hassid
Amit Roth
Yossi Adi
AuLLM
121
1
0
03 Apr 2025
Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages
Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages
Yangyang Meng
Jinpeng Li
Guodong Lin
Yu Pu
G. Wang
Hu Du
Zhiming Shao
Yukai Huang
Ke Li
Wei-Qiang Zhang
ObjD
148
0
0
26 Mar 2025
From TOWER to SPIRE: Adding the Speech Modality to a Text-Only LLM
Kshitij Ambilduke
Ben Peters
Sonal Sannigrahi
Anil Keshwani
Tsz Kin Lam
Bruno Martins
Marcely Zanon Boito
André F. T. Martins
110
2
0
13 Mar 2025
An Exhaustive Evaluation of TTS- and VC-based Data Augmentation for ASR
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
106
0
0
11 Mar 2025
Text-Speech Language Models with Improved Cross-Modal Transfer by Aligning Abstraction Levels
Santiago Cuervo
Adel Moumen
Yanis Labrak
Sameer Khurana
Antoine Laurent
Mickael Rouvier
R. Marxer
136
1
0
08 Mar 2025
Self-Supervised Models for Phoneme Recognition: Applications in Children's Speech for Reading Learning
Lucas Block Medin
Thomas Pellegrini
Lucile Gelin
SSL
83
2
0
06 Mar 2025
Exploiting Vulnerabilities in Speech Translation Systems through Targeted Adversarial Attacks
Chang-rui Liu
Haolin Wu
Xi Yang
Kui Zhang
Cong Wu
Weinan Zhang
Nenghai Yu
Tianwei Zhang
Qing Guo
Jie Zhang
AAML
62
0
0
02 Mar 2025
Low-Rank and Sparse Model Merging for Multi-Lingual Speech Recognition and Translation
Low-Rank and Sparse Model Merging for Multi-Lingual Speech Recognition and Translation
Qiuming Zhao
Guangzhi Sun
Chao Zhang
Mingxing Xu
Thomas Fang Zheng
MoMeVLM
455
1
0
24 Feb 2025
Speech to Speech Translation with Translatotron: A State of the Art Review
Speech to Speech Translation with Translatotron: A State of the Art Review
Jules R. Kala
Emmanuel Adetiba
Abdultaofeek Abayom
Oluwatobi E. Dare
Ayodele H. Ifijeh
289
0
0
21 Feb 2025
Slamming: Training a Speech Language Model on One GPU in a Day
Slamming: Training a Speech Language Model on One GPU in a Day
Gallil Maimon
Avishai Elmakies
Yossi Adi
95
3
0
19 Feb 2025
On the Robust Approximation of ASR Metrics
Abdul Waheed
Hanin Atwany
Rita Singh
Bhiksha Raj
25
0
0
18 Feb 2025
Evaluation of Deep Audio Representations for Hearables
Evaluation of Deep Audio Representations for Hearables
Fabian Gröger
Pascal Baumann
Ludovic Amruthalingam
Laurent Simon
Ruksana Giurda
Simone Lionetti
123
0
0
10 Feb 2025
XAttnMark: Learning Robust Audio Watermarking with Cross-Attention
XAttnMark: Learning Robust Audio Watermarking with Cross-Attention
Yang Liu
Lie Lu
Jihui Jin
Lichao Sun
Andrea Fanelli
176
1
0
06 Feb 2025
Afrispeech-Dialog: A Benchmark Dataset for Spontaneous English Conversations in Healthcare and Beyond
Afrispeech-Dialog: A Benchmark Dataset for Spontaneous English Conversations in Healthcare and Beyond
Mardhiyah Sanni
Tassallah Abdullahi
Devendra D. Kayande
Emmanuel Ayodele
Naome A. Etori
...
Chibuzor Okocha
L. Ismaila
Folafunmi Omofoye
Boluwatife A. Adewale
Tobi Olatunji
166
1
0
06 Feb 2025
High-Fidelity Simultaneous Speech-To-Speech Translation
High-Fidelity Simultaneous Speech-To-Speech Translation
Tom Labiausse
Laurent Mazaré
Edouard Grave
P. Pérez
Alexandre Défossez
Neil Zeghidour
488
1
0
05 Feb 2025
When End-to-End is Overkill: Rethinking Cascaded Speech-to-Text Translation
When End-to-End is Overkill: Rethinking Cascaded Speech-to-Text Translation
Anna Min
Chenxu Hu
Yi Ren
Hang Zhao
107
1
0
01 Feb 2025
A Unit-based System and Dataset for Expressive Direct Speech-to-Speech Translation
A Unit-based System and Dataset for Expressive Direct Speech-to-Speech Translation
Anna Min
Chenxu Hu
Yi Ren
Hang Zhao
96
0
0
01 Feb 2025
Language Bias in Self-Supervised Learning For Automatic Speech Recognition
Language Bias in Self-Supervised Learning For Automatic Speech Recognition
Edward Storey
Naomi Harte
Peter Bell
96
0
0
31 Jan 2025
A Survey on Spoken Italian Datasets and Corpora
A Survey on Spoken Italian Datasets and Corpora
Marco Giordano
Claudia Rinaldi
105
0
0
11 Jan 2025
Benchmarking Rotary Position Embeddings for Automatic Speech Recognition
Benchmarking Rotary Position Embeddings for Automatic Speech Recognition
Shucong Zhang
Titouan Parcollet
Rogier van Dalen
Sourav Bhattacharya
94
0
0
10 Jan 2025
DiCoW: Diarization-Conditioned Whisper for Target Speaker Automatic Speech Recognition
Alexander Polok
Dominik Klement
M. Kocour
Jiangyu Han
Federico Landini
Bolaji Yusuf
Sanjeev Khudanpur
Kevin Duh
J. Černocký
L. Burget
67
0
0
03 Jan 2025
CA-SSLR: Condition-Aware Self-Supervised Learning Representation for
  Generalized Speech Processing
CA-SSLR: Condition-Aware Self-Supervised Learning Representation for Generalized Speech Processing
Yen-Ju Lu
Jing Liu
Thomas Thebaud
Laureano Moro-Velazquez
Ariya Rastrow
Najim Dehak
Jesus Villalba
135
1
0
05 Dec 2024
MSA-ASR: Efficient Multilingual Speaker Attribution with frozen ASR Models
MSA-ASR: Efficient Multilingual Speaker Attribution with frozen ASR Models
Thai-Binh Nguyen
Alexander Waibel
140
3
0
27 Nov 2024
SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation
SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation
Dennis Fucci
Marco Gaido
Beatrice Savoldi
Matteo Negri
Mauro Cettolo
L. Bentivogli
270
3
0
03 Nov 2024
An Empirical Analysis of Speech Self-Supervised Learning at Multiple
  Resolutions
An Empirical Analysis of Speech Self-Supervised Learning at Multiple Resolutions
Theo Clark
Benedetta Cevoli
Eloy de Jong
Timofey Abramski
Jamie Dougherty
SSL
71
0
0
31 Oct 2024
1234567
Next