VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation

2 January 2021
Changhan Wang
M. Rivière
Ann Lee
Anne Wu
Chaitanya Talnikar
Daniel Haziza
Mary Williamson
J. Pino
Emmanuel Dupoux
    SSL
arXiv:2101.00390 (abs | PDF | HTML) | GitHub (536★)

Papers citing "VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation"

50 / 311 papers shown
Title
FedZKP: Federated Model Ownership Verification with Zero-knowledge Proof
Wenyuan Yang
Yuguo Yin
Gongxi Zhu
Hanlin Gu
Lixin Fan
Xiaochun Cao
Qiang Yang
FedML
78
9
0
08 May 2023
Mask The Bias: Improving Domain-Adaptive Generalization of CTC-based ASR with Internal Language Model Estimation
Nilaksh Das
Monica Sunkara
S. Bodapati
Jason (Jinglun) Cai
Devang Kulshreshtha
Jeffrey J. Farris
Katrin Kirchhoff
57
3
0
05 May 2023
Considerations for Ethical Speech Recognition Datasets
Avijoy Chakma
Zahid Hasan
35
4
0
03 May 2023
NAIST-SIC-Aligned: an Aligned English-Japanese Simultaneous Interpretation Corpus
Jinming Zhao
Yuka Ko
Kosuke Doi
Ryo Fukuda
Katsuhito Sudoh
Satoshi Nakamura
152
2
0
23 Apr 2023
Dynamic Chunk Convolution for Unified Streaming and Non-Streaming Conformer ASR
Xilai Li
Goeric Huybrechts
S. Ronanki
Jeffrey J. Farris
S. Bodapati
68
7
0
18 Apr 2023
Political corpus creation through automatic speech recognition on EU debates
Hugo De Vos
Suzan Verberne
57
2
0
17 Apr 2023
Efficient Sequence Transduction by Jointly Predicting Tokens and Durations
Hainan Xu
Fei Jia
Somshubra Majumdar
Hengguan Huang
Shinji Watanabe
Boris Ginsburg
70
26
0
13 Apr 2023
MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition
Xize Cheng
Lin Li
Tao Jin
Rongjie Huang
Wang Lin
Zehan Wang
Huangdai Liu
Yejin Wang
Aoxiong Yin
Zhou Zhao
84
25
0
09 Mar 2023
Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling
Zi-Hua Zhang
Long Zhou
Chengyi Wang
Sanyuan Chen
Yu Wu
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
VLM
98
187
0
07 Mar 2023
TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings
Christoph Boeddeker
Aswin Shanmugam Subramanian
Gordon Wichern
Reinhold Haeb-Umbach
Jonathan Le Roux
93
24
0
07 Mar 2023
Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations
Yuma Koizumi
Heiga Zen
Shigeki Karita
Yifan Ding
Kohei Yatabe
Nobuyuki Morioka
Yu Zhang
Wei Han
Ankur Bapna
M. Bacchiani
94
29
0
03 Mar 2023
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
Yu Zhang
Wei Han
James Qin
Yongqiang Wang
Ankur Bapna
...
Pedro J. Moreno
Chung-Cheng Chiu
J. Schalkwyk
Françoise Beaufays
Yonghui Wu
VLM
179
270
0
02 Mar 2023
Synthetic Cross-accent Data Augmentation for Automatic Speech Recognition
P. Klumpp
Pooja Chitkara
Leda Sari
Prashant Serai
Jilong Wu
Irina-Elena Veliche
Rongqing Huang
Qing He
58
4
0
01 Mar 2023
WhisperX: Time-Accurate Speech Transcription of Long-Form Audio
Max Bain
Jaesung Huh
Tengda Han
Andrew Zisserman
143
242
0
01 Mar 2023
ParrotTTS: Text-to-Speech synthesis by exploiting self-supervised representations
N. Shah
Saiteja Kosgi
Vishal Tambrahalli
Neha Sahipjohn
Anil Nelakanti
Vineet Gandhi
74
8
0
01 Mar 2023
Improving Massively Multilingual ASR With Auxiliary CTC Objectives
William Chen
Brian Yan
Jiatong Shi
Yifan Peng
Soumi Maiti
Shinji Watanabe
91
40
0
24 Feb 2023
Factual Consistency Oriented Speech Recognition
Naoyuki Kanda
Takuya Yoshioka
Yang Liu
68
0
0
24 Feb 2023
ASR Bundestag: A Large-Scale political debate dataset in German
Johannes Wirth
René Peinl
62
1
0
12 Feb 2023
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining
Takaaki Saeki
Soumi Maiti
Xinjian Li
Shinji Watanabe
Shinnosuke Takamichi
Hiroshi Saruwatari
107
18
0
30 Jan 2023
A Holistic Cascade System, benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation
Wen-Chin Huang
Benjamin Peloquin
Justine T. Kao
Changhan Wang
Hongyu Gong
Elizabeth Salesky
Yossi Adi
Ann Lee
Peng-Jen Chen
81
16
0
25 Jan 2023
Scaling Laws for Generative Mixed-Modal Language Models
Armen Aghajanyan
L. Yu
Alexis Conneau
Wei-Ning Hsu
Karen Hambardzumyan
Susan Zhang
Stephen Roller
Naman Goyal
Omer Levy
Luke Zettlemoyer
MoE
VLM
100
110
0
10 Jan 2023
Multi-modal deep learning system for depression and anxiety detection
Brian Diep
Marija Stanojevic
Jekaterina Novikova
61
7
0
30 Dec 2022
Pushing the performances of ASR models on English and Spanish accents
Pooja Chitkara
M. Rivière
Jade Copet
Frank Zhang
Yatharth Saraf
46
0
0
22 Dec 2022
ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement
Wei-Ning Hsu
Tal Remez
Bowen Shi
Jacob Donley
Yossi Adi
DiffM
93
12
0
21 Dec 2022
SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks
Suwon Shon
Siddhant Arora
Chyi-Jiunn Lin
Ankita Pasad
Felix Wu
Roshan S. Sharma
Wei Wu
Hung-yi Lee
Karen Livescu
Shinji Watanabe
ELM
80
33
0
20 Dec 2022
Mu$^{2}$SLAM: Multitask, Multilingual Speech and Language Models
Yong Cheng
Yu Zhang
Melvin Johnson
Wolfgang Macherey
Ankur Bapna
66
8
0
19 Dec 2022
A Review of Speech-centric Trustworthy Machine Learning: Privacy, Safety, and Fairness
Tiantian Feng
Rajat Hebbar
Nicholas Mehlman
Xuan Shi
Aditya Kommineni
Shrikanth Narayanan
108
34
0
18 Dec 2022
Context-aware Fine-tuning of Self-supervised Speech Models
Suwon Shon
Felix Wu
Kwangyoun Kim
Prashant Sridhar
Karen Livescu
Shinji Watanabe
74
9
0
16 Dec 2022
BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric
Mingda Chen
Paul-Ambroise Duquenne
Pierre Yves Andrews
Justine T. Kao
Alexandre Mourachko
Holger Schwenk
Marta R. Costa-jussá
65
18
0
16 Dec 2022
UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units
Hirofumi Inaguma
Sravya Popuri
Ilia Kulikov
Peng-Jen Chen
Changhan Wang
Yu-An Chung
Yun Tang
Ann Lee
Shinji Watanabe
J. Pino
119
61
0
15 Dec 2022
Robust Speech Recognition via Large-Scale Weak Supervision
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
OffRL
230
3,770
0
06 Dec 2022
EURO: ESPnet Unsupervised ASR Open-source Toolkit
Dongji Gao
Jiatong Shi
Shun-Po Chuang
Leibny Paola García-Perera
Hung-yi Lee
Shinji Watanabe
Sanjeev Khudanpur
113
8
0
30 Nov 2022
Learning General Audio Representations with Large-Scale Training of Patchout Audio Transformers
Khaled Koutini
Shahed Masoudian
Florian Schmid
Hamid Eghbalzadeh
Jan Schluter
Gerhard Widmer
131
6
0
25 Nov 2022
Dialogs Re-enacted Across Languages
Nigel G. Ward
Jonathan Avila
Emilia Rivas
Divette Marco
64
2
0
18 Nov 2022
Multi-Speaker and Wide-Band Simulated Conversations as Training Data for End-to-End Neural Diarization
Federico Landini
Mireia Díez
Alicia Lozano-Diez
L. Burget
65
15
0
12 Nov 2022
Speech-to-Speech Translation For A Real-world Unwritten Language
Peng-Jen Chen
Ke M. Tran
Yilin Yang
Jingfei Du
Justine T. Kao
...
Sravya Popuri
Changhan Wang
J. Pino
Wei-Ning Hsu
Ann Lee
91
26
0
11 Nov 2022
A Study on the Integration of Pre-trained SSL, ASR, LM and SLU Models for Spoken Language Understanding
Yifan Peng
Siddhant Arora
Yosuke Higuchi
Yushi Ueda
Sujay S. Kumar
Karthik Ganesan
Siddharth Dalmia
Xuankai Chang
Shinji Watanabe
75
21
0
10 Nov 2022
Accidental Learners: Spoken Language Identification in Multilingual Self-Supervised Models
Travis M. Bartley
Fei Jia
Krishna C. Puvvada
Samuel Kriman
Boris Ginsburg
SSL
59
6
0
09 Nov 2022
SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations
Paul-Ambroise Duquenne
Hongyu Gong
Ning Dong
Jingfei Du
Ann Lee
Vedanuj Goswami
Changhan Wang
J. Pino
Benoît Sagot
Holger Schwenk
102
38
0
08 Nov 2022
Multi-blank Transducers for Speech Recognition
Hainan Xu
Fei Jia
Somshubra Majumdar
Shinji Watanabe
Boris Ginsburg
87
11
0
04 Nov 2022
Adapting self-supervised models to multi-talker speech recognition using speaker embeddings
Zili Huang
Desh Raj
Leibny Paola García-Perera
Sanjeev Khudanpur
155
29
0
01 Nov 2022
Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation
Kun Wei
Long Zhou
Zi-Hua Zhang
Liping Chen
Shujie Liu
Lei He
Jinyu Li
Furu Wei
66
13
0
31 Oct 2022
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech
Takaaki Saeki
Heiga Zen
Zhehuai Chen
Nobuyuki Morioka
Gary Wang
Yu Zhang
Ankur Bapna
Andrew Rosenberg
Bhuvana Ramabhadran
128
20
0
27 Oct 2022
Improving Speech-to-Speech Translation Through Unlabeled Text
Xuan-Phi Nguyen
Sravya Popuri
Changhan Wang
Yun Tang
Ilia Kulikov
Hongyu Gong
66
9
0
26 Oct 2022
ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition
Sanchit Gandhi
Patrick von Platen
Alexander M. Rush
70
25
0
24 Oct 2022
Optimizing Bilingual Neural Transducer with Synthetic Code-switching Text Generation
Thien Nguyen
Nathalie Tran
Liuhui Deng
Thiago Fraga da Silva
Matthew Radzihovsky
...
Honza Silovsky
Arnab Ghoshal
M. Martel
Bharat Ram Ambati
Mohamed Ali
94
5
0
21 Oct 2022
End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation
Yoshiki Masuyama
Xuankai Chang
Samuele Cornell
Shinji Watanabe
Nobutaka Ono
79
19
0
19 Oct 2022
Simple and Effective Unsupervised Speech Translation
Changhan Wang
Hirofumi Inaguma
Peng-Jen Chen
Ilia Kulikov
Yun Tang
Wei-Ning Hsu
Michael Auli
J. Pino
SSL
97
14
0
18 Oct 2022
Maestro-U: Leveraging joint speech-text representation learning for zero supervised speech ASR
Zhehuai Chen
Ankur Bapna
Andrew Rosenberg
Yu Zhang
Bhuvana Ramabhadran
Pedro J. Moreno
Nanxin Chen
105
17
0
18 Oct 2022
Towards Personalization of CTC Speech Recognition Models with Contextual Adapters and Adaptive Boosting
Saket Dingliwal
Monica Sunkara
S. Bodapati
S. Ronanki
Jeffrey J. Farris
Katrin Kirchhoff
69
0
0
18 Oct 2022