ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2106.04624
  4. Cited By
SpeechBrain: A General-Purpose Speech Toolkit

SpeechBrain: A General-Purpose Speech Toolkit

8 June 2021
Mirco Ravanelli
Titouan Parcollet
Peter William VanHarn Plantinga
Aku Rouhe
Samuele Cornell
Loren Lugosch
Cem Subakan
Nauman Dawalatabad
A. Heba
Jianyuan Zhong
Ju-Chieh Chou
Sung-Lin Yeh
Szu-Wei Fu
Chien-Feng Liao
E. Rastorgueva
Franccois Grondin
William Aris
Hwidong Na
Yan Gao
R. Mori
Yoshua Bengio
ArXivPDFHTML

Papers citing "SpeechBrain: A General-Purpose Speech Toolkit"

50 / 152 papers shown
Title
I2CR: Improving Noise Robustness on Keyword Spotting Using Inter-Intra
  Contrastive Regularization
I2CR: Improving Noise Robustness on Keyword Spotting Using Inter-Intra Contrastive Regularization
Dianwen Ng
J. Yip
Tanmay Surana
Zhao Yang
Chong Zhang
Yukun Ma
Chongjia Ni
Chng Eng Siong
B. Ma
35
6
0
14 Sep 2022
DeID-VC: Speaker De-identification via Zero-shot Pseudo Voice Conversion
DeID-VC: Speaker De-identification via Zero-shot Pseudo Voice Conversion
Ruibin Yuan
Yuxuan Wu
Jacob Li
Jaxter Kim
26
5
0
09 Sep 2022
Transfer Learning of wav2vec 2.0 for Automatic Lyric Transcription
Transfer Learning of wav2vec 2.0 for Automatic Lyric Transcription
Longshen Ou
Xiangming Gu
Ye Wang
30
21
0
20 Jul 2022
Two-Pass Low Latency End-to-End Spoken Language Understanding
Two-Pass Low Latency End-to-End Spoken Language Understanding
Siddhant Arora
Siddharth Dalmia
Xuankai Chang
Brian Yan
A. Black
Shinji Watanabe
VLM
30
19
0
14 Jul 2022
Branchformer: Parallel MLP-Attention Architectures to Capture Local and
  Global Context for Speech Recognition and Understanding
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding
Yifan Peng
Siddharth Dalmia
Ian Lane
Shinji Watanabe
30
143
0
06 Jul 2022
BERT, can HE predict contrastive focus? Predicting and controlling
  prominence in neural TTS using a language model
BERT, can HE predict contrastive focus? Predicting and controlling prominence in neural TTS using a language model
Brooke Stephenson
Laurent Besacier
Laurent Girin
Thomas Hueber
12
8
0
04 Jul 2022
Speaker Verification in Multi-Speaker Environments Using Temporal
  Feature Fusion
Speaker Verification in Multi-Speaker Environments Using Temporal Feature Fusion
Ahmad Aloradi
Wolfgang Mack
Mohamed Elminshawi
Emanuel Habets
32
5
0
28 Jun 2022
Pruned RNN-T for fast, memory-efficient ASR training
Pruned RNN-T for fast, memory-efficient ASR training
Fangjun Kuang
Liyong Guo
Wei Kang
Long Lin
Mingshuang Luo
Zengwei Yao
Daniel Povey
27
64
0
23 Jun 2022
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
Changan Chen
Carl Schissler
Sanchit Garg
Philip Kobernik
Alexander Clegg
P. Calamia
Dhruv Batra
Philip Robinson
Kristen Grauman
3DGS
36
79
0
16 Jun 2022
Investigation of Ensemble features of Self-Supervised Pretrained Models
  for Automatic Speech Recognition
Investigation of Ensemble features of Self-Supervised Pretrained Models for Automatic Speech Recognition
Anjana Arunkumar
Vrunda N. Sukhadia
S. Umesh
27
10
0
11 Jun 2022
Svadhyaya system for the Second Diagnosing COVID-19 using Acoustics
  Challenge 2021
Svadhyaya system for the Second Diagnosing COVID-19 using Acoustics Challenge 2021
Deepak Mittal
A. H. Poorjam
Debottam Dutta
Debarpan Bhattacharya
Zemin Yu
Sriram Ganapathy
M. Singh
18
0
0
11 Jun 2022
AS2T: Arbitrary Source-To-Target Adversarial Attack on Speaker
  Recognition Systems
AS2T: Arbitrary Source-To-Target Adversarial Attack on Speaker Recognition Systems
Guangke Chen
Zhe Zhao
Fu Song
Sen Chen
Lingling Fan
Yang Liu
AAML
32
18
0
07 Jun 2022
FlexLip: A Controllable Text-to-Lip System
FlexLip: A Controllable Text-to-Lip System
Dan Oneaţă
Beáta Lőrincz
Adriana Stan
H. Cucu
26
3
0
07 Jun 2022
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit
Hui Zhang
Tian Yuan
Junkun Chen
Xintong Li
Renjie Zheng
...
Zeyu Chen
Xiaoguang Hu
Dianhai Yu
Yanjun Ma
Liang Huang
AuLLM
31
24
0
20 May 2022
Learning Representations for New Sound Classes With Continual
  Self-Supervised Learning
Learning Representations for New Sound Classes With Continual Self-Supervised Learning
Zhepei Wang
Cem Subakan
Xilin Jiang
Junkai Wu
Efthymios Tzinis
Mirco Ravanelli
Paris Smaragdis
CLL
SSL
67
19
0
15 May 2022
Baselines and Protocols for Household Speaker Recognition
Baselines and Protocols for Household Speaker Recognition
A. Sholokhov
Xuechen Liu
Md. Sahidullah
Tomi Kinnunen
25
4
0
30 Apr 2022
Automated speech tools for helping communities process restricted-access
  corpora for language revival efforts
Automated speech tools for helping communities process restricted-access corpora for language revival efforts
Nay San
Martijn Bartelds
Tolúldopdé Ogúnrdemí
Alison Mount
R. Thompson
Mike Higgins
Roy Barker
Jane Simpson
Dan Jurafsky
30
6
0
15 Apr 2022
Receptive Field Analysis of Temporal Convolutional Networks for Monaural
  Speech Dereverberation
Receptive Field Analysis of Temporal Convolutional Networks for Monaural Speech Dereverberation
William Ravenscroft
Stefan Goetze
Thomas Hain
11
8
0
13 Apr 2022
ASR in German: A Detailed Error Analysis
ASR in German: A Detailed Error Analysis
John M. Wirth
René Peinl
20
5
0
12 Apr 2022
Auditory-Based Data Augmentation for End-to-End Automatic Speech
  Recognition
Auditory-Based Data Augmentation for End-to-End Automatic Speech Recognition
Zehai Tu
Jack Deadman
Ning Ma
Jon Barker
29
4
0
08 Apr 2022
Federated Self-supervised Speech Representations: Are We There Yet?
Federated Self-supervised Speech Representations: Are We There Yet?
Yan Gao
Javier Fernandez-Marques
Titouan Parcollet
Abhinav Mehrotra
Nicholas D. Lane
35
13
0
06 Apr 2022
Introducing ECAPA-TDNN and Wav2Vec2.0 Embeddings to Stuttering Detection
Introducing ECAPA-TDNN and Wav2Vec2.0 Embeddings to Stuttering Detection
S. A. Sheikh
Md. Sahidullah
F. Hirsch
Slim Ouni
19
17
0
04 Apr 2022
End-to-end model for named entity recognition from speech without paired
  training data
End-to-end model for named entity recognition from speech without paired training data
Salima Mdhaffar
J. Duret
Titouan Parcollet
Yannick Esteve
14
13
0
02 Apr 2022
Improving Mispronunciation Detection with Wav2vec2-based Momentum
  Pseudo-Labeling for Accentedness and Intelligibility Assessment
Improving Mispronunciation Detection with Wav2vec2-based Momentum Pseudo-Labeling for Accentedness and Intelligibility Assessment
Mu Yang
K. Hirschi
S. Looney
Okim Kang
John H. L. Hansen
40
15
0
29 Mar 2022
Integrating Lattice-Free MMI into End-to-End Speech Recognition
Integrating Lattice-Free MMI into End-to-End Speech Recognition
Jinchuan Tian
Jianwei Yu
Chao Weng
Yuexian Zou
Dong Yu
29
8
0
29 Mar 2022
WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit
WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit
Binbin Zhang
Di Wu
Zhendong Peng
Xingcheng Song
Zhuoyuan Yao
Hang Lv
Linfu Xie
Chao Yang
Fuping Pan
Jianwei Niu
VLM
26
93
0
29 Mar 2022
Visualizations of Complex Sequences of Family-Infant Vocalizations Using
  Bag-of-Audio-Words Approach Based on Wav2vec 2.0 Features
Visualizations of Complex Sequences of Family-Infant Vocalizations Using Bag-of-Audio-Words Approach Based on Wav2vec 2.0 Features
Jialu Li
M. Hasegawa-Johnson
Nancy L. McElwain
18
0
0
29 Mar 2022
Listen, Adapt, Better WER: Source-free Single-utterance Test-time
  Adaptation for Automatic Speech Recognition
Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation for Automatic Speech Recognition
Guan-Ting Lin
Shang-Wen Li
Hung-yi Lee
TTA
VLM
15
9
0
27 Mar 2022
Pushing the limits of raw waveform speaker recognition
Pushing the limits of raw waveform speaker recognition
Jee-weon Jung
You Jin Kim
Hee-Soo Heo
Bong-Jin Lee
Youngki Kwon
Joon Son Chung
31
87
0
16 Mar 2022
Magnitude-aware Probabilistic Speaker Embeddings
Magnitude-aware Probabilistic Speaker Embeddings
Nikita Kuzmin
Igor Fedorov
A. Sholokhov
27
7
0
28 Feb 2022
BEA-Base: A Benchmark for ASR of Spontaneous Hungarian
BEA-Base: A Benchmark for ASR of Spontaneous Hungarian
P. Mihajlik
A. Balog
T. E. Gráczi
A. Kohári
Balázs Tarján
K. Mády
25
8
0
01 Feb 2022
Improving Mandarin End-to-End Speech Recognition with Word N-gram
  Language Model
Improving Mandarin End-to-End Speech Recognition with Word N-gram Language Model
Jinchuan Tian
Jianwei Yu
Chao Weng
Yuexian Zou
Dong Yu
23
10
0
06 Jan 2022
Perceptual Loss with Recognition Model for Single-Channel Enhancement
  and Robust ASR
Perceptual Loss with Recognition Model for Single-Channel Enhancement and Robust ASR
Peter William VanHarn Plantinga
Deblin Bagchi
Eric Fosler-Lussier
46
10
0
11 Dec 2021
Are E2E ASR models ready for an industrial usage?
Are E2E ASR models ready for an industrial usage?
Valentin Vielzeuf
G. Antipov
26
8
0
09 Dec 2021
ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet
ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet
Siddhant Arora
Siddharth Dalmia
Pavel Denisov
Xuankai Chang
Yushi Ueda
...
Karthik Ganesan
Brian Yan
Ngoc Thang Vu
A. Black
Shinji Watanabe
VLM
33
74
0
29 Nov 2021
XLS-R: Self-supervised Cross-lingual Speech Representation Learning at
  Scale
XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale
Arun Babu
Changhan Wang
Andros Tjandra
Kushal Lakhotia
Qiantong Xu
...
Yatharth Saraf
J. Pino
Alexei Baevski
Alexis Conneau
Michael Auli
SSL
32
657
0
17 Nov 2021
Effective Cross-Utterance Language Modeling for Conversational Speech
  Recognition
Effective Cross-Utterance Language Modeling for Conversational Speech Recognition
Bi-Cheng Yan
Hsin-Wei Wang
Shih-Hsuan Chiu
Hsuan-Sheng Chiu
Berlin Chen
21
1
0
05 Nov 2021
TorchAudio: Building Blocks for Audio and Speech Processing
TorchAudio: Building Blocks for Audio and Speech Processing
Yao-Yuan Yang
Moto Hira
Zhaoheng Ni
Anjali Chourdia
Artyom Astafurov
...
Sean Narenthiran
Shinji Watanabe
Soumith Chintala
Vincent Quenneville-Bélair
Yangyang Shi
31
165
0
28 Oct 2021
MetricGAN-U: Unsupervised speech enhancement/ dereverberation based only
  on noisy/ reverberated speech
MetricGAN-U: Unsupervised speech enhancement/ dereverberation based only on noisy/ reverberated speech
Szu-Wei Fu
Cheng Yu
Kuo-Hsuan Hung
Mirco Ravanelli
Yu Tsao
38
46
0
12 Oct 2021
Fine-tuning wav2vec2 for speaker recognition
Fine-tuning wav2vec2 for speaker recognition
Nik Vaessen
David A. van Leeuwen
42
107
0
30 Sep 2021
Soundata: A Python library for reproducible use of audio datasets
Soundata: A Python library for reproducible use of audio datasets
Magdalena Fuentes
Justin Salamon
Pablo Zinemanas
Martín Rocamora
Genís Paja
Irán R. Román
M. Miron
Xavier Serra
J. P. Bello
11
3
0
26 Sep 2021
XMUSPEECH System for VoxCeleb Speaker Recognition Challenge 2021
XMUSPEECH System for VoxCeleb Speaker Recognition Challenge 2021
Jie Wang
Fuchuan Tong
Zhi-Cong Chen
Lin Li
Q. Hong
Haodong Zhou
34
1
0
06 Sep 2021
The SpeakIn System for VoxCeleb Speaker Recognition Challange 2021
The SpeakIn System for VoxCeleb Speaker Recognition Challange 2021
Miao Zhao
Yufeng Ma
Min Liu
Minqiang Xu
33
59
0
05 Sep 2021
Layer-wise Analysis of a Self-supervised Speech Representation Model
Layer-wise Analysis of a Self-supervised Speech Representation Model
Ankita Pasad
Ju-Chieh Chou
Karen Livescu
SSL
26
288
0
10 Jul 2021
End-to-End Speech Recognition from Federated Acoustic Models
End-to-End Speech Recognition from Federated Acoustic Models
Yan Gao
Titouan Parcollet
Salah Zaiem
Javier Fernandez-Marques
Pedro Porto Buarque de Gusmão
Daniel J. Beutel
Nicholas D. Lane
28
43
0
29 Apr 2021
Conditional independence for pretext task selection in Self-supervised
  speech representation learning
Conditional independence for pretext task selection in Self-supervised speech representation learning
Salah Zaiem
Titouan Parcollet
S. Essid
SSL
6
4
0
15 Apr 2021
Timers and Such: A Practical Benchmark for Spoken Language Understanding
  with Numbers
Timers and Such: A Practical Benchmark for Spoken Language Understanding with Numbers
Loren Lugosch
Piyush Papreja
Mirco Ravanelli
A. Heba
Titouan Parcollet
24
12
0
04 Apr 2021
Bayesian HMM clustering of x-vector sequences (VBx) in speaker
  diarization: theory, implementation and analysis on standard tasks
Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks
Federico Landini
Jan Profant
Mireia Díez
L. Burget
216
199
0
29 Dec 2020
Can Federated Learning Save The Planet?
Can Federated Learning Save The Planet?
Xinchi Qiu
Titouan Parcollet
Daniel J. Beutel
Taner Topal
Akhil Mathur
Nicholas D. Lane
23
78
0
13 Oct 2020
pyannote.audio: neural building blocks for speaker diarization
pyannote.audio: neural building blocks for speaker diarization
H. Bredin
Ruiqing Yin
Juan Manuel Coria
G. Gelly
Pavel Korshunov
Marvin Lavechin
D. Fustes
Hadrien Titeux
Wassim Bouaziz
Marie-Philippe Gill
191
313
0
04 Nov 2019
Previous
1234
Next