arXiv:2106.06909
GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio
13 June 2021
Guoguo Chen
Shuzhou Chai
Guan-Bo Wang
Jiayu Du
Weiqiang Zhang
Chao Weng
Dan Su
Daniel Povey
J. Trmal
Junbo Zhang
Mingjie Jin
Sanjeev Khudanpur
Shinji Watanabe
Shuaijiang Zhao
Wei Zou
Xiangang Li
Xuchen Yao
Yongqing Wang
Yujun Wang
Zhao You
Zhiyong Yan
Papers citing "GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio"
50 / 257 papers shown
Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling
Zi-Hua Zhang
Long Zhou
Chengyi Wang
Sanyuan Chen
Yu Wu
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
VLM
30
170
0
07 Mar 2023
TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings
Christoph Boeddeker
Aswin Shanmugam Subramanian
G. Wichern
Reinhold Haeb-Umbach
Jonathan Le Roux
29
23
0
07 Mar 2023
Front-End Adapter: Adapting Front-End Input of Speech based Self-Supervised Learning for Speech Recognition
Xie Chen
Ziyang Ma
Changli Tang
Yujin Wang
Zhi-shen Zheng
8
4
0
18 Feb 2023
ASR Bundestag: A Large-Scale political debate dataset in German
Johannes Wirth
René Peinl
23
1
0
12 Feb 2023
A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech
Li-Wei Chen
Shinji Watanabe
Alexander I. Rudnicky
16
35
0
08 Feb 2023
Efficient Domain Adaptation for Speech Foundation Models
Bo-wen Li
DongSeon Hwang
Zhouyuan Huo
Junwen Bai
Guru Prakash
...
K. Sim
Yu Zhang
Wei Han
Trevor Strohman
F. Beaufays
AI4CE
44
23
0
03 Feb 2023
Acoustic correlates of the syllabic rhythm of speech: Modulation spectrum or local features of the temporal envelope
Yuran Zhang
Jiajie Zou
Nai Ding
9
6
0
14 Jan 2023
Robust Speech Recognition via Large-Scale Weak Supervision
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
OffRL
49
3,290
0
06 Dec 2022
Label-free Knowledge Distillation with Contrastive Loss for Light-weight Speaker Recognition
Zhiyuan Peng
Xuanji He
Ke Ding
Tan Lee
Guanglu Wan
12
6
0
06 Dec 2022
EURO: ESPnet Unsupervised ASR Open-source Toolkit
Dongji Gao
Jiatong Shi
Shun-Po Chuang
Leibny Paola García-Perera
Hung-yi Lee
Shinji Watanabe
Sanjeev Khudanpur
24
8
0
30 Nov 2022
SpeechNet: Weakly Supervised, End-to-End Speech Recognition at Industrial Scale
Raphael Tang
K. Kumar
Gefei Yang
Akshat Pandey
Yajie Mao
Vladislav Belyaev
Madhuri Emmadi
Craig Murray
Ferhan Ture
Jimmy J. Lin
19
4
0
21 Nov 2022
VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning
Qiu-shi Zhu
Long Zhou
Zi-Hua Zhang
Shujie Liu
Binxing Jiao
Jie Zhang
Lirong Dai
Daxin Jiang
Jinyu Li
Furu Wei
30
37
0
21 Nov 2022
LongFNT: Long-form Speech Recognition with Factorized Neural Transducer
Xun Gong
Yu Wu
Jinyu Li
Shujie Liu
Rui Zhao
Xie Chen
Y. Qian
RALM
26
10
0
17 Nov 2022
On Unsupervised Uncertainty-Driven Speech Pseudo-Label Filtering and Model Calibration
Nauman Dawalatabad
Sameer Khurana
Antoine Laurent
James R. Glass
16
3
0
14 Nov 2022
Towards A Unified Conformer Structure: from ASR to ASV Task
Dexin Liao
Tao Jiang
Feng Wang
Lin Li
Q. Hong
27
10
0
14 Nov 2022
Multi-Speaker and Wide-Band Simulated Conversations as Training Data for End-to-End Neural Diarization
Federico Landini
Mireia Díez
Alicia Lozano-Diez
L. Burget
37
15
0
12 Nov 2022
A Study on the Integration of Pre-trained SSL, ASR, LM and SLU Models for Spoken Language Understanding
Yifan Peng
Siddhant Arora
Yosuke Higuchi
Yushi Ueda
Sujay S. Kumar
Karthik Ganesan
Siddharth Dalmia
Xuankai Chang
Shinji Watanabe
19
20
0
10 Nov 2022
Monolingual Recognizers Fusion for Code-switching Speech Recognition
Tongtong Song
Qiang Xu
Haoyu Lu
Longbiao Wang
Hao Shi
Yuqin Lin
Yanbing Yang
J. Dang
22
4
0
02 Nov 2022
Adapting self-supervised models to multi-talker speech recognition using speaker embeddings
Zili Huang
Desh Raj
Leibny Paola García-Perera
Sanjeev Khudanpur
86
23
0
01 Nov 2022
Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models
Siddhant Arora
Siddharth Dalmia
Brian Yan
Florian Metze
A. Black
Shinji Watanabe
17
12
0
27 Oct 2022
Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection
Kentaro Seki
Shinnosuke Takamichi
Takaaki Saeki
Hiroshi Saruwatari
23
6
0
26 Oct 2022
UFO2: A unified pre-training framework for online and offline speech recognition
Li Fu
Siqi Li
Qingtao Li
L. Deng
Fangzhu Li
Lu Fan
Meng Chen
Xiaodong He
OffRL
26
8
0
26 Oct 2022
ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition
Sanchit Gandhi
Patrick von Platen
Alexander M. Rush
30
27
0
24 Oct 2022
End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation
Yoshiki Masuyama
Xuankai Chang
Samuele Cornell
Shinji Watanabe
Nobutaka Ono
17
19
0
19 Oct 2022
Improving Contextual Recognition of Rare Words with an Alternate Spelling Prediction Model
Jennifer Drexler Fox
Natalie Delworth
KELM
22
18
0
02 Sep 2022
Two-Pass Low Latency End-to-End Spoken Language Understanding
Siddhant Arora
Siddharth Dalmia
Xuankai Chang
Brian Yan
A. Black
Shinji Watanabe
VLM
27
19
0
14 Jul 2022
Self-Supervised Speech Representation Learning: A Review
Abdel-rahman Mohamed
Hung-yi Lee
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
...
Shang-Wen Li
Karen Livescu
Lars Maaløe
Tara N. Sainath
Shinji Watanabe
SSL
AI4TS
131
349
0
21 May 2022
Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation
Qianqian Dong
Fengpeng Yue
Tom Ko
Mingxuan Wang
Qibing Bai
Yu Zhang
34
16
0
18 May 2022
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?
Sanyuan Chen
Yu Wu
Chengyi Wang
Shujie Liu
Zhuo Chen
...
Gang Liu
Jinyu Li
Jian Wu
Xiangzhan Yu
Furu Wei
SSL
18
39
0
27 Apr 2022
ASR in German: A Detailed Error Analysis
Johannes Wirth
René Peinl
18
5
0
12 Apr 2022
The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance
Lin Zhang
Xin Wang
Erica Cooper
Nicholas W. D. Evans
Junichi Yamagishi
19
56
0
11 Apr 2022
GigaST: A 10,000-hour Pseudo Speech Translation Corpus
Rong Ye
Chengqi Zhao
Tom Ko
Chutong Meng
Tao Wang
Mingxuan Wang
Jun Cao
9
23
0
08 Apr 2022
Speech Pre-training with Acoustic Piece
Shuo Ren
Shujie Liu
Yu Wu
Long Zhou
Furu Wei
SSL
14
16
0
07 Apr 2022
End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation
Xuankai Chang
Takashi Maekaku
Yuya Fujita
Shinji Watanabe
VLM
51
45
0
01 Apr 2022
CTA-RNN: Channel and Temporal-wise Attention RNN Leveraging Pre-trained ASR Embeddings for Speech Emotion Recognition
Chengxin Chen
Pengyuan Zhang
AI4TS
16
10
0
31 Mar 2022
Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational (RAMC) Speech Dataset
Zehui Yang
Yifan Chen
Lei Luo
Runyan Yang
Lingxuan Ye
...
Yaohui Jin
Qingqing Zhang
Pengyuan Zhang
Lei Xie
Yonghong Yan
15
47
0
31 Mar 2022
WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit
Binbin Zhang
Di Wu
Zhendong Peng
Xingcheng Song
Zhuoyuan Yao
Hang Lv
Linfu Xie
Chao Yang
Fuping Pan
Jianwei Niu
VLM
26
93
0
29 Mar 2022
Filler Word Detection and Classification: A Dataset and Benchmark
Ge Zhu
Juan-Pablo Caceres
Justin Salamon
13
8
0
28 Mar 2022
Domain Adaptation of low-resource Target-Domain models using well-trained ASR Conformer Models
Vrunda N. Sukhadia
S. Umesh
28
8
0
18 Feb 2022
End-to-End Contextual ASR Based on Posterior Distribution Adaptation for Hybrid CTC/Attention System
Zheng-Wei Zhang
Pan Zhou
39
6
0
18 Feb 2022
Internal Language Model Estimation Through Explicit Context Vector Learning for Attention-based Encoder-decoder ASR
Yufei Liu
Rao Ma
Haihua Xu
Yi He
Zejun Ma
Weibin Zhang
20
12
0
26 Jan 2022
Multi-Variant Consistency based Self-supervised Learning for Robust Automatic Speech Recognition
Changfeng Gao
Gaofeng Cheng
Pengyuan Zhang
25
4
0
23 Dec 2021
JTubeSpeech: corpus of Japanese speech collected from YouTube for speech recognition and speaker verification
Shinnosuke Takamichi
Ludwig Kürzinger
Takaaki Saeki
Sayaka Shiota
Shinji Watanabe
11
22
0
17 Dec 2021
The People's Speech: A Large-Scale Diverse English Speech Recognition Dataset for Commercial Usage
Daniel Galvez
G. Diamos
Juan Ciro
Juan Felipe Cerón
Keith Achorn
Anjali Gopi
David Kanter
Maximilian Lam
Mark Mazumder
Vijay Janapa Reddi
20
95
0
17 Nov 2021
Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity
Peter Wu
Jiatong Shi
Yifan Zhong
Shinji Watanabe
A. Black
21
8
0
02 Nov 2021
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu Wu
Shujie Liu
...
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
SSL
101
1,704
0
26 Oct 2021
Lhotse: a speech data representation library for the modern deep learning ecosystem
Piotr Żelasko
Daniel Povey
Jan "Yenda" Trmal
Sanjeev Khudanpur
AuLLM
AI4TS
30
31
0
25 Oct 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
235
1,024
0
13 Oct 2021
UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training
Sanyuan Chen
Yu Wu
Chengyi Wang
Zhengyang Chen
Zhuo Chen
...
Jian Wu
Yao Qian
Furu Wei
Jinyu Li
Xiangzhan Yu
SSL
27
85
0
12 Oct 2021
An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition
Xuankai Chang
Takashi Maekaku
Pengcheng Guo
Jing Shi
Yen-Ju Lu
...
Tianzi Wang
Shu-Wen Yang
Yu Tsao
Hung-yi Lee
Shinji Watanabe
SSL
AI4TS
18
81
0
09 Oct 2021