1904.08779
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
VLM
Papers citing
"SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"
50 / 1,048 papers shown
Title
Non-autoregressive End-to-end Speech Translation with Parallel Autoregressive Rescoring
Hirofumi Inaguma
Yosuke Higuchi
Kevin Duh
Tatsuya Kawahara
Shinji Watanabe
91
11
0
09 Sep 2021
The IDLAB VoxCeleb Speaker Recognition Challenge 2021 System Description
Jenthe Thienpondt
Brecht Desplanques
Kris Demuynck
90
26
0
09 Sep 2021
A Survey of Sound Source Localization with Deep Learning Methods
Pierre-Amaury Grumiaux
Srdjan Kitić
Laurent Girin
Alexandre Guérin
80
257
0
08 Sep 2021
Efficient Attention Branch Network with Combined Loss Function for Automatic Speaker Verification Spoof Detection
A. Rostami
M. Homayounpour
A. Nickabadi
AAML
53
10
0
05 Sep 2021
Tree-constrained Pointer Generator for End-to-end Contextual Speech Recognition
Guangzhi Sun
Chao Zhang
P. Woodland
103
33
0
01 Sep 2021
Investigations on Speech Recognition Systems for Low-Resource Dialectal Arabic-English Code-Switching Speech
Injy Hamed
Pavel Denisov
C. Li
Mohamed S. Elmahdy
Slim Abdennadher
Ngoc Thang Vu
68
36
0
29 Aug 2021
Injecting Text in Self-Supervised Speech Pretraining
Zhehuai Chen
Yu Zhang
Andrew Rosenberg
Bhuvana Ramabhadran
Gary Wang
Pedro J. Moreno
SSL
92
36
0
27 Aug 2021
4-bit Quantization of LSTM-based Speech Recognition Models
A. Fasoli
Chia-Yu Chen
Mauricio Serrano
Xiao Sun
Naigang Wang
...
Xiaodong Cui
Brian Kingsbury
Wei Zhang
Zoltán Tüske
K. Gopalakrishnan
MQ
75
23
0
27 Aug 2021
Automatic Speech Recognition And Limited Vocabulary: A Survey
J. L. E. K. Fendji
D. Tala
B. Yenke
M. Atemkeng
108
3
0
23 Aug 2021
NIST SRE CTS Superset: A large-scale dataset for telephony speaker recognition
S. O. Sadjadi
AI4TS
40
24
0
16 Aug 2021
Automated Audio Captioning using Transfer Learning and Reconstruction Latent Space Similarity Regularization
Andrew Koh
Fuzhao Xue
Chng Eng Siong
68
20
0
10 Aug 2021
An empirical investigation into audio pipeline approaches for classifying bird species
David Behr
C. Maina
Vukosi Marivate
26
2
0
10 Aug 2021
The HW-TSC's Offline Speech Translation Systems for IWSLT 2021 Evaluation
Minghan Wang
Yuxia Wang
Chang Su
Jiaxin Guo
Yingtao Zhang
...
Shimin Tao
Xingshan Zeng
Liangyou Li
Hao Yang
Ying Qin
54
6
0
09 Aug 2021
W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training
Yu-An Chung
Yu Zhang
Wei Han
Chung-Cheng Chiu
James Qin
Ruoming Pang
Yonghui Wu
SSL
VLM
99
429
0
07 Aug 2021
SpecMix: A Mixed Sample Data Augmentation method for Training with Time-Frequency Domain Features
Gwantae Kim
D. Han
Hanseok Ko
101
45
0
06 Aug 2021
An Encoder-Decoder Based Audio Captioning System With Transfer and Reinforcement Learning
Xinhao Mei
Qiushi Huang
Xubo Liu
Gengyun Chen
Jingqian Wu
...
Tom Ko
H. Tang
Xingkun Shao
Mark D. Plumbley
Wenwu Wang
93
54
0
05 Aug 2021
Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation
Sarala Padi
S. O. Sadjadi
Tianyi Zhou
Ram D. Sriram
81
35
0
05 Aug 2021
A Study of Multilingual End-to-End Speech Recognition for Kazakh, Russian, and English
Saida Mussakhojayeva
Yerbolat Khassanov
H. A. Varol
57
17
0
03 Aug 2021
USC: An Open-Source Uzbek Speech Corpus and Initial Speech Recognition Experiments
M. Musaev
Saida Mussakhojayeva
Ilyos Khujayorov
Yerbolat Khassanov
M. Ochilov
H. A. Varol
46
19
0
30 Jul 2021
End-to-End Spectro-Temporal Graph Attention Networks for Speaker Verification Anti-Spoofing and Speech Deepfake Detection
Hemlata Tak
Jee-weon Jung
J. Patino
Madhu R. Kamble
Massimiliano Todisco
Nicholas W. D. Evans
88
175
0
27 Jul 2021
Differentiable Allophone Graphs for Language-Universal Speech Recognition
Brian Yan
Siddharth Dalmia
David R. Mortensen
Florian Metze
Shinji Watanabe
63
11
0
24 Jul 2021
Brazilian Portuguese Speech Recognition Using Wav2vec 2.0
L. Gris
Edresson Casanova
F. S. Oliveira
A. S. Soares
A. Júnior
34
17
0
23 Jul 2021
OLR 2021 Challenge: Datasets, Rules and Baselines
Binling Wang
Wen-Bo Hu
Jing Li
Yiming Zhi
Zheng Li
Q. Hong
Lin Li
Dong Wang
Liming Song
Cheng Yang
55
18
0
23 Jul 2021
Multitask-Based Joint Learning Approach To Robust ASR For Radio Communication Speech
Duo Ma
Nana Hou
Van Tung Pham
Haihua Xu
Chng Eng Siong
69
22
0
22 Jul 2021
Improving Polyphonic Sound Event Detection on Multichannel Recordings with the Sørensen-Dice Coefficient Loss and Transfer Learning
Karn N. Watcharasupat
Thi Ngoc Tho Nguyen
Ngoc Khanh Nguyen
Zhen Jian Lee
Douglas L. Jones
W. Gan
130
0
0
22 Jul 2021
Audio Captioning Transformer
Xinhao Mei
Xubo Liu
Qiushi Huang
Mark D. Plumbley
Wenwu Wang
ViT
94
78
0
21 Jul 2021
A baseline model for computationally inexpensive speech recognition for Kazakh using the Coqui STT framework
Ilnar Salimzianov
38
0
0
19 Jul 2021
Simultaneous Speech Translation for Live Subtitling: from Delay to Display
Alina Karakanta
Sara Papi
Matteo Negri
Marco Turchi
57
10
0
19 Jul 2021
Translatotron 2: High-quality direct speech-to-speech translation with voice preservation
Ye Jia
Michelle Tadmor Ramanovich
Tal Remez
Roi Pomerantz
105
73
0
19 Jul 2021
Between Flexibility and Consistency: Joint Generation of Captions and Subtitles
Alina Karakanta
Marco Gaido
Matteo Negri
Marco Turchi
68
9
0
13 Jul 2021
Conformer-based End-to-end Speech Recognition With Rotary Position Embedding
Shengqiang Li
Menglong Xu
Xiao-Lei Zhang
84
9
0
13 Jul 2021
Improving Speech Translation by Understanding and Learning from the Auxiliary Text Translation Task
Yun Tang
J. Pino
Xian Li
Changhan Wang
Dmitriy Genzel
175
84
0
12 Jul 2021
Direct speech-to-speech translation with discrete units
Ann Lee
Peng-Jen Chen
Changhan Wang
Jiatao Gu
Sravya Popuri
...
Yossi Adi
Qing He
Yun Tang
J. Pino
Wei-Ning Hsu
91
192
0
12 Jul 2021
Visual-Tactile Cross-Modal Data Generation using Residue-Fusion GAN with Feature-Matching and Perceptual Losses
Shaoyu Cai
Kening Zhu
Yuki Ban
Takuji Narumi
53
40
0
12 Jul 2021
Multi-path Convolutional Neural Networks Efficiently Improve Feature Extraction in Continuous Adventitious Lung Sound Detection
Fu-Shun Hsu
Shang-Ran Huang
Chien-Wen Huang
Chun-Chieh Chen
Yuan-Ren Cheng
F. Lai
32
1
0
09 Jul 2021
On lattice-free boosted MMI training of HMM and CTC-based full-context ASR models
Xiaohui Zhang
Vimal Manohar
David C. Zhang
Frank Zhang
Yangyang Shi
Nayan Singhal
Julian Chan
Fuchun Peng
Yatharth Saraf
M. Seltzer
89
14
0
09 Jul 2021
Self-training with noisy student model and semi-supervised loss function for DCASE 2021 challenge task 4
Nam Kyun Kim
Hyeongju Kim
68
12
0
06 Jul 2021
The NiuTrans End-to-End Speech Translation System for IWSLT 2021 Offline Task
Chen Xu
Xiaoqian Liu
Xiaowen Liu
Laohu Wang
Canan Huang
Tong Xiao
Jingbo Zhu
79
5
0
06 Jul 2021
Oriental Language Recognition (OLR) 2020: Summary and Analysis
Jing Li
Binling Wang
Yiming Zhi
Zheng Li
Lin Li
Q. Hong
Dong Wang
56
11
0
05 Jul 2021
Unified Autoregressive Modeling for Joint End-to-End Multi-Talker Overlapped Speech Recognition and Speaker Attribute Estimation
Ryo Masumura
Daiki Okamura
Naoki Makishima
Mana Ihori
Akihiko Takashima
Tomohiro Tanaka
Shota Orihashi
57
7
0
04 Jul 2021
A Lottery Ticket Hypothesis Framework for Low-Complexity Device-Robust Neural Acoustic Scene Classification
Hao Yen
Chao-Han Huck Yang
Hu Hu
Sabato Marco Siniscalchi
Qing Wang
...
Yuanjun Zhao
Yuzhong Wu
Yannan Wang
Jun Du
Chin-Hui Lee
54
17
0
03 Jul 2021
The HCCL Speaker Verification System for Far-Field Speaker Verification Challenge
Zhuo Li
Ce Fang
Runqiu Xiao
Zhigao Chen
Wenchao Wang
Yonghong Yan
52
2
0
03 Jul 2021
Relaxed Attention: A Simple Method to Boost Performance of End-to-End Automatic Speech Recognition
Timo Lohrenz
P. Schwarz
Zhengyang Li
Tim Fingscheidt
52
11
0
02 Jul 2021
CrowdSpeech and VoxDIY: Benchmark Datasets for Crowdsourced Audio Transcription
Nikita Pavlichenko
Ivan Stelmakh
Dmitry Ustalov
74
19
0
02 Jul 2021
Supervised Contrastive Learning for Accented Speech Recognition
Tao Han
Hantao Huang
Ziang Yang
Wei Han
66
16
0
02 Jul 2021
ESPnet-ST IWSLT 2021 Offline Speech Translation System
Hirofumi Inaguma
Shun Kiyono
Nelson Enrique Yalta Soplin
Pengcheng Guo
Jun Suzuki
Kevin Duh
Shinji Watanabe
3DV
74
2
0
01 Jul 2021
StableEmit: Selection Probability Discount for Reducing Emission Latency of Streaming Monotonic Attention ASR
Hirofumi Inaguma
Tatsuya Kawahara
61
4
0
01 Jul 2021
The USTC-NELSLIP Systems for Simultaneous Speech Translation Task at IWSLT 2021
Dan Liu
Mengge Du
Xiaoxi Li
Yuchen Hu
Lirong Dai
99
21
0
01 Jul 2021
Attention Bottlenecks for Multimodal Fusion
Arsha Nagrani
Shan Yang
Anurag Arnab
A. Jansen
Cordelia Schmid
Chen Sun
117
576
0
30 Jun 2021
An Integrated Framework for Two-pass Personalized Voice Trigger
Dexin Liao
Jing Li
Yiming Zhi
Song Li
Q. Hong
Lin Li
59
1
0
30 Jun 2021