1904.08779
Cited By
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
VLM
Papers citing
"SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"
50 / 747 papers shown
Title
Investigations on Speech Recognition Systems for Low-Resource Dialectal Arabic-English Code-Switching Speech
Injy Hamed
Pavel Denisov
C. Li
Mohamed S. Elmahdy
Slim Abdennadher
Ngoc Thang Vu
38
35
0
29 Aug 2021
Injecting Text in Self-Supervised Speech Pretraining
Zhehuai Chen
Yu Zhang
Andrew Rosenberg
Bhuvana Ramabhadran
Gary Wang
Pedro J. Moreno
SSL
25
36
0
27 Aug 2021
4-bit Quantization of LSTM-based Speech Recognition Models
A. Fasoli
Chia-Yu Chen
Mauricio Serrano
Xiao Sun
Naigang Wang
...
Xiaodong Cui
Brian Kingsbury
Wei Zhang
Zoltán Tüske
K. Gopalakrishnan
MQ
26
21
0
27 Aug 2021
Reducing Exposure Bias in Training Recurrent Neural Network Transducers
Xiaodong Cui
Brian Kingsbury
G. Saon
David Haws
Zoltán Tüske
19
5
0
24 Aug 2021
Automatic Speech Recognition And Limited Vocabulary: A Survey
J. L. E. K. Fendji
D. Tala
B. Yenke
M. Atemkeng
25
3
0
23 Aug 2021
Automated Audio Captioning using Transfer Learning and Reconstruction Latent Space Similarity Regularization
Andrew Koh
Fuzhao Xue
Chng Eng Siong
22
20
0
10 Aug 2021
The HW-TSC's Offline Speech Translation Systems for IWSLT 2021 Evaluation
Minghan Wang
Yuxia Wang
Chang Su
Jiaxin Guo
Yingtao Zhang
...
Shimin Tao
Xingshan Zeng
Liangyou Li
Hao Yang
Ying Qin
22
6
0
09 Aug 2021
SpecMix: A Mixed Sample Data Augmentation method for Training with Time-Frequency Domain Features
Gwantae Kim
D. Han
Hanseok Ko
50
42
0
06 Aug 2021
Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation
Sarala Padi
S. O. Sadjadi
Tianyi Zhou
Ram D. Sriram
26
34
0
05 Aug 2021
A Study of Multilingual End-to-End Speech Recognition for Kazakh, Russian, and English
Saida Mussakhojayeva
Yerbolat Khassanov
H. A. Varol
22
17
0
03 Aug 2021
End-to-End Spectro-Temporal Graph Attention Networks for Speaker Verification Anti-Spoofing and Speech Deepfake Detection
Hemlata Tak
Jee-weon Jung
J. Patino
Madhu R. Kamble
Massimiliano Todisco
Nicholas W. D. Evans
43
162
0
27 Jul 2021
OLR 2021 Challenge: Datasets, Rules and Baselines
Binling Wang
Wen-Bo Hu
Jing Li
Yiming Zhi
Zheng Li
Q. Hong
Lin Li
Dong Wang
Liming Song
Cheng Yang
26
18
0
23 Jul 2021
Audio Captioning Transformer
Xinhao Mei
Xubo Liu
Qiushi Huang
Mark D. Plumbley
Wenwu Wang
ViT
39
77
0
21 Jul 2021
Simultaneous Speech Translation for Live Subtitling: from Delay to Display
Alina Karakanta
Sara Papi
Matteo Negri
Marco Turchi
28
10
0
19 Jul 2021
Translatotron 2: High-quality direct speech-to-speech translation with voice preservation
Ye Jia
Michelle Tadmor Ramanovich
Tal Remez
Roi Pomerantz
31
68
0
19 Jul 2021
Between Flexibility and Consistency: Joint Generation of Captions and Subtitles
Alina Karakanta
Marco Gaido
Matteo Negri
Marco Turchi
30
9
0
13 Jul 2021
Conformer-based End-to-end Speech Recognition With Rotary Position Embedding
Shengqiang Li
Menglong Xu
Xiao-Lei Zhang
23
9
0
13 Jul 2021
Direct speech-to-speech translation with discrete units
Ann Lee
Peng-Jen Chen
Changhan Wang
Jiatao Gu
Sravya Popuri
...
Yossi Adi
Qing He
Yun Tang
J. Pino
Wei-Ning Hsu
41
181
0
12 Jul 2021
On lattice-free boosted MMI training of HMM and CTC-based full-context ASR models
Xiaohui Zhang
Vimal Manohar
David C. Zhang
Frank Zhang
Yangyang Shi
Nayan Singhal
Julian Chan
Fuchun Peng
Yatharth Saraf
M. Seltzer
28
14
0
09 Jul 2021
The NiuTrans End-to-End Speech Translation System for IWSLT 2021 Offline Task
Chen Xu
Xiaoqian Liu
Xiaowen Liu
Laohu Wang
Canan Huang
Tong Xiao
Jingbo Zhu
34
5
0
06 Jul 2021
Oriental Language Recognition (OLR) 2020: Summary and Analysis
Jing Li
Binling Wang
Yiming Zhi
Zheng Li
Lin Li
Q. Hong
Dong Wang
27
10
0
05 Jul 2021
A Lottery Ticket Hypothesis Framework for Low-Complexity Device-Robust Neural Acoustic Scene Classification
Hao Yen
Chao-Han Huck Yang
Hu Hu
Sabato Marco Siniscalchi
Qing Wang
...
Yuanjun Zhao
Yuzhong Wu
Yannan Wang
Jun Du
Chin-Hui Lee
19
16
0
03 Jul 2021
Supervised Contrastive Learning for Accented Speech Recognition
Tao Han
Hantao Huang
Ziang Yang
Wei Han
49
15
0
02 Jul 2021
ESPnet-ST IWSLT 2021 Offline Speech Translation System
Hirofumi Inaguma
Shun Kiyono
Nelson Enrique Yalta Soplin
Pengcheng Guo
Jun Suzuki
Kevin Duh
Shinji Watanabe
3DV
40
2
0
01 Jul 2021
The USTC-NELSLIP Systems for Simultaneous Speech Translation Task at IWSLT 2021
Dan Liu
Mengge Du
Xiaoxi Li
Yuchen Hu
Lirong Dai
32
20
0
01 Jul 2021
Attention Bottlenecks for Multimodal Fusion
Arsha Nagrani
Shan Yang
Anurag Arnab
A. Jansen
Cordelia Schmid
Chen Sun
48
544
0
30 Jun 2021
Robust and Interpretable Temporal Convolution Network for Event Detection in Lung Sound Recordings
Tharindu Fernando
Sridha Sridharan
Simon Denman
H. Ghaemmaghami
Clinton Fookes
38
27
0
30 Jun 2021
DCASE 2021 Task 3: Spectrotemporally-aligned Features for Polyphonic Sound Event Localization and Detection
Thi Ngoc Tho Nguyen
Karn N. Watcharasupat
Ngoc Khanh Nguyen
Douglas L. Jones
W. Gan
27
16
0
29 Jun 2021
SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption
Dara Bahri
Heinrich Jiang
Yi Tay
Donald Metzler
SSL
31
164
0
29 Jun 2021
Dealing with training and test segmentation mismatch: FBK@IWSLT2021
Sara Papi
Marco Gaido
Matteo Negri
Marco Turchi
44
6
0
23 Jun 2021
Do sound event representations generalize to other audio tasks? A case study in audio transfer learning
Anurag Kumar
Yun Wang
V. Ithapu
Christian Fuegen
24
3
0
21 Jun 2021
Towards sound based testing of COVID-19 -- Summary of the first Diagnostics of COVID-19 using Acoustics (DiCOVA) Challenge
N. Sharma
Ananya Muguli
Prashant Krishnan
Rohit Kumar
Srikanth Raj Chetupalli
Sriram Ganapathy
33
13
0
21 Jun 2021
Ensemble of ACCDOA- and EINV2-based Systems with D3Nets and Impulse Response Simulation for Sound Event Localization and Detection
Kazuki Shimada
Naoya Takahashi
Yuichiro Koyama
Shusuke Takahashi
E. Tsunoo
Masafumi Takahashi
Yuki Mitsufuji
30
23
0
21 Jun 2021
Multi-mode Transformer Transducer with Stochastic Future Context
Kwangyoun Kim
Felix Wu
Prashant Sridhar
Kyu Jeong Han
Shinji Watanabe
35
9
0
17 Jun 2021
WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis
Nanxin Chen
Yu Zhang
Heiga Zen
Ron J. Weiss
Mohammad Norouzi
Najim Dehak
William Chan
DiffM
23
88
0
17 Jun 2021
Collaborative Training of Acoustic Encoders for Speech Recognition
Varun K. Nagaraja
Yangyang Shi
Ganesh Venkatesh
Ozlem Kalinli
M. Seltzer
Vikas Chandra
48
11
0
16 Jun 2021
VidHarm: A Clip Based Dataset for Harmful Content Detection
Johan Edstedt
Amanda Berg
Michael Felsberg
Johan Karlsson
Francisca Benavente
Anette Novak
G. Pihlgren
28
2
0
15 Jun 2021
SynthASR: Unlocking Synthetic Data for Speech Recognition
A. Fazel
Wei Yang
Yulan Liu
Roberto Barra-Chicote
Yi Meng
Roland Maas
J. Droppo
SyDa
21
48
0
14 Jun 2021
End-to-end Neural Diarization: From Transformer to Conformer
Yi Y. Liu
Eunjung Han
Chul Lee
A. Stolcke
22
40
0
14 Jun 2021
RealTranS: End-to-End Simultaneous Speech Translation with Convolutional Weighted-Shrinking Transformer
Xingshan Zeng
Liangyou Li
Qun Liu
25
45
0
09 Jun 2021
SpeechBrain: A General-Purpose Speech Toolkit
Mirco Ravanelli
Titouan Parcollet
Peter William VanHarn Plantinga
Aku Rouhe
Samuele Cornell
...
William Aris
Hwidong Na
Yan Gao
R. Mori
Yoshua Bengio
24
752
0
08 Jun 2021
Broadcasted Residual Learning for Efficient Keyword Spotting
Byeonggeun Kim
Simyung Chang
Jinkyu Lee
Dooyong Sung
31
122
0
08 Jun 2021
EventDrop: data augmentation for event-based learning
Fuqiang Gu
Weicong Sng
Xuke Hu
Fei Yu
24
37
0
07 Jun 2021
Impact of data-splits on generalization: Identifying COVID-19 from cough and context
Makkunda Sharma
Nikhil Shenoy
Jigar Doshi
Piyush Bagad
Aman Dalmia
Parag Bhamare
A. Mahale
S. Rane
Neeraj Agrawal
R. Panicker
OOD
52
4
0
05 Jun 2021
Signal Transformer: Complex-valued Attention and Meta-Learning for Signal Recognition
Yihong Dong
Ying Peng
Muqiao Yang
Songtao Lu
Qingjiang Shi
49
9
0
05 Jun 2021
ERANNs: Efficient Residual Audio Neural Networks for Audio Pattern Recognition
S. Verbitskiy
Vladimir Berikov
Viacheslav Vyshegorodtsev
24
73
0
03 Jun 2021
Should We Always Separate?: Switching Between Enhanced and Observed Signals for Overlapping Speech Recognition
Hiroshi Sato
Tsubasa Ochiai
Marc Delcroix
K. Kinoshita
Takafumi Moriya
Naoyuki Kamo
33
23
0
02 Jun 2021
Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning
Haibin Wu
Xu Li
Andy T. Liu
Zhiyong Wu
Helen Meng
Hung-yi Lee
AAML
SSL
55
29
0
01 Jun 2021
The Imaginative Generative Adversarial Network: Automatic Data Augmentation for Dynamic Skeleton-Based Hand Gesture and Human Action Recognition
Junxiao Shen
John J. Dudley
Per Ola Kristensson
SLR
GAN
38
23
0
27 May 2021
Stacked Acoustic-and-Textual Encoding: Integrating the Pre-trained Models into Speech Translation Encoders
Chen Xu
Bojie Hu
Yanyang Li
Yuhao Zhang
Shen Huang
Qi Ju
Tong Xiao
Jingbo Zhu
25
76
0
12 May 2021