1904.08779
v3 (latest)
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
VLM
Papers citing
"SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"
50 / 1,048 papers shown
Title
You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation
A. Laptev
Roman Korostik
A. Svischev
A. Andrusenko
Ivan Medennikov
S. Rybin
81
61
0
14 May 2020
ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification
Brecht Desplanques
Jenthe Thienpondt
Kris Demuynck
90
1,350
0
14 May 2020
Streaming keyword spotting on mobile devices
Oleg Rybakov
Natasha Kononenko
Niranjan A. Subrahmanya
Mirkó Visontai
Stella Laurenzo
AI4TS
127
112
0
14 May 2020
Infant Crying Detection in Real-World Environments
X. Yao
Megan Micheletti
Mckensey Johnson
Edison Thomaz
K. D. Barbaro
39
25
0
12 May 2020
Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition
Ye Bai
Jiangyan Yi
J. Tao
Zhengkun Tian
Zhengqi Wen
Shuai Zhang
RALM
80
41
0
11 May 2020
CTC-synchronous Training for Monotonic Attention Model
Hirofumi Inaguma
Masato Mimura
Tatsuya Kawahara
37
7
0
10 May 2020
ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context
Wei Han
Zhengdong Zhang
Yu Zhang
Jiahui Yu
Chung-Cheng Chiu
James Qin
Anmol Gulati
Ruoming Pang
Yonghui Wu
101
264
0
07 May 2020
Data Augmentation for Hypernymy Detection
Thomas Kober
Julie Weeds
Lorenzo Bertolini
David J. Weir
97
19
0
04 May 2020
MultiQT: Multimodal Learning for Real-Time Question Tracking in Speech
Jakob Drachmann Havtorn
Jan Latko
Joakim Edin
Lasse Borgholt
Lars Maaløe
Lorenzo Belgrano
Nicolai Frost Jakobsen
R. Sdun
Zeljko Agic
36
3
0
02 May 2020
Multi-head Monotonic Chunkwise Attention For Online Speech Recognition
Baiji Liu
Songjun Cao
Sining Sun
Weibin Zhang
Long Ma
51
9
0
01 May 2020
Logic-Guided Data Augmentation and Regularization for Consistent Question Answering
Akari Asai
Hannaneh Hajishirzi
NAI
104
117
0
21 Apr 2020
Curriculum Pre-training for End-to-End Speech Translation
Chengyi Wang
Yu Wu
Shujie Liu
Ming Zhou
Zhenglu Yang
85
109
0
21 Apr 2020
LiteDenseNet: A Lightweight Network for Hyperspectral Image Classification
Rui Li
Chenxi Duan
23
8
0
17 Apr 2020
Analyzing analytical methods: The case of phonology in neural models of spoken language
Grzegorz Chrupała
Bertrand Higy
Afra Alishahi
61
20
0
15 Apr 2020
The RWTH ASR System for TED-LIUM Release 2: Improving Hybrid HMM with SpecAugment
Wei Zhou
Wilfried Michel
Kazuki Irie
M. Kitza
Ralf Schluter
Hermann Ney
42
43
0
02 Apr 2020
Serialized Output Training for End-to-End Overlapped Speech Recognition
Naoyuki Kanda
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Takuya Yoshioka
85
122
0
28 Mar 2020
Stochastic Frequency Masking to Improve Super-Resolution and Denoising Networks
Majed El Helou
Ruofan Zhou
Sabine Süsstrunk
109
45
0
16 Mar 2020
On Compositions of Transformations in Contrastive Self-Supervised Learning
Mandela Patrick
Yuki M. Asano
Polina Kuznetsova
Ruth C. Fong
João F. Henriques
Geoffrey Zweig
Andrea Vedaldi
89
50
0
09 Mar 2020
Deep Neural Networks for Automatic Speech Processing: A Survey from Large Corpora to Limited Data
Vincent Roger
Jérôme Farinas
J. Pinquier
57
24
0
09 Mar 2020
AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
Esteban Real
Chen Liang
David R. So
Quoc V. Le
79
226
0
06 Mar 2020
Time Series Data Augmentation for Deep Learning: A Survey
Qingsong Wen
Liang Sun
Fan Yang
Xiaomin Song
Jing Gao
Xue Wang
Huan Xu
AI4TS
150
649
0
27 Feb 2020
SkinAugment: Auto-Encoding Speaker Conversions for Automatic Speech Translation
Arya D. McCarthy
Liezl Puzon
J. Pino
77
24
0
27 Feb 2020
Imputer: Sequence Modelling via Imputation and Dynamic Programming
William Chan
Chitwan Saharia
Geoffrey E. Hinton
Mohammad Norouzi
Navdeep Jaitly
BDL
AI4TS
97
116
0
20 Feb 2020
Small energy masking for improved neural network training for end-to-end speech recognition
Chanwoo Kim
Kwangyoun Kim
S. Indurthi
50
8
0
15 Feb 2020
Attentional Speech Recognition Models Misbehave on Out-of-domain Utterances
Phillip Keung
Wei Niu
Y. Lu
Julian Salazar
Vikas Bhardwaj
72
9
0
12 Feb 2020
AlignNet: A Unifying Approach to Audio-Visual Alignment
Jianren Wang
Zhaoyuan Fang
Hang Zhao
57
37
0
12 Feb 2020
Learning with Out-of-Distribution Data for Audio Classification
Turab Iqbal
Yin Cao
Qiuqiang Kong
Mark D. Plumbley
Wenwu Wang
OODD
39
17
0
11 Feb 2020
Accelerating RNN Transducer Inference via One-Step Constrained Beam Search
Juntae Kim
Yoonhan Lee
67
24
0
10 Feb 2020
Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss
Qian Zhang
Han Lu
Hasim Sak
Anshuman Tripathi
Erik McDermott
Stephen Koo
Shankar Kumar
108
482
0
07 Feb 2020
Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior
Guangzhi Sun
Yu Zhang
Ron J. Weiss
Yuan Cao
Heiga Zen
Andrew Rosenberg
Bhuvana Ramabhadran
Yonghui Wu
DiffM
101
93
0
06 Feb 2020
CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus
Changhan Wang
J. Pino
Anne Wu
Jiatao Gu
SLR
111
86
0
04 Feb 2020
Learning Robust and Multilingual Speech Representations
Kazuya Kawakami
Luyu Wang
Chris Dyer
Phil Blunsom
Aaron van den Oord
SSL
97
100
0
29 Jan 2020
Unsupervised Pre-training of Bidirectional Speech Encoders via Masked Reconstruction
Weiran Wang
Qingming Tang
Karen Livescu
SSL
84
98
0
28 Jan 2020
Scaling Up Online Speech Recognition Using ConvNets
Vineel Pratap
Qiantong Xu
Jacob Kahn
Gilad Avidov
Tatiana Likhomanenko
Awni Y. Hannun
Vitaliy Liptchinsky
Gabriel Synnaeve
R. Collobert
242
39
0
27 Jan 2020
Multi-task self-supervised learning for Robust Speech Recognition
Mirco Ravanelli
Jianyuan Zhong
Santiago Pascual
P. Swietojanski
João Monteiro
J. Trmal
Yoshua Bengio
SSL
292
290
0
25 Jan 2020
Semi-supervised ASR by End-to-end Self-training
Yang Chen
Weiran Wang
Chao Wang
72
53
0
24 Jan 2020
FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence
Kihyuk Sohn
David Berthelot
Chun-Liang Li
Zizhao Zhang
Nicholas Carlini
E. D. Cubuk
Alexey Kurakin
Han Zhang
Colin Raffel
AAML
173
3,603
0
21 Jan 2020
Single headed attention based sequence-to-sequence model for state-of-the-art results on Switchboard
Zoltán Tüske
G. Saon
Kartik Audhkhasi
Brian Kingsbury
BDL
101
69
0
20 Jan 2020
Streaming automatic speech recognition with the transformer model
Niko Moritz
Takaaki Hori
Jonathan Le Roux
142
187
0
08 Jan 2020
Learning Speaker Embedding with Momentum Contrast
Ke Ding
Xuanji He
Guanglu Wan
SSL
108
10
0
07 Jan 2020
Mel-spectrogram augmentation for sequence to sequence voice conversion
Yeongtae Hwang
Hyemin Cho
Hongsun Yang
Dong-Ok Won
Insoo Oh
Seong-Whan Lee
65
15
0
06 Jan 2020
Disentangling Trainability and Generalization in Deep Neural Networks
Lechao Xiao
Jeffrey Pennington
S. Schoenholz
82
34
0
30 Dec 2019
Improved Multi-Stage Training of Online Attention-based Encoder-Decoder Models
Abhinav Garg
Dhananjaya N. Gowda
Ankur Kumar
Kwangyoun Kim
Mehul Kumar
Chanwoo Kim
3DV
44
15
0
28 Dec 2019
Power-law nonlinearity with maximally uniform distribution criterion for improved neural network training in automatic speech recognition
Chanwoo Kim
Mehul Kumar
Kwangyoun Kim
Dhananjaya N. Gowda
60
9
0
22 Dec 2019
End-to-end training of a large vocabulary end-to-end speech recognition system
Chanwoo Kim
Sungsoo Kim
Kwangyoun Kim
Mehul Kumar
Jiyeon Kim
...
Eunhyang Kim
Minkyoo Shin
Shatrughan Singh
Larry Heck
Dhananjaya N. Gowda
61
27
0
22 Dec 2019
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition
Qiuqiang Kong
Yin Cao
Turab Iqbal
Yuxuan Wang
Wenwu Wang
Mark D. Plumbley
VLM
SSL
257
1,091
0
21 Dec 2019
Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems
Nick Rossenbach
Albert Zeyer
Ralf Schluter
Hermann Ney
95
84
0
19 Dec 2019
Environmental Sound Classification with Parallel Temporal-spectral Attention
Helin Wang
Yuexian Zou
Dading Chong
Wenwu Wang
72
4
0
14 Dec 2019
SpecAugment on Large Scale Datasets
Daniel S. Park
Yu Zhang
Chung-Cheng Chiu
Youzheng Chen
Yue Liu
William Chan
Quoc V. Le
Yonghui Wu
86
138
0
11 Dec 2019
Audiogmenter: a MATLAB Toolbox for Audio Data Augmentation
Gianluca Maguolo
M. Paci
L. Nanni
Lu Bonan
45
15
0
11 Dec 2019