ResearchTrend.AI

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
    VLM

Papers citing "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"

50 / 1,048 papers shown
Title
You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation
A. Laptev
Roman Korostik
A. Svischev
A. Andrusenko
Ivan Medennikov
S. Rybin
81
61
0
14 May 2020
ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification
Brecht Desplanques
Jenthe Thienpondt
Kris Demuynck
90
1,350
0
14 May 2020
Streaming keyword spotting on mobile devices
Oleg Rybakov
Natasha Kononenko
Niranjan A. Subrahmanya
Mirkó Visontai
Stella Laurenzo
AI4TS
127
112
0
14 May 2020
Infant Crying Detection in Real-World Environments
X. Yao
Megan Micheletti
Mckensey Johnson
Edison Thomaz
K. D. Barbaro
39
25
0
12 May 2020
Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition
Ye Bai
Jiangyan Yi
J. Tao
Zhengkun Tian
Zhengqi Wen
Shuai Zhang
RALM
80
41
0
11 May 2020
CTC-synchronous Training for Monotonic Attention Model
Hirofumi Inaguma
Masato Mimura
Tatsuya Kawahara
37
7
0
10 May 2020
ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context
Wei Han
Zhengdong Zhang
Yu Zhang
Jiahui Yu
Chung-Cheng Chiu
James Qin
Anmol Gulati
Ruoming Pang
Yonghui Wu
101
264
0
07 May 2020
Data Augmentation for Hypernymy Detection
Thomas Kober
Julie Weeds
Lorenzo Bertolini
David J. Weir
97
19
0
04 May 2020
MultiQT: Multimodal Learning for Real-Time Question Tracking in Speech
Jakob Drachmann Havtorn
Jan Latko
Joakim Edin
Lasse Borgholt
Lars Maaløe
Lorenzo Belgrano
Nicolai Frost Jakobsen
R. Sdun
Zeljko Agic
36
3
0
02 May 2020
Multi-head Monotonic Chunkwise Attention For Online Speech Recognition
Baiji Liu
Songjun Cao
Sining Sun
Weibin Zhang
Long Ma
51
9
0
01 May 2020
Logic-Guided Data Augmentation and Regularization for Consistent Question Answering
Akari Asai
Hannaneh Hajishirzi
NAI
104
117
0
21 Apr 2020
Curriculum Pre-training for End-to-End Speech Translation
Chengyi Wang
Yu Wu
Shujie Liu
Ming Zhou
Zhenglu Yang
85
109
0
21 Apr 2020
LiteDenseNet: A Lightweight Network for Hyperspectral Image Classification
Rui Li
Chenxi Duan
23
8
0
17 Apr 2020
Analyzing analytical methods: The case of phonology in neural models of spoken language
Grzegorz Chrupała
Bertrand Higy
Afra Alishahi
61
20
0
15 Apr 2020
The RWTH ASR System for TED-LIUM Release 2: Improving Hybrid HMM with SpecAugment
Wei Zhou
Wilfried Michel
Kazuki Irie
M. Kitza
Ralf Schluter
Hermann Ney
42
43
0
02 Apr 2020
Serialized Output Training for End-to-End Overlapped Speech Recognition
Naoyuki Kanda
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Takuya Yoshioka
85
122
0
28 Mar 2020
Stochastic Frequency Masking to Improve Super-Resolution and Denoising Networks
Majed El Helou
Ruofan Zhou
Sabine Süsstrunk
109
45
0
16 Mar 2020
On Compositions of Transformations in Contrastive Self-Supervised Learning
Mandela Patrick
Yuki M. Asano
Polina Kuznetsova
Ruth C. Fong
João F. Henriques
Geoffrey Zweig
Andrea Vedaldi
89
50
0
09 Mar 2020
Deep Neural Networks for Automatic Speech Processing: A Survey from Large Corpora to Limited Data
Vincent Roger
Jérôme Farinas
J. Pinquier
57
24
0
09 Mar 2020
AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
Esteban Real
Chen Liang
David R. So
Quoc V. Le
79
226
0
06 Mar 2020
Time Series Data Augmentation for Deep Learning: A Survey
Qingsong Wen
Liang Sun
Fan Yang
Xiaomin Song
Jing Gao
Xue Wang
Huan Xu
AI4TS
150
649
0
27 Feb 2020
SkinAugment: Auto-Encoding Speaker Conversions for Automatic Speech Translation
Arya D. McCarthy
Liezl Puzon
J. Pino
77
24
0
27 Feb 2020
Imputer: Sequence Modelling via Imputation and Dynamic Programming
William Chan
Chitwan Saharia
Geoffrey E. Hinton
Mohammad Norouzi
Navdeep Jaitly
BDL
AI4TS
97
116
0
20 Feb 2020
Small energy masking for improved neural network training for end-to-end speech recognition
Chanwoo Kim
Kwangyoun Kim
S. Indurthi
50
8
0
15 Feb 2020
Attentional Speech Recognition Models Misbehave on Out-of-domain Utterances
Phillip Keung
Wei Niu
Y. Lu
Julian Salazar
Vikas Bhardwaj
72
9
0
12 Feb 2020
AlignNet: A Unifying Approach to Audio-Visual Alignment
Jianren Wang
Zhaoyuan Fang
Hang Zhao
57
37
0
12 Feb 2020
Learning with Out-of-Distribution Data for Audio Classification
Turab Iqbal
Yin Cao
Qiuqiang Kong
Mark D. Plumbley
Wenwu Wang
OODD
39
17
0
11 Feb 2020
Accelerating RNN Transducer Inference via One-Step Constrained Beam Search
Juntae Kim
Yoonhan Lee
67
24
0
10 Feb 2020
Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss
Qian Zhang
Han Lu
Hasim Sak
Anshuman Tripathi
Erik McDermott
Stephen Koo
Shankar Kumar
108
482
0
07 Feb 2020
Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior
Guangzhi Sun
Yu Zhang
Ron J. Weiss
Yuan Cao
Heiga Zen
Andrew Rosenberg
Bhuvana Ramabhadran
Yonghui Wu
DiffM
101
93
0
06 Feb 2020
CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus
Changhan Wang
J. Pino
Anne Wu
Jiatao Gu
SLR
111
86
0
04 Feb 2020
Learning Robust and Multilingual Speech Representations
Kazuya Kawakami
Luyu Wang
Chris Dyer
Phil Blunsom
Aaron van den Oord
SSL
97
100
0
29 Jan 2020
Unsupervised Pre-training of Bidirectional Speech Encoders via Masked Reconstruction
Weiran Wang
Qingming Tang
Karen Livescu
SSL
84
98
0
28 Jan 2020
Scaling Up Online Speech Recognition Using ConvNets
Vineel Pratap
Qiantong Xu
Jacob Kahn
Gilad Avidov
Tatiana Likhomanenko
Awni Y. Hannun
Vitaliy Liptchinsky
Gabriel Synnaeve
R. Collobert
242
39
0
27 Jan 2020
Multi-task self-supervised learning for Robust Speech Recognition
Mirco Ravanelli
Jianyuan Zhong
Santiago Pascual
P. Swietojanski
João Monteiro
J. Trmal
Yoshua Bengio
SSL
292
290
0
25 Jan 2020
Semi-supervised ASR by End-to-end Self-training
Yang Chen
Weiran Wang
Chao Wang
72
53
0
24 Jan 2020
FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence
Kihyuk Sohn
David Berthelot
Chun-Liang Li
Zizhao Zhang
Nicholas Carlini
E. D. Cubuk
Alexey Kurakin
Han Zhang
Colin Raffel
AAML
173
3,603
0
21 Jan 2020
Single headed attention based sequence-to-sequence model for state-of-the-art results on Switchboard
Zoltán Tüske
G. Saon
Kartik Audhkhasi
Brian Kingsbury
BDL
101
69
0
20 Jan 2020
Streaming automatic speech recognition with the transformer model
Niko Moritz
Takaaki Hori
Jonathan Le Roux
142
187
0
08 Jan 2020
Learning Speaker Embedding with Momentum Contrast
Ke Ding
Xuanji He
Guanglu Wan
SSL
108
10
0
07 Jan 2020
Mel-spectrogram augmentation for sequence to sequence voice conversion
Yeongtae Hwang
Hyemin Cho
Hongsun Yang
Dong-Ok Won
Insoo Oh
Seong-Whan Lee
65
15
0
06 Jan 2020
Disentangling Trainability and Generalization in Deep Neural Networks
Lechao Xiao
Jeffrey Pennington
S. Schoenholz
82
34
0
30 Dec 2019
Improved Multi-Stage Training of Online Attention-based Encoder-Decoder Models
Abhinav Garg
Dhananjaya N. Gowda
Ankur Kumar
Kwangyoun Kim
Mehul Kumar
Chanwoo Kim
3DV
44
15
0
28 Dec 2019
power-law nonlinearity with maximally uniform distribution criterion for improved neural network training in automatic speech recognition
Chanwoo Kim
Mehul Kumar
Kwangyoun Kim
Dhananjaya N. Gowda
60
9
0
22 Dec 2019
end-to-end training of a large vocabulary end-to-end speech recognition system
Chanwoo Kim
Sungsoo Kim
Kwangyoun Kim
Mehul Kumar
Jiyeon Kim
...
Eunhyang Kim
Minkyoo Shin
Shatrughan Singh
Larry Heck
Dhananjaya N. Gowda
61
27
0
22 Dec 2019
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition
Qiuqiang Kong
Yin Cao
Turab Iqbal
Yuxuan Wang
Wenwu Wang
Mark D. Plumbley
VLM
SSL
257
1,091
0
21 Dec 2019
Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems
Nick Rossenbach
Albert Zeyer
Ralf Schluter
Hermann Ney
95
84
0
19 Dec 2019
Environmental Sound Classification with Parallel Temporal-spectral Attention
Helin Wang
Yuexian Zou
Dading Chong
Wenwu Wang
72
4
0
14 Dec 2019
SpecAugment on Large Scale Datasets
Daniel S. Park
Yu Zhang
Chung-Cheng Chiu
Youzheng Chen
Yue Liu
William Chan
Quoc V. Le
Yonghui Wu
86
138
0
11 Dec 2019
Audiogmenter: a MATLAB Toolbox for Audio Data Augmentation
Gianluca Maguolo
M. Paci
L. Nanni
Lu Bonan
45
15
0
11 Dec 2019