ResearchTrend.AI

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
    VLM

Papers citing "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"

48 / 1,048 papers shown
Data Augmentation for Deep Learning-based Radio Modulation Classification
Liang Huang
Weijian Pan
You Zhang
L. Qian
Nan Gao
Yuan Wu
62
136
0
06 Dec 2019
Semantic Mask for Transformer based End-to-End Speech Recognition
Chengyi Wang
Yu Wu
Yujiao Du
Jinyu Li
Shujie Liu
Liang Lu
Shuo Ren
Guoli Ye
Sheng Zhao
Ming Zhou
70
52
0
06 Dec 2019
Towards Robust Neural Vocoding for Speech Generation: A Survey
Po-Chun Hsu
Chun-hsuan Wang
Andy T. Liu
Hung-yi Lee
OOD
78
25
0
05 Dec 2019
Distance-Based Learning from Errors for Confidence Calibration
Chen Xing
Sercan O. Arik
Zizhao Zhang
Tomas Pfister
FedML
75
39
0
03 Dec 2019
Deep Contextualized Acoustic Representations For Semi-Supervised Speech Recognition
Shaoshi Ling
Yuzong Liu
Julian Salazar
Katrin Kirchhoff
SSL
86
139
0
03 Dec 2019
Bimodal Speech Emotion Recognition Using Pre-Trained Language Models
Verena Heusser
Niklas Freymuth
Stefan Constantin
A. Waibel
92
26
0
29 Nov 2019
Augmentation Methods on Monophonic Audio for Instrument Classification in Polyphonic Music
Agelos Kratimenos
Kleanthis Avramidis
C. Garoufis
Athanasia Zlatintsi
Petros Maragos
67
20
0
28 Nov 2019
Neural Random Forest Imitation
Christoph Reinders
Bodo Rosenhahn
43
1
0
25 Nov 2019
Speech Sentiment Analysis via Pre-trained Features from End-to-end ASR Models
Zhiyun Lu
Liangliang Cao
Yu Zhang
Chung-Cheng Chiu
James Fan
58
72
0
21 Nov 2019
On Using SpecAugment for End-to-End Speech Translation
Parnia Bahar
Albert Zeyer
Ralf Schluter
Hermann Ney
92
54
0
20 Nov 2019
End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures
Gabriel Synnaeve
Qiantong Xu
Jacob Kahn
Tatiana Likhomanenko
Edouard Grave
Vineel Pratap
Anuroop Sriram
Vitaliy Liptchinsky
R. Collobert
SSL
AI4TS
134
248
0
19 Nov 2019
Music theme recognition using CNN and self-attention
Manoj Sukhavasi
Sainath Adapa
ViT
69
17
0
16 Nov 2019
Effectiveness of self-supervised pre-training for speech recognition
Alexei Baevski
Michael Auli
Abdel-rahman Mohamed
SSL
115
147
0
10 Nov 2019
Enforcing Encoder-Decoder Modularity in Sequence-to-Sequence Models
Siddharth Dalmia
Abdel-rahman Mohamed
M. Lewis
Florian Metze
Luke Zettlemoyer
55
11
0
09 Nov 2019
RNN-T For Latency Controlled ASR With Improved Beam Search
Mahaveer Jain
Kjell Schubert
Jay Mahadeokar
Ching-Feng Yeh
Kaustubh Kalgaonkar
Anuroop Sriram
Christian Fuegen
M. Seltzer
80
45
0
05 Nov 2019
What does a network layer hear? Analyzing hidden representations of end-to-end ASR through speech synthesis
Chung-Yi Li
Pei-Chieh Yuan
Hung-yi Lee
71
31
0
04 Nov 2019
Improving sequence-to-sequence speech recognition training with on-the-fly data augmentation
T. Nguyen
S. Stueker
Jan Niehues
A. Waibel
94
98
0
29 Oct 2019
Transformer-Transducer: End-to-End Speech Recognition with Self-Attention
Ching-Feng Yeh
Jay Mahadeokar
Kaustubh Kalgaonkar
Yongqiang Wang
Duc Le
Mahaveer Jain
Kjell Schubert
Christian Fuegen
M. Seltzer
98
150
0
28 Oct 2019
Learning Data Manipulation for Augmentation and Weighting
Zhiting Hu
Bowen Tan
Ruslan Salakhutdinov
Tom Michael Mitchell
Eric Xing
83
120
0
28 Oct 2019
Recognizing long-form speech using streaming end-to-end models
A. Narayanan
Rohit Prabhavalkar
Chung-Cheng Chiu
David Rybach
Tara N. Sainath
Trevor Strohman
79
130
0
24 Oct 2019
Correction of Automatic Speech Recognition with Transformer Sequence-to-sequence Model
Oleksii Hrinchuk
Mariya Popova
Boris Ginsburg
VLM
62
90
0
23 Oct 2019
A practical two-stage training strategy for multi-stream end-to-end speech recognition
Ruizhi Li
Gregory Sell
Xiaofei Wang
Shinji Watanabe
H. Hermansky
45
7
0
23 Oct 2019
Deja-vu: Double Feature Presentation and Iterated Loss in Deep Transformer Networks
Andros Tjandra
Chunxi Liu
Frank Zhang
Xiaohui Zhang
Yongqiang Wang
Gabriel Synnaeve
Satoshi Nakamura
Geoffrey Zweig
ViT
89
46
0
23 Oct 2019
Transformer-based Acoustic Modeling for Hybrid Speech Recognition
Yongqiang Wang
Abdel-rahman Mohamed
Duc Le
Chunxi Liu
Alex Xiao
...
Xiaohui Zhang
Frank Zhang
Christian Fuegen
Geoffrey Zweig
M. Seltzer
68
249
0
22 Oct 2019
Deep speech inpainting of time-frequency masks
M. Kegler
P. Beckmann
Milos Cernak
61
38
0
20 Oct 2019
Acoustic Scene Classification Based on a Large-margin Factorized CNN
Janghoon Cho
Sungrack Yun
Hyoungwoo Park
Jungyun Eum
Kyuwoong Hwang
37
13
0
14 Oct 2019
vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations
Alexei Baevski
Steffen Schneider
Michael Auli
SSL
187
669
0
12 Oct 2019
State-of-the-Art Speech Recognition Using Multi-Stream Self-Attention With Dilated 1D Convolutions
Kyu Jeong Han
R. Prieto
Kaixing(Kai) Wu
T. Ma
134
70
0
01 Oct 2019
RandAugment: Practical automated data augmentation with a reduced search space
E. D. Cubuk
Barret Zoph
Jonathon Shlens
Quoc V. Le
MQ
406
3,522
0
30 Sep 2019
GraphMix: Improved Training of GNNs for Semi-Supervised Learning
Vikas Verma
Meng Qu
Kenji Kawaguchi
Alex Lamb
Yoshua Bengio
Arno Solin
Jian Tang
92
62
0
25 Sep 2019
Large-scale representation learning from visually grounded untranscribed speech
Gabriel Ilharco
Yuan Zhang
Jason Baldridge
SSL
87
61
0
19 Sep 2019
Espresso: A Fast End-to-end Neural Speech Recognition Toolkit
Yiming Wang
Tongfei Chen
Hainan Xu
Shuoyang Ding
Hang Lv
Yiwen Shao
Nanyun Peng
Lei Xie
Shinji Watanabe
Sanjeev Khudanpur
VLM
96
73
0
18 Sep 2019
Bridging the Gap between Pre-Training and Fine-Tuning for End-to-End Speech Translation
Chengyi Wang
Yu-Huan Wu
Shujie Liu
Zhenglu Yang
M. Zhou
89
84
0
17 Sep 2019
Integrating Source-channel and Attention-based Sequence-to-sequence Models for Speech Recognition
Qiujia Li
Chao Zhang
P. Woodland
63
20
0
14 Sep 2019
Multilingual Graphemic Hybrid ASR with Massive Data Augmentation
Chunxi Liu
Qiaochu Zhang
Xiaohui Zhang
Kritika Singh
Yatharth Saraf
Geoffrey Zweig
70
27
0
14 Sep 2019
A Comparative Study on Transformer vs RNN in Speech Applications
Shigeki Karita
Nanxin Chen
Tomoki Hayashi
Takaaki Hori
Hirofumi Inaguma
...
Ryuichi Yamamoto
Xiao-fei Wang
Shinji Watanabe
Takenori Yoshimura
Wangyou Zhang
96
722
0
13 Sep 2019
Preech: A System for Privacy-Preserving Speech Transcription
Shimaa Ahmed
Amrita Roy Chowdhury
Kassem Fawaz
P. Ramanathan
127
48
0
09 Sep 2019
Sound source detection, localization and classification using consecutive ensemble of CRNN models
Slawomir Kapka
M. Lewandowski
122
66
0
02 Aug 2019
Trading via Image Classification
N. Cohen
T. Balch
Manuela Veloso
103
35
0
23 Jul 2019
BERTphone: Phonetically-Aware Encoder Representations for Utterance-Level Speaker and Language Recognition
Shaoshi Ling
Julian Salazar
Yuzong Liu
Katrin Kirchhoff
SSL
93
28
0
30 Jun 2019
CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition
Linhao Dong
Bo Xu
85
128
0
27 May 2019
Language Modeling with Deep Transformers
Kazuki Irie
Albert Zeyer
Ralf Schluter
Hermann Ney
KELM
104
176
0
10 May 2019
RWTH ASR Systems for LibriSpeech: Hybrid vs Attention -- w/o Data Augmentation
Christoph Luscher
Eugen Beck
Kazuki Irie
M. Kitza
Wilfried Michel
Albert Zeyer
Ralf Schluter
Hermann Ney
VLM
150
234
0
08 May 2019
Unsupervised Data Augmentation for Consistency Training
Qizhe Xie
Zihang Dai
Eduard H. Hovy
Minh-Thang Luong
Quoc V. Le
160
2,337
0
29 Apr 2019
Towards Efficient Model Compression via Learned Global Ranking
Ting-Wu Chin
Ruizhou Ding
Cha Zhang
Diana Marculescu
83
172
0
28 Apr 2019
Explaining Deep Classification of Time-Series Data with Learned Prototypes
Alan H. Gee
Diego Garcia-Olano
Joydeep Ghosh
D. Paydarfar
AI4TS
107
67
0
18 Apr 2019
Jasper: An End-to-End Convolutional Neural Acoustic Model
Jason Chun Lok Li
Vitaly Lavrukhin
Boris Ginsburg
Ryan Leary
Oleksii Kuchaiev
Jonathan M. Cohen
Huyen Nguyen
R. Gadde
DRL
VLM
AuLLM
79
265
0
05 Apr 2019
On the Choice of Modeling Unit for Sequence-to-Sequence Speech Recognition
Kazuki Irie
Rohit Prabhavalkar
Anjuli Kannan
A. Bruguier
David Rybach
Patrick Nguyen
68
37
0
05 Feb 2019