
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
    VLM
arXiv:1904.08779

Papers citing "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"

50 / 1,048 papers shown
Title
An Empirical Survey of Data Augmentation for Time Series Classification with Neural Networks
Brian Kenji Iwana
S. Uchida
AI4TS
90
506
0
31 Jul 2020
Semi-Supervised Learning with Data Augmentation for End-to-End ASR
F. Weninger
F. Mana
R. Gemello
Jesús Andrés-Ferrer
P. Zhan
88
30
0
27 Jul 2020
Efficient minimum word error rate training of RNN-Transducer for end-to-end speech recognition
Jinxi Guo
Gautam Tiwari
J. Droppo
Maarten Van Segbroeck
Che-Wei Huang
A. Stolcke
Roland Maas
71
55
0
27 Jul 2020
CoVoST 2 and Massively Multilingual Speech-to-Text Translation
Changhan Wang
Anne Wu
J. Pino
SLR
93
75
0
20 Jul 2020
Cross-Lingual Speaker Verification with Domain-Balanced Hard Prototype Mining and Language-Dependent Score Normalization
Jenthe Thienpondt
Brecht Desplanques
Kris Demuynck
67
24
0
15 Jul 2020
TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech
Andy T. Liu
Shang-Wen Li
Hung-yi Lee
SSL
178
362
0
12 Jul 2020
Data augmentation enhanced speaker enrollment for text-dependent speaker verification
A. K. Sarkar
H. Sarma
Priyanka Dwivedi
Zheng-Hua Tan
24
3
0
12 Jul 2020
Class LM and word mapping for contextual biasing in End-to-End ASR
Rongqing Huang
Ossama Abdel-Hamid
Xinwei Li
G. Evermann
57
49
0
10 Jul 2020
Massively Multilingual ASR: 50 Languages, 1 Model, 1 Billion Parameters
Vineel Pratap
Anuroop Sriram
Paden Tomasello
Awni Y. Hannun
Vitaliy Liptchinsky
Gabriel Synnaeve
R. Collobert
91
143
0
06 Jul 2020
Pretrained Semantic Speech Embeddings for End-to-End Spoken Language Understanding via Cross-Modal Teacher-Student Learning
Pavel Denisov
Ngoc Thang Vu
77
30
0
03 Jul 2020
Data Augmenting Contrastive Learning of Speech Representations in the Time Domain
Eugene Kharitonov
M. Rivière
Gabriel Synnaeve
Lior Wolf
Pierre-Emmanuel Mazaré
Matthijs Douze
Emmanuel Dupoux
137
118
0
02 Jul 2020
Polyphonic sound event detection based on convolutional recurrent neural networks with semi-supervised loss function for DCASE challenge 2020 task 4
Nam Kyun Kim
Hyeongju Kim
60
3
0
02 Jul 2020
Self-Supervised MultiModal Versatile Networks
Jean-Baptiste Alayrac
Adrià Recasens
R. Schneider
Relja Arandjelović
Jason Ramapuram
J. Fauw
Lucas Smaira
Sander Dieleman
Andrew Zisserman
SSL
198
375
0
29 Jun 2020
Real Time Speech Enhancement in the Waveform Domain
Alexandre Défossez
Gabriel Synnaeve
Yossi Adi
109
466
0
23 Jun 2020
Self-Supervised Representations Improve End-to-End Speech Translation
Anne Wu
Changhan Wang
J. Pino
Jiatao Gu
SSL
110
40
0
22 Jun 2020
Sound Event Localization and Detection Using Activity-Coupled Cartesian DOA Vector and RD3net
Kazuki Shimada
Naoya Takahashi
Shusuke Takahashi
Yuki Mitsufuji
121
19
0
22 Jun 2020
MaxVA: Fast Adaptation of Step Sizes by Maximizing Observed Variance of Gradients
Chenfei Zhu
Yu Cheng
Zhe Gan
Furong Huang
Jingjing Liu
Tom Goldstein
ODL
113
2
0
21 Jun 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski
Henry Zhou
Abdel-rahman Mohamed
Michael Auli
SSL
325
5,878
0
20 Jun 2020
Multi-Encoder-Decoder Transformer for Code-Switching Speech Recognition
Xinyuan Zhou
Emre Yilmaz
Yanhua Long
Yijie Li
Haizhou Li
80
52
0
18 Jun 2020
Self-and-Mixed Attention Decoder with Deep Acoustic Structure for Transformer-based LVCSR
Xinyuan Zhou
Grandee Lee
Emre Yilmaz
Yanhua Long
Jiaen Liang
Haizhou Li
65
7
0
18 Jun 2020
Are you wearing a mask? Improving mask detection from speech using augmentation by cycle-consistent GANs
Nicolae-Cătălin Ristea
Radu Tudor Ionescu
CVBM
93
41
0
17 Jun 2020
The JHU Multi-Microphone Multi-Speaker ASR System for the CHiME-6 Challenge
Ashish Arora
Desh Raj
Aswin Shanmugam Subramanian
Ke Li
Bar Ben Yair
Matthew Maciejewski
Piotr Żelasko
Leibny Paola García-Perera
Shinji Watanabe
Sanjeev Khudanpur
147
9
0
14 Jun 2020
Improving Cross-Lingual Transfer Learning for End-to-End Speech Recognition with Speech Translation
Changhan Wang
J. Pino
Jiatao Gu
79
30
0
09 Jun 2020
End-to-End Speech-Translation with Knowledge Distillation: FBK@IWSLT2020
Marco Gaido
Mattia Antonino Di Gangi
Matteo Negri
Marco Turchi
91
54
0
04 Jun 2020
Contextual RNN-T For Open Domain ASR
Mahaveer Jain
Gil Keren
Jay Mahadeokar
Geoffrey Zweig
Florian Metze
Yatharth Saraf
63
104
0
04 Jun 2020
Is 42 the Answer to Everything in Subtitling-oriented Speech Translation?
Alina Karakanta
Matteo Negri
Marco Turchi
89
35
0
01 Jun 2020
High-Fidelity Audio Generation and Representation Learning with Guided Adversarial Autoencoder
Kazi Nazmul Haque
R. Rana
Björn W Schuller
DRL
100
12
0
01 Jun 2020
CLOCS: Contrastive Learning of Cardiac Signals Across Space, Time, and Patients
Dani Kiyasseh
T. Zhu
David Clifton
120
195
0
27 May 2020
Insertion-Based Modeling for End-to-End Automatic Speech Recognition
Yuya Fujita
Shinji Watanabe
Motoi Omachi
Xuankai Chang
80
31
0
27 May 2020
ACGAN-based Data Augmentation Integrated with Long-term Scalogram for Acoustic Scene Classification
Hangting Chen
Zuozhen Liu
Zongming Liu
Pengyuan Zhang
32
8
0
27 May 2020
Multistream CNN for Robust Acoustic Modeling
Kyu Jeong Han
Jing Pan
Venkata Krishna Naveen Tadala
T. Ma
Daniel Povey
71
34
0
21 May 2020
ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition
Jing Pan
Joshua Shapiro
Jeremy Wohlwend
Kyu Jeong Han
Tao Lei
T. Ma
72
22
0
21 May 2020
Simplified Self-Attention for Transformer-based End-to-End Speech Recognition
Haoneng Luo
Shiliang Zhang
Ming Lei
Lei Xie
128
34
0
21 May 2020
Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition
Shiliang Zhang
Zhifu Gao
Haoneng Luo
Ming Lei
Jie Ying Gao
Zhijie Yan
Lei Xie
64
29
0
21 May 2020
SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition
Zhifu Gao
Shiliang Zhang
Ming Lei
Ian Mcloughlin
81
35
0
21 May 2020
Training Keyword Spotting Models on Non-IID Data with Federated Learning
Andrew Straiton Hard
Kurt Partridge
Cameron Nguyen
Niranjan A. Subrahmanya
Aishanee Shah
Pai Zhu
Ignacio López Moreno
Rajiv Mathews
OOD, FedML
74
67
0
21 May 2020
A Comparison of Label-Synchronous and Frame-Synchronous End-to-End Models for Speech Recognition
Linhao Dong
Cheng Yi
Jianzong Wang
Shiyu Zhou
Shuang Xu
X. Jia
Bo Xu
68
17
0
20 May 2020
Early Stage LM Integration Using Local and Global Log-Linear Combination
Wilfried Michel
Ralf Schlüter
Hermann Ney
60
11
0
20 May 2020
Relative Positional Encoding for Speech Recognition and Direct Translation
Ngoc-Quan Pham
Thanh-Le Ha
Tuan-Nam Nguyen
T. Nguyen
Elizabeth Salesky
S. Stueker
Jan Niehues
A. Waibel
56
37
0
20 May 2020
BiQGEMM: Matrix Multiplication with Lookup Table For Binary-Coding-based Quantized DNNs
Yongkweon Jeon
Baeseong Park
S. Kwon
Byeongwook Kim
Jeongin Yun
Dongsoo Lee
MQ
63
31
0
20 May 2020
Improved Noisy Student Training for Automatic Speech Recognition
Daniel S. Park
Yu Zhang
Ye Jia
Wei Han
Chung-Cheng Chiu
Yue Liu
Yonghui Wu
Quoc V. Le
119
243
0
19 May 2020
A systematic comparison of grapheme-based vs. phoneme-based label units for encoder-decoder-attention models
Mohammad Zeineldeen
Albert Zeyer
Wei Zhou
T. Ng
Ralf Schlüter
Hermann Ney
71
2
0
19 May 2020
A New Training Pipeline for an Improved Neural Transducer
Albert Zeyer
André Merboldt
Ralf Schlüter
Hermann Ney
AI4TS, MedIm
75
52
0
19 May 2020
Iterative Pseudo-Labeling for Speech Recognition
Qiantong Xu
Tatiana Likhomanenko
Jacob Kahn
Awni Y. Hannun
Gabriel Synnaeve
R. Collobert
VLM
96
134
0
19 May 2020
Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict
Yosuke Higuchi
Shinji Watanabe
Nanxin Chen
Tetsuji Ogawa
Tetsunori Kobayashi
65
139
0
18 May 2020
Attention-based Transducer for Online Speech Recognition
Bin Wang
Yan Yin
Hui-Ching Lin
67
4
0
18 May 2020
The NTNU System at the Interspeech 2020 Non-Native Children's Speech ASR Challenge
Tien-Hong Lo
Fu-An Chao
Shi-Yan Weng
Berlin Chen
55
11
0
18 May 2020
Conformer: Convolution-augmented Transformer for Speech Recognition
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
...
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
231
3,179
0
16 May 2020
Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition
Zhengkun Tian
Jiangyan Yi
J. Tao
Ye Bai
Shuai Zhang
Zhengqi Wen
99
54
0
16 May 2020
Large scale weakly and semi-supervised learning for low-resource video ASR
Kritika Singh
Vimal Manohar
Alex Xiao
Sergey Edunov
Ross B. Girshick
Vitaliy Liptchinsky
Christian Fuegen
Yatharth Saraf
Geoffrey Zweig
Abdel-rahman Mohamed
77
9
0
16 May 2020