SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
    VLM

Papers citing "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"

50 / 1,048 papers shown
Unified End-to-End Speech Recognition and Endpointing for Fast and Efficient Speech Systems
Shaan Bijwadia
Shuo-yiin Chang
Yue Liu
Tara N. Sainath
Chaoyang Zhang
Yanzhang He
75
9
0
01 Nov 2022
Joint Audio/Text Training for Transformer Rescorer of Streaming Speech Recognition
Suyoun Kim
Ke Li
Lucas Kabela
Rongqing Huang
Jiedan Zhu
Ozlem Kalinli
Duc Le
84
8
0
31 Oct 2022
Iterative Teaching by Data Hallucination
Zeju Qiu
Weiyang Liu
Tim Z. Xiao
Zhen Liu
Umang Bhatt
Yucen Luo
Adrian Weller
Bernhard Schölkopf
121
9
0
31 Oct 2022
Fast and parallel decoding for transducer
Wei Kang
Liyong Guo
Fangjun Kuang
Long Lin
Mingshuang Luo
Zengwei Yao
Xiaoyu Yang
Piotr Żelasko
Daniel Povey
AI4TS
80
17
0
31 Oct 2022
Delay-penalized transducer for low-latency streaming ASR
Wei Kang
Zengwei Yao
Fangjun Kuang
Liyong Guo
Xiaoyu Yang
Long Lin
Piotr Żelasko
Daniel Povey
89
8
0
31 Oct 2022
Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation
Liyong Guo
Xiaoyu Yang
Quandong Wang
Yuxiang Kong
Zengwei Yao
...
Wei Kang
Long Lin
Mingshuang Luo
Piotr Żelasko
Daniel Povey
VLM
93
7
0
31 Oct 2022
Structured State Space Decoder for Speech Recognition and Synthesis
Koichi Miyazaki
Masato Murata
Tomoki Koriyama
104
13
0
31 Oct 2022
Wespeaker: A Research and Production oriented Speaker Embedding Learning Toolkit
Hongji Wang
Che-Yuan Liang
Shuai Wang
Zhengyang Chen
Binbin Zhang
Xu Xiang
Yan Deng
Y. Qian
119
128
0
31 Oct 2022
WeKws: A production first small-footprint end-to-end Keyword Spotting Toolkit
Jie Wang
Menglong Xu
Jingyong Hou
Binbin Zhang
Xiao-Lei Zhang
Linfu Xie
Fuping Pan
48
11
0
30 Oct 2022
BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model
Yosuke Higuchi
Brian Yan
Siddhant Arora
Tetsuji Ogawa
Tetsunori Kobayashi
Shinji Watanabe
120
26
0
29 Oct 2022
Speaker Representation Learning via Contrastive Loss with Maximal Speaker Separability
Zhe Li
Man-Wai Mak
SSL
117
6
0
29 Oct 2022
Discriminative Speaker Representation via Contrastive Learning with Class-Aware Attention in Angular Space
Zhe Li
Man-Wai Mak
Helen M. Meng
90
9
0
29 Oct 2022
End-to-end Spoken Language Understanding with Tree-constrained Pointer Generator
Guangzhi Sun
Chuxu Zhang
P. Woodland
69
8
0
29 Oct 2022
Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention
Xubo Liu
Qiushi Huang
Xinhao Mei
Haohe Liu
Qiuqiang Kong
...
Yu Zhang
Lilian H. Y. Tang
Mark D. Plumbley
Volkan Kılıç
Wenwu Wang
162
20
0
28 Oct 2022
Efficient Speech Translation with Dynamic Latent Perceivers
Ioannis Tsiamas
Gerard I. Gállego
José A. R. Fonollosa
Marta R. Costa-jussá
59
3
0
28 Oct 2022
Filter and evolve: progressive pseudo label refining for semi-supervised automatic speech recognition
Zezhong Jin
Dading Zhong
Xiao Song
Zhaoyi Liu
Naipeng Ye
Qingcheng Zeng
62
2
0
28 Oct 2022
On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis
Atsushi Ando
Ryo Masumura
Akihiko Takashima
Satoshi Suzuki
Naoki Makishima
Keita Suzuki
Takafumi Moriya
Takanori Ashihara
Hiroshi Sato
98
9
0
28 Oct 2022
Make More of Your Data: Minimal Effort Data Augmentation for Automatic Speech Recognition and Translation
Tsz Kin Lam
Shigehiko Schamoni
Stefan Riezler
VLM
86
10
0
27 Oct 2022
A knowledge-driven vowel-based approach of depression classification from speech using data augmentation
Kexin Feng
Theodora Chaspari
46
7
0
27 Oct 2022
Contextual-Utterance Training for Automatic Speech Recognition
Alejandro Gomez-Alanis
Lukas Drude
A. Schwarz
Rupak Vignesh Swaminathan
Simon Wiesler
64
1
0
27 Oct 2022
Iterative pseudo-forced alignment by acoustic CTC loss for self-supervised ASR domain adaptation
F. López
Jordi Luque
44
6
0
27 Oct 2022
Training Autoregressive Speech Recognition Models with Limited in-domain Supervision
Chak-Fai Li
Francis Keith
William Hartmann
M. Snover
56
0
0
27 Oct 2022
End-to-End Speech to Intent Prediction to improve E-commerce Customer Support Voicebot in Hindi and English
Abhinav Goyal
Ashutosh Kumar Singh
Nikesh Garera
42
4
0
26 Oct 2022
Pretrained audio neural networks for Speech emotion recognition in Portuguese
M. Gauy
Marcelo Finger
39
4
0
26 Oct 2022
TSUP Speaker Diarization System for Conversational Short-phrase Speaker Diarization Challenge
Bowen Pang
Huan Zhao
Gaosheng Zhang
Xiaoyue Yang
Yanguo Sun
Li Zhang
Qing Wang
Linfu Xie
BDL
54
2
0
26 Oct 2022
Reducing Language confusion for Code-switching Speech Recognition with Token-level Language Diarization
Hexin Liu
Haihua Xu
Leibny Paola García
Andy W. H. Khong
Yi He
Sanjeev Khudanpur
63
25
0
26 Oct 2022
UFO2: A unified pre-training framework for online and offline speech recognition
Li Fu
Siqi Li
Qingtao Li
L. Deng
Fangzhu Li
Lu Fan
Meng Chen
Xiaodong He
OffRL
129
8
0
26 Oct 2022
Development of Hybrid ASR Systems for Low Resource Medical Domain Conversational Telephone Speech
Christoph Luscher
Mohammad Zeineldeen
Zijian Yang
Tina Raissi
Peter Vieting
Khai-Nguyen Nguyen
Weiyue Wang
Ralf Schluter
Hermann Ney
37
5
0
24 Oct 2022
ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition
Sanchit Gandhi
Patrick von Platen
Alexander M. Rush
70
25
0
24 Oct 2022
10 hours data is all you need
Zeping Min
Qian Ge
Zhong Li
104
2
0
24 Oct 2022
Quantitative Evidence on Overlooked Aspects of Enrollment Speaker Embeddings for Target Speaker Separation
Xiaoyu Liu
Xu Li
Joan Serrà
87
9
0
23 Oct 2022
BEANS: The Benchmark of Animal Sounds
Masato Hagiwara
Benjamin Hoffman
Jen-Yu Liu
M. Cusimano
Felix Effenberger
Katie Zacarian
97
27
0
21 Oct 2022
G-Augment: Searching for the Meta-Structure of Data Augmentation Policies for ASR
Gary Wang
Ekin D. Cubuk
Andrew Rosenberg
Shuyang Cheng
Ron J. Weiss
Bhuvana Ramabhadran
Pedro J. Moreno
Quoc V. Le
Daniel S. Park
117
2
0
19 Oct 2022
Optimizing Temporal Resolution Of Convolutional Recurrent Neural Networks For Sound Event Detection
Wim Boes
Hugo Van hamme
32
1
0
18 Oct 2022
Maestro-U: Leveraging joint speech-text representation learning for zero supervised speech ASR
Zhehuai Chen
Ankur Bapna
Andrew Rosenberg
Yu Zhang
Bhuvana Ramabhadran
Pedro J. Moreno
Nanxin Chen
107
17
0
18 Oct 2022
SVLDL: Improved Speaker Age Estimation Using Selective Variance Label Distribution Learning
Zuheng Kang
Jianzong Wang
Junqing Peng
Jing Xiao
85
3
0
18 Oct 2022
Sub-8-bit quantization for on-device speech recognition: a regularization-free approach
Kai Zhen
Martin H. Radfar
Hieu Duy Nguyen
Grant P. Strimel
Nathan Susanj
Athanasios Mouchtaris
MQ
69
8
0
17 Oct 2022
Continuous Pseudo-Labeling from the Start
Dan Berrebbi
R. Collobert
Samy Bengio
Navdeep Jaitly
Tatiana Likhomanenko
67
16
0
17 Oct 2022
Acoustic-aware Non-autoregressive Spell Correction with Mask Sample Decoding
Ruchao Fan
Guoli Ye
Yashesh Gaur
Jinyu Li
40
4
0
16 Oct 2022
A Policy-based Approach to the SpecAugment Method for Low Resource E2E ASR
Rui Li
Guodong Ma
Dexin Zhao
Ranran Zeng
Xiaoyu Li
Haolin Huang
69
2
0
16 Oct 2022
Learning to Jointly Transcribe and Subtitle for End-to-End Spontaneous Speech Recognition
Jakob Poncelet
Hugo Van hamme
43
2
0
14 Oct 2022
LeVoice ASR Systems for the ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge
Yan Jia
Mihee Hong
Jingyu Hou
Kailong Ren
Sifan Ma
Jin Wang
Fangzhen Peng
Yinglin Ji
Lin Yang
Junjie Wang
56
1
0
14 Oct 2022
JOIST: A Joint Speech and Text Streaming Model For ASR
Tara N. Sainath
Rohit Prabhavalkar
Ankur Bapna
Yu Zhang
Zhouyuan Huo
Zhehuai Chen
Yue Liu
Weiran Wang
Trevor Strohman
RALM
AuLLM
91
35
0
13 Oct 2022
Deepfake Detection System for the ADD Challenge Track 3.2 Based on Score Fusion
Yuxiang Zhang
Jingze Lu
Xingming Wang
Zhuo Li
Runqiu Xiao
Wenchao Wang
Ming Li
Pengyuan Zhang
72
5
0
13 Oct 2022
Foundation Transformers
Hongyu Wang
Shuming Ma
Shaohan Huang
Li Dong
Wenhui Wang
...
Barun Patra
Zhun Liu
Vishrav Chaudhary
Xia Song
Furu Wei
AI4CE
91
27
0
12 Oct 2022
An Ensemble Teacher-Student Learning Approach with Poisson Sub-sampling to Differential Privacy Preserving Speech Recognition
Chao-Han Huck Yang
Jun Qi
Sabato Marco Siniscalchi
Chin-Hui Lee
86
4
0
12 Oct 2022
Cross-dataset COVID-19 Transfer Learning with Cough Detection, Cough Segmentation, and Data Augmentation
Bagus Tris Atmaja
Zanjabila
Suyanto
A. Sasou
72
1
0
12 Oct 2022
Comparison of Soft and Hard Target RNN-T Distillation for Large-scale ASR
DongSeon Hwang
K. Sim
Yu Zhang
Trevor Strohman
69
11
0
11 Oct 2022
Scaling Up Deliberation for Multilingual ASR
Ke Hu
Yue Liu
Tara N. Sainath
LRM
88
9
0
11 Oct 2022
Automated Audio Captioning via Fusion of Low- and High- Dimensional Features
Jianyuan Sun
Xubo Liu
Xinhao Mei
Mark D. Plumbley
V. Kılıç
Wenwu Wang
80
3
0
10 Oct 2022