ResearchTrend.AI
arXiv: 1904.08779
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
    VLM

Papers citing "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"

50 / 1,048 papers shown
Title
Slow-Fast Auditory Streams For Audio Recognition
Evangelos Kazakos
Arsha Nagrani
Andrew Zisserman
Dima Damen
117
68
0
05 Mar 2021
Neural model robustness for skill routing in large-scale conversational AI systems: A design choice exploration
Han Li
Sunghyun Park
Aswarth Abhilash Dara
Jinseok Nam
Sungjin Lee
Young-Bum Kim
Spyros Matsoukas
R. Sarikaya
67
9
0
04 Mar 2021
An Empirical Study of End-to-end Simultaneous Speech Translation Decoding Strategies
H. Nguyen
Yannick Esteve
Laurent Besacier
60
19
0
04 Mar 2021
Perceiver: General Perception with Iterative Attention
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
VLM, ViT, MDE
214
1,030
0
04 Mar 2021
Alignment Knowledge Distillation for Online Streaming Attention-based Speech Recognition
Hirofumi Inaguma
Tatsuya Kawahara
127
14
0
28 Feb 2021
The NPU System for the 2020 Personalized Voice Trigger Challenge
Jingyong Hou
Li Zhang
Yihui Fu
Qing Wang
Zhanheng Yang
Qijie Shao
Lei Xie
61
7
0
26 Feb 2021
MixSpeech: Data Augmentation for Low-resource Automatic Speech Recognition
Linghui Meng
Jin Xu
Xu Tan
Jindong Wang
Tao Qin
Bo Xu
VLM
115
78
0
25 Feb 2021
The Accented English Speech Recognition Challenge 2020: Open Datasets, Tracks, Baselines, Results and Methods
Xian Shi
Fan Yu
Yizhou Lu
Yuhao Liang
Qiangze Feng
Daliang Wang
Y. Qian
Lei Xie
60
68
0
20 Feb 2021
End-to-End Neural Systems for Automatic Children Speech Recognition: An Empirical Study
Prashanth Gurunath Shivakumar
Shrikanth Narayanan
53
54
0
19 Feb 2021
Unit selection synthesis based data augmentation for fixed phrase speaker verification
Houjun Huang
Xu Xiang
Fei Zhao
Shuai Wang
Y. Qian
16
6
0
19 Feb 2021
Fundamental Frequency Feature Normalization and Data Augmentation for Child Speech Recognition
Gary Yeung
Ruchao Fan
Abeer Alwan
74
20
0
18 Feb 2021
End-to-end lyrics Recognition with Voice to Singing Style Transfer
Sakya Basak
Shrutina Agarwal
Sriram Ganapathy
Naoya Takahashi
68
20
0
17 Feb 2021
End-to-End Automatic Speech Recognition with Deep Mutual Learning
Ryo Masumura
Mana Ihori
Akihiko Takashima
Tomohiro Tanaka
Takanori Ashihara
34
5
0
16 Feb 2021
Hierarchical Transformer-based Large-Context End-to-end ASR with Large-Context Knowledge Distillation
Ryo Masumura
Naoki Makishima
Mana Ihori
Akihiko Takashima
Tomohiro Tanaka
Shota Orihashi
77
29
0
16 Feb 2021
Adversarial defense for automatic speaker verification by cascaded self-supervised learning models
Haibin Wu
Xu Li
Andy T. Liu
Zhiyong Wu
Helen Meng
Hung-yi Lee
AAML
86
41
0
14 Feb 2021
End-to-end Audio-visual Speech Recognition with Conformers
Pingchuan Ma
Stavros Petridis
Maja Pantic
157
234
0
12 Feb 2021
Enhancing Audio Augmentation Methods with Consistency Learning
Turab Iqbal
Karim Helwani
A. Krishnaswamy
Wenwu Wang
67
5
0
09 Feb 2021
Intermediate Loss Regularization for CTC-based Speech Recognition
Jaesong Lee
Shinji Watanabe
151
140
0
05 Feb 2021
Data Generation Using Pass-phrase-dependent Deep Auto-encoders for Text-Dependent Speaker Verification
A. K. Sarkar
Md. Sahidullah
Zheng-Hua Tan
26
0
0
03 Feb 2021
Speech Emotion Recognition with Multiscale Area Attention and Data Augmentation
Mingke Xu
Fan Zhang
Xiaodong Cui
Wei Zhang
54
52
0
03 Feb 2021
The Multilingual TEDx Corpus for Speech Recognition and Translation
Elizabeth Salesky
Sanjeev Khudanpur
Jacob Bremerman
R. Cattoni
Matteo Negri
Marco Turchi
Douglas W. Oard
Matt Post
79
126
0
02 Feb 2021
CTC-based Compression for Direct Speech Translation
Marco Gaido
Mauro Cettolo
Matteo Negri
Marco Turchi
104
59
0
02 Feb 2021
WeNet: Production oriented Streaming and Non-streaming End-to-End Speech Recognition Toolkit
Zhuoyuan Yao
Di Wu
Xiong Wang
Binbin Zhang
Fan Yu
Chao Yang
Zhendong Peng
Xiaoyu Chen
Lei Xie
X. Lei
129
268
0
02 Feb 2021
The Hitachi-JHU DIHARD III System: Competitive End-to-End Neural Diarization and X-Vector Clustering Systems Combined by DOVER-Lap
Shota Horiguchi
Nelson Yalta
Leibny Paola García-Perera
Yuki Takashima
Yawen Xue
Desh Raj
Zili Huang
Yusuke Fujita
Shinji Watanabe
Sanjeev Khudanpur
BDL
63
37
0
02 Feb 2021
PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation
Yuan Gong
Yu-An Chung
James R. Glass
VLM
199
147
0
02 Feb 2021
BCN2BRNO: ASR System Fusion for Albayzin 2020 Speech to Text Challenge
M. Kocour
Guillermo Cámbara
Jordi Luque
David Bonet
Mireia Farrús
Martin Karafiát
Karel Veselý
Jan "Honza" Cernocký
28
6
0
29 Jan 2021
LEAF: A Learnable Frontend for Audio Classification
Neil Zeghidour
O. Teboul
Félix de Chaumont Quitry
Marco Tagliasacchi
VLM, AAML
137
148
0
21 Jan 2021
Arabic Speech Recognition by End-to-End, Modular Systems and Human
A. Hussein
Shinji Watanabe
Ahmed M. Ali
VLM
72
50
0
21 Jan 2021
On Data-Augmentation and Consistency-Based Semi-Supervised Learning
Atin Ghosh
Alexandre Hoang Thiery
132
21
0
18 Jan 2021
Tiny Transducer: A Highly-efficient Speech Recognition Model on Edge Devices
Yuekai Zhang
Sining Sun
Long Ma
95
29
0
18 Jan 2021
An evaluation of word-level confidence estimation for end-to-end automatic speech recognition
Dan Oneaţă
Alexandru Caranica
Adriana Stan
H. Cucu
UQCV
88
25
0
14 Jan 2021
End-to-End Speaker Height and age estimation using Attention Mechanism with LSTM-RNN
Manav Kaushik
Van Tung Pham
Chng Eng Siong
53
6
0
13 Jan 2021
A Four-Stage Data Augmentation Approach to ResNet-Conformer Based Acoustic Modeling for Sound Event Localization and Detection
Qing Wang
Jun Du
Hua-Xin Wu
Jia Pan
Feng Ma
Chin-Hui Lee
64
83
0
08 Jan 2021
Environment Transfer for Distributed Systems
Chunheng Jiang
Jae-wook Ahn
N. Desai
60
1
0
06 Jan 2021
AutoDropout: Learning Dropout Patterns to Regularize Deep Networks
Hieu H. Pham
Quoc V. Le
128
57
0
05 Jan 2021
Robustness Testing of Language Understanding in Task-Oriented Dialog
Jiexi Liu
Ryuichi Takanobu
Jiaxin Wen
Dazhen Wan
Hongguang Li
Weiran Nie
Cheng Li
Wei Peng
Minlie Huang
ELM
122
49
0
30 Dec 2020
NeurST: Neural Speech Translation Toolkit
Chengqi Zhao
Mingxuan Wang
Qianqian Dong
Rong Ye
Lei Li
89
32
0
18 Dec 2020
CIF-based Collaborative Decoding for End-to-end Contextual Speech Recognition
Minglun Han
Linhao Dong
Shiyu Zhou
Bo Xu
73
23
0
17 Dec 2020
A review of on-device fully neural end-to-end automatic speech recognition algorithms
Chanwoo Kim
Dhananjaya N. Gowda
Dongsoo Lee
Jiyeon Kim
Ankur Kumar
Sungsoo Kim
Abhinav Garg
C. Han
68
27
0
14 Dec 2020
Bayesian Learning for Deep Neural Network Adaptation
Xurong Xie
Xunying Liu
Tan Lee
Lan Wang
BDL
112
22
0
14 Dec 2020
REDAT: Accent-Invariant Representation for End-to-End ASR by Domain Adversarial Training with Relabeling
Hu Hu
Xuesong Yang
Zeynab Raeesy
Jinxi Guo
Gokce Keskin
Harish Arsikere
Ariya Rastrow
A. Stolcke
Roland Maas
64
30
0
14 Dec 2020
Self-supervised Text-independent Speaker Verification using Prototypical Momentum Contrastive Learning
Wei Xia
Chunlei Zhang
Chao Weng
Meng Yu
Dong Yu
SSL
64
80
0
13 Dec 2020
Less Is More: Improved RNN-T Decoding Using Limited Label Context and Path Merging
Rohit Prabhavalkar
Yanzhang He
David Rybach
S. Campbell
A. Narayanan
Trevor Strohman
Tara N. Sainath
125
35
0
12 Dec 2020
Improved Robustness to Disfluencies in RNN-Transducer Based Speech Recognition
Valentin Mendelev
Tina Raissi
Guglielmo Camporese
Manuel Giollo
48
21
0
11 Dec 2020
Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition
Binbin Zhang
Di Wu
Zhuoyuan Yao
Xiong Wang
F. Yu
Chao Yang
Liyong Guo
Yaguang Hu
Lei Xie
X. Lei
93
81
0
10 Dec 2020
Parameter Efficient Multimodal Transformers for Video Representation Learning
Sangho Lee
Youngjae Yu
Gunhee Kim
Thomas Breuel
Jan Kautz
Yale Song
ViT
104
78
0
08 Dec 2020
Frame-level SpecAugment for Deep Convolutional Neural Networks in Hybrid ASR Systems
Xinwei Li
Yuanyuan Zhang
Xiaodan Zhuang
Daben Liu
28
6
0
07 Dec 2020
MLS: A Large-Scale Multilingual Dataset for Speech Research
Vineel Pratap
Qiantong Xu
Anuroop Sriram
Gabriel Synnaeve
R. Collobert
AuLLM
189
513
0
07 Dec 2020
Triplet Entropy Loss: Improving The Generalisation of Short Speech Language Identification Systems
Ruan van der Merwe
71
8
0
03 Dec 2020
Improving accuracy of rare words for RNN-Transducer through unigram shallow fusion
Vijay Ravi
Yile Gu
Ankur Gandhe
Ariya Rastrow
Linda Liu
Denis Filimonov
Scott Novotney
I. Bulyko
60
9
0
30 Nov 2020