ResearchTrend.AI

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
    VLM

Papers citing "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"

50 / 1,049 papers shown
Title
Investigation of Data Augmentation Techniques for Disordered Speech Recognition
Mengzhe Geng
Xurong Xie
Shansong Liu
Jianwei Yu
Shoukang Hu
Xunying Liu
Helen Meng
63
59
0
14 Jan 2022
An Ensemble of Deep Learning Frameworks Applied For Predicting Respiratory Anomalies
L. D. Pham
Dat Ngo
T. Hoang
Alexander Schindler
Ian Mcloughlin
74
5
0
09 Jan 2022
Neural Architecture Search For LF-MMI Trained Time Delay Neural Networks
Shou-Yong Hu
Xurong Xie
Mingyu Cui
Jiajun Deng
Shansong Liu
Jianwei Yu
Mengzhe Geng
Xunying Liu
Helen Meng
99
27
0
08 Jan 2022
Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset
Tiezheng Yu
Rita Frieske
Peng Xu
Samuel Cahyawijaya
Cheuk Tung Shadow Yiu
...
Elham J. Barezi
Qifeng Chen
Xiaojuan Ma
Bertram E. Shi
Pascale Fung
RALM
89
10
0
07 Jan 2022
iDECODe: In-distribution Equivariance for Conformal Out-of-distribution Detection
R. Kaur
Susmit Jha
Anirban Roy
Sangdon Park
Yan Sun
O. Sokolsky
Insup Lee
OODD
66
47
0
07 Jan 2022
Improving Mandarin End-to-End Speech Recognition with Word N-gram Language Model
Jinchuan Tian
Jianwei Yu
Chao Weng
Yuexian Zou
Dong Yu
64
11
0
06 Jan 2022
Regularizing End-to-End Speech Translation with Triangular Decomposition Agreement
Yichao Du
Zhirui Zhang
Weizhi Wang
Boxing Chen
Jun Xie
Tong Xu
139
23
0
21 Dec 2021
Multi-turn RNN-T for streaming recognition of multi-party speech
Ilya Sklyar
A. Piunova
Xianrui Zheng
Yulan Liu
114
24
0
19 Dec 2021
Denoised Labels for Financial Time-Series Data via Self-Supervised Learning
Yanqing Ma
Carmine Ventre
M. Polukarov
NoLa
74
7
0
19 Dec 2021
Data Augmentation through Expert-guided Symmetry Detection to Improve Performance in Offline Reinforcement Learning
Giorgio Angelotti
Nicolas Drougard
Caroline Ponzoni Carvalho Chanel
OffRL
78
2
0
18 Dec 2021
An Audio-Visual Dataset and Deep Learning Frameworks for Crowded Scene Classification
L. D. Pham
Dat Ngo
Phu X. Nguyen
Hoang Van Truong
Alexander Schindler
61
9
0
16 Dec 2021
Bootstrap Equilibrium and Probabilistic Speaker Representation Learning for Self-supervised Speaker Verification
Sung Hwan Mun
Min Hyun Han
Dongjune Lee
Jihwan Kim
N. Kim
SSL
96
3
0
16 Dec 2021
On the Use of External Data for Spoken Named Entity Recognition
Ankita Pasad
Felix Wu
Suwon Shon
Karen Livescu
Kyu Jeong Han
95
16
0
14 Dec 2021
ImportantAug: a data augmentation agent for speech
V. Trinh
Hassan Salami Kavaki
Michael I. Mandel
89
10
0
14 Dec 2021
PM-MMUT: Boosted Phone-Mask Data Augmentation using Multi-Modeling Unit Training for Phonetic-Reduction-Robust E2E Speech Recognition
Guodong Ma
Pengfei Hu
Nurmemet Yolwas
Shen Huang
Hao-Ming Huang
90
4
0
13 Dec 2021
Improving Code-switching Language Modeling with Artificially Generated Texts using Cycle-consistent Adversarial Networks
Chia-Yu Li
Ngoc Thang Vu
57
12
0
12 Dec 2021
ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation
Holy Lovenia
Samuel Cahyawijaya
Genta Indra Winata
Peng Xu
Xu Yan
...
Elham J. Barezi
Qifeng Chen
Xiaojuan Ma
Bertram E. Shi
Pascale Fung
108
37
0
12 Dec 2021
Sequence-level self-learning with multiple hypotheses
K. Kumatani
Dimitrios Dimitriadis
Yashesh Gaur
R. Gmyr
Sefik Emre Eskimez
Jinyu Li
Michael Zeng
SSL
118
1
0
10 Dec 2021
Are E2E ASR models ready for an industrial usage?
Valentin Vielzeuf
G. Antipov
92
8
0
09 Dec 2021
Consistent Training and Decoding For End-to-end Speech Recognition Using Lattice-free MMI
Jinchuan Tian
Jianwei Yu
Chao Weng
Shi-Xiong Zhang
Jane Polak Scowcroft
Dong Yu
Yuexian Zou
AuLLM
77
13
0
05 Dec 2021
BBS-KWS:The Mandarin Keyword Spotting System Won the Video Keyword Wakeup Challenge
Yuting Yang
Binbin Du
Yingxin Zhang
Wenxuan Wang
Yuke Li
113
0
0
03 Dec 2021
Deliberation of Streaming RNN-Transducer by Non-autoregressive Decoding
Weiran Wang
Ke Hu
Tara N. Sainath
63
21
0
01 Dec 2021
Sound-Guided Semantic Image Manipulation
Seung Hyun Lee
Wonseok Roh
Wonmin Byeon
Sang Ho Yoon
Chanyoung Kim
Jinkyu Kim
Sangpil Kim
DiffM
105
43
0
30 Nov 2021
Classification of animal sounds in a hyperdiverse rainforest using Convolutional Neural Networks
Yuren Sun
Tatiana Midori Maeda
Claudia R. Solís-Lemus
Daniel L. Pimentel-Alarcón
Z. Buřivalová
38
21
0
29 Nov 2021
Linguistic Knowledge in Data Augmentation for Natural Language Processing: An Example on Chinese Question Matching
Zhengxiang Wang
46
2
0
29 Nov 2021
ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet
Siddhant Arora
Siddharth Dalmia
Pavel Denisov
Xuankai Chang
Yushi Ueda
...
Karthik Ganesan
Brian Yan
Ngoc Thang Vu
A. Black
Shinji Watanabe
VLM
83
75
0
29 Nov 2021
Catch Me If You Hear Me: Audio-Visual Navigation in Complex Unmapped Environments with Moving Sounds
Abdelrahman Younes
Daniel Honerkamp
Tim Welschehold
Abhinav Valada
106
42
0
29 Nov 2021
Romanian Speech Recognition Experiments from the ROBIN Project
Andrei-Marius Avram
Vasile Păiș
Dan Tufiș
56
4
0
23 Nov 2021
Deep Spoken Keyword Spotting: An Overview
Iván López-Espejo
Zheng-Hua Tan
John H. L. Hansen
Jesper Jensen
87
107
0
20 Nov 2021
A comparison of streaming models and data augmentation methods for robust speech recognition
Jiyeon Kim
Mehul Kumar
Dhananjaya N. Gowda
Abhinav Garg
Chanwoo Kim
86
6
0
19 Nov 2021
Towards Measuring Fairness in Speech Recognition: Casual Conversations Dataset Transcriptions
Chunxi Liu
M. Picheny
Leda Sari
Pooja Chitkara
Alex Xiao
Xiaohui Zhang
Mark Chou
Andres Alvarado
C. Hazirbas
Yatharth Saraf
92
44
0
18 Nov 2021
SALSA-Lite: A Fast and Effective Feature for Polyphonic Sound Event Localization and Detection with Microphone Arrays
Thi Ngoc Tho Nguyen
Douglas L. Jones
Karn N. Watcharasupat
Huy P Phan
W. Gan
81
37
0
16 Nov 2021
Attention based end to end Speech Recognition for Voice Search in Hindi and English
Raviraj Joshi
Venkateshan Kannan
51
7
0
15 Nov 2021
A Convolutional Neural Network Based Approach to Recognize Bangla Spoken Digits from Speech Signal
Ovishake Sen
Al-Mahmud
Pias Roy
44
6
0
12 Nov 2021
RawBoost: A Raw Data Boosting and Augmentation Method applied to Automatic Speaker Verification Anti-Spoofing
Hemlata Tak
Madhu R. Kamble
J. Patino
Massimiliano Todisco
Nicholas W. D. Evans
122
114
0
08 Nov 2021
Towards Building ASR Systems for the Next Billion Users
Tahir Javed
Sumanth Doddapaneni
A. Raman
Kaushal Bhogale
Gowtham Ramesh
Anoop Kunchukuttan
Pratyush Kumar
Mitesh M. Khapra
84
55
0
06 Nov 2021
Conformer-based Hybrid ASR System for Switchboard Dataset
Mohammad Zeineldeen
Jingjing Xu
Christoph Lüscher
Wilfried Michel
Alexander Gerstenberger
Ralf Schlüter
Hermann Ney
72
25
0
05 Nov 2021
Voice Conversion Can Improve ASR in Very Low-Resource Settings
Matthew Baas
Herman Kamper
101
17
0
04 Nov 2021
STC speaker recognition systems for the NIST SRE 2021
Anastasia Avdeeva
Aleksei Gusev
Igor Korsunov
Alexander Kozlov
G. Lavrentyeva
...
Andrey Shulipa
Alisa Vinogradova
V. Volokhov
Evgeny Smirnov
Vasily Galyuk
68
15
0
03 Nov 2021
Recent Advances in End-to-End Automatic Speech Recognition
Jinyu Li
VLM
170
379
0
02 Nov 2021
Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity
Peter Wu
Jiatong Shi
Yifan Zhong
Shinji Watanabe
A. Black
66
8
0
02 Nov 2021
Evaluating robustness of You Only Hear Once(YOHO) Algorithm on noisy audios in the VOICe Dataset
Soham Dinesh Tiwari
Kshitiz Lakhotia
Manjunath Mulimani
25
2
0
01 Nov 2021
Large-Scale Deep Learning Optimizations: A Comprehensive Survey
Xiaoxin He
Fuzhao Xue
Xiaozhe Ren
Yang You
90
15
0
01 Nov 2021
SNRi Target Training for Joint Speech Enhancement and Recognition
Yuma Koizumi
Shigeki Karita
A. Narayanan
S. Panchapagesan
M. Bacchiani
75
15
0
01 Nov 2021
Pseudo-Labeling for Massively Multilingual Speech Recognition
Loren Lugosch
Tatiana Likhomanenko
Gabriel Synnaeve
R. Collobert
VLM
77
30
0
30 Oct 2021
Cross-attention conformer for context modeling in speech enhancement for ASR
A. Narayanan
Chung-Cheng Chiu
Tom O'Malley
Quan Wang
Yanzhang He
70
14
0
30 Oct 2021
Bridge the Gap Between CV and NLP! A Gradient-based Textual Adversarial Attack Framework
Lifan Yuan
Yichi Zhang
Yangyi Chen
Wei Wei
AAML
124
34
0
28 Oct 2021
Closing the Gap Between Time-Domain Multi-Channel Speech Enhancement on Real and Simulation Conditions
Wangyou Zhang
Jing Shi
Chenda Li
Shinji Watanabe
Y. Qian
93
24
0
27 Oct 2021
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
SSL
298
1,913
0
26 Oct 2021
Synt++: Utilizing Imperfect Synthetic Data to Improve Speech Recognition
Ting-Yao Hu
Mohammadreza Armandpour
A. Shrivastava
Jen-Hao Rick Chang
H. Koppula
Oncel Tuzel
SyDa
87
42
0
21 Oct 2021