ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
    VLM
arXiv: 1904.08779

Papers citing "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"

50 / 1,048 papers shown
Transformer-Transducers for Code-Switched Speech Recognition
Siddharth Dalmia
Yuzong Liu
S. Ronanki
Katrin Kirchhoff
88
47
0
30 Nov 2020
Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training
Sameer Khurana
Niko Moritz
Takaaki Hori
Jonathan Le Roux
102
59
0
26 Nov 2020
Synth2Aug: Cross-domain speaker recognition with TTS synthesized speech
Yiling Huang
Yutian Chen
Jason W. Pelecanos
Quan Wang
100
12
0
24 Nov 2020
Streaming Multi-speaker ASR with RNN-T
Ilya Sklyar
A. Piunova
Yulan Liu
80
37
0
23 Nov 2020
Improving RNN-T ASR Accuracy Using Context Audio
A. Schwarz
Ilya Sklyar
Simon Wiesler
83
9
0
20 Nov 2020
Deep Residual Local Feature Learning for Speech Emotion Recognition
Sattaya Singkul
Thakorn Chatchaisathaporn
B. Suntisrivaraporn
K. Woraratpanya
38
4
0
19 Nov 2020
Predicting Rigid Body Dynamics using Dual Quaternion Recurrent Neural Networks with Quaternion Attention
Johannes Pöppelbaum
Andreas Schwung
42
13
0
17 Nov 2020
Cascade RNN-Transducer: Syllable Based Streaming On-device Mandarin Speech Recognition with a Syllable-to-Character Converter
Xiong Wang
Zhuoyuan Yao
Xian Shi
Lei Xie
68
30
0
17 Nov 2020
Unsupervised Contrastive Learning of Sound Event Representations
Eduardo Fonseca
Diego Ortego
Kevin McGuinness
Noel E. O'Connor
Xavier Serra
SSL
72
66
0
15 Nov 2020
Audio-Visual Event Recognition through the lens of Adversary
Juncheng Li
Kaixin Ma
Shuhui Qu
Po-Yao (Bernie) Huang
Florian Metze
AAML
65
9
0
15 Nov 2020
The CUHK-TUDELFT System for The SLT 2021 Children Speech Recognition Challenge
Si-Ioi Ng
W. Liu
Zhiyuan Peng
Siyuan Feng
Hingpang Huang
O. Scharenborg
Tan Lee
3DV
41
8
0
12 Nov 2020
Efficient Knowledge Distillation for RNN-Transducer Models
S. Panchapagesan
Daniel S. Park
Chung-Cheng Chiu
Yuan Shangguan
Qiao Liang
A. Gruenstein
73
54
0
11 Nov 2020
Low-resource expressive text-to-speech using data augmentation
Goeric Huybrechts
Thomas Merritt
Giulia Comini
Bartek Perz
Raahil Shah
Jaime Lorenzo-Trueba
68
53
0
11 Nov 2020
Data Augmentation For Children's Speech Recognition -- The "Ethiopian" System For The SLT 2021 Children Speech Recognition Challenge
Guoguo Chen
Xingyu Na
Yongqing Wang
Zhiyong Yan
Junbo Zhang
Sifan Ma
Yujun Wang
60
22
0
09 Nov 2020
Gated Recurrent Fusion with Joint Training Framework for Robust End-to-End Speech Recognition
Cunhang Fan
Jiangyan Yi
J. Tao
Zhengkun Tian
Bin Liu
Zhengqi Wen
60
72
0
09 Nov 2020
Stochastic Attention Head Removal: A simple and effective method for improving Transformer Based ASR Models
Shucong Zhang
Erfan Loweimi
P. Bell
Steve Renals
35
0
0
08 Nov 2020
Improving RNN Transducer Based ASR with Auxiliary Tasks
Chunxi Liu
Frank Zhang
Duc Le
Suyoun Kim
Yatharth Saraf
Geoffrey Zweig
89
49
0
05 Nov 2020
Alignment Restricted Streaming Recurrent Neural Network Transducer
Jay Mahadeokar
Yuan Shangguan
Duc Le
Gil Keren
Hang Su
Thong Le
Ching-Feng Yeh
Christian Fuegen
M. Seltzer
AI4TS
75
66
0
05 Nov 2020
Data Augmentation for End-to-end Code-switching Speech Recognition
Chenpeng Du
Hao Li
Yizhou Lu
Lan Wang
Y. Qian
65
28
0
04 Nov 2020
Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis
Desh Raj
Pavel Denisov
Zhuo Chen
Hakan Erdogan
Zili Huang
...
Yi Luo
Naoyuki Kanda
Jinyu Li
Scott Wisdom
J. Hershey
66
88
0
03 Nov 2020
A Two-Stage Approach to Device-Robust Acoustic Scene Classification
Hu Hu
Chao-Han Huck Yang
Xianjun Xia
Xue Bai
Xin Tang
...
Yuanjun Zhao
Sabato Marco Siniscalchi
Yannan Wang
Jun Du
Chin-Hui Lee
66
31
0
03 Nov 2020
SapAugment: Learning A Sample Adaptive Policy for Data Augmentation
Ting-Yao Hu
A. Shrivastava
Jen-Hao Rick Chang
H. Koppula
Stefan Braun
Kyuyeon Hwang
Ozlem Kalinli
Oncel Tuzel
84
17
0
02 Nov 2020
Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation
Hang Le
J. Pino
Changhan Wang
Jiatao Gu
D. Schwab
Laurent Besacier
115
83
0
02 Nov 2020
The xx205 System for the VoxCeleb Speaker Recognition Challenge 2020
Xu Xiang
49
14
0
31 Oct 2020
Joint Masked CPC and CTC Training for ASR
Chaitanya Talnikar
Tatiana Likhomanenko
R. Collobert
Gabriel Synnaeve
SSL
110
27
0
30 Oct 2020
Phoneme Based Neural Transducer for Large Vocabulary Speech Recognition
Wei Zhou
Simon Berger
Ralf Schlüter
Hermann Ney
120
33
0
30 Oct 2020
Training Speech Recognition Models with Federated Learning: A Quality/Cost Framework
Dhruv Guliani
F. Beaufays
Giovanni Motta
FedML
63
85
0
29 Oct 2020
ACCDOA: Activity-Coupled Cartesian Direction of Arrival Representation for Sound Event Localization and Detection
Kazuki Shimada
Yuichiro Koyama
Naoya Takahashi
Shusuke Takahashi
Yuki Mitsufuji
112
89
0
29 Oct 2020
Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input
Xingcheng Song
Zhiyong Wu
Yiheng Huang
Chao Weng
Jane Polak Scowcroft
Helen Meng
82
36
0
28 Oct 2020
Bridging the Modality Gap for Speech-to-Text Translation
Yuchen Liu
Junnan Zhu
Jiajun Zhang
Chengqing Zong
77
69
0
28 Oct 2020
Decoupling Pronunciation and Language for End-to-end Code-switching Automatic Speech Recognition
Shuai Zhang
Jiangyan Yi
Zhengkun Tian
Ye Bai
J. Tao
Zhengqi Wen
38
14
0
28 Oct 2020
Cascaded encoders for unifying streaming and non-streaming ASR
A. Narayanan
Tara N. Sainath
Ruoming Pang
Jiahui Yu
Chung-Cheng Chiu
Rohit Prabhavalkar
Ehsan Variani
Trevor Strohman
AuLLM
128
86
0
27 Oct 2020
Universal ASR: Unifying Streaming and Non-Streaming ASR Using a Single Encoder-Decoder Model
Zhifu Gao
Shiliang Zhang
Ming Lei
Ian McLoughlin
CVBM
54
15
0
27 Oct 2020
Speech SIMCLR: Combining Contrastive and Reconstruction Objective for Self-supervised Speech Representation Learning
Dongwei Jiang
Wubo Li
Miao Cao
Wei Zou
Xiangang Li
SSL
91
65
0
27 Oct 2020
Recent Developments on ESPnet Toolkit Boosted by Conformer
Pengcheng Guo
Florian Boyer
Xuankai Chang
Tomoki Hayashi
Yosuke Higuchi
...
Jing Shi
Shinji Watanabe
Kun Wei
Wangyou Zhang
Yuekai Zhang
89
263
0
26 Oct 2020
MarbleNet: Deep 1D Time-Channel Separable Convolutional Neural Network for Voice Activity Detection
Fei Jia
Somshubra Majumdar
Boris Ginsburg
88
51
0
26 Oct 2020
Improved Neural Language Model Fusion for Streaming Recurrent Neural Network Transducer
Suyoun Kim
Shangguan Yuan
Jay Mahadeokar
A. Bruguier
Christian Fuegen
M. Seltzer
Duc Le
71
29
0
26 Oct 2020
Semi-Supervised Spoken Language Understanding via Self-Supervised Speech and Language Model Pretraining
Cheng-I Jeff Lai
Yung-Sung Chuang
Hung-yi Lee
Shang-Wen Li
James R. Glass
VLMSSL
96
60
0
26 Oct 2020
Improved Mask-CTC for Non-Autoregressive End-to-End ASR
Yosuke Higuchi
Hirofumi Inaguma
Shinji Watanabe
Tetsuji Ogawa
Tetsunori Kobayashi
85
61
0
26 Oct 2020
Two-stage Textual Knowledge Distillation for End-to-End Spoken Language Understanding
Seongbin Kim
Gyuwan Kim
Seongjin Shin
Sangmin Lee
VLM
62
20
0
25 Oct 2020
An Improved Event-Independent Network for Polyphonic Sound Event Localization and Detection
Yin Cao
Turab Iqbal
Qiuqiang Kong
Y. Zhong
Wenwu Wang
Mark D. Plumbley
71
78
0
25 Oct 2020
Orthros: Non-autoregressive End-to-end Speech Translation with Dual-decoder
Hirofumi Inaguma
Yosuke Higuchi
Kevin Duh
Tatsuya Kawahara
Shinji Watanabe
66
22
0
25 Oct 2020
Align-Refine: Non-Autoregressive Speech Recognition via Iterative Realignment
Ethan A. Chi
Julian Salazar
Katrin Kirchhoff
AI4TS
88
52
0
24 Oct 2020
Transformer-based End-to-End Speech Recognition with Local Dense Synthesizer Attention
Menglong Xu
Shengqiang Li
Xiao-Lei Zhang
84
32
0
23 Oct 2020
Neural Audio Fingerprint for High-specific Audio Retrieval based on Contrastive Learning
Sungkyun Chang
Donmoon Lee
Jeongsoon Park
Hyungui Lim
Kyogu Lee
Karam Ko
Yoonchang Han
103
35
0
22 Oct 2020
Urban Sound Classification : striving towards a fair comparison
Augustin Arnault
Baptiste Hanssens
Nicolas Riche
53
9
0
22 Oct 2020
Rethinking Evaluation in ASR: Are Our Models Robust Enough?
Tatiana Likhomanenko
Qiantong Xu
Vineel Pratap
Paden Tomasello
Jacob Kahn
Gilad Avidov
R. Collobert
Gabriel Synnaeve
158
99
0
22 Oct 2020
SlimIPL: Language-Model-Free Iterative Pseudo-Labeling
Tatiana Likhomanenko
Qiantong Xu
Jacob Kahn
Gabriel Synnaeve
R. Collobert
VLM
136
65
0
22 Oct 2020
Similarity Analysis of Self-Supervised Speech Representations
Yu-An Chung
Yonatan Belinkov
James R. Glass
SSL
122
37
0
22 Oct 2020
Self-training and Pre-training are Complementary for Speech Recognition
Qiantong Xu
Alexei Baevski
Tatiana Likhomanenko
Paden Tomasello
Alexis Conneau
R. Collobert
Gabriel Synnaeve
Michael Auli
SSLVLM
153
173
0
22 Oct 2020