ResearchTrend.AI
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
    VLM
arXiv: 1904.08779

Papers citing "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"

50 / 751 papers shown
Title
RawBoost: A Raw Data Boosting and Augmentation Method applied to Automatic Speaker Verification Anti-Spoofing
Hemlata Tak
Madhu R. Kamble
J. Patino
Massimiliano Todisco
Nicholas W. D. Evans
70
104
0
08 Nov 2021
Towards Building ASR Systems for the Next Billion Users
Tahir Javed
Sumanth Doddapaneni
A. Raman
Kaushal Bhogale
Gowtham Ramesh
Anoop Kunchukuttan
Pratyush Kumar
Mitesh M. Khapra
44
54
0
06 Nov 2021
STC speaker recognition systems for the NIST SRE 2021
Anastasia Avdeeva
Aleksei Gusev
Igor Korsunov
Alexander Kozlov
G. Lavrentyeva
...
Andrey Shulipa
Alisa Vinogradova
V. Volokhov
Evgeny Smirnov
Vasily Galyuk
16
15
0
03 Nov 2021
Recent Advances in End-to-End Automatic Speech Recognition
Jinyu Li
VLM
43
363
0
02 Nov 2021
Large-Scale Deep Learning Optimizations: A Comprehensive Survey
Xiaoxin He
Fuzhao Xue
Xiaozhe Ren
Yang You
35
14
0
01 Nov 2021
Cross-attention conformer for context modeling in speech enhancement for ASR
A. Narayanan
Chung-Cheng Chiu
Tom O'Malley
Quan Wang
Yanzhang He
32
14
0
30 Oct 2021
Closing the Gap Between Time-Domain Multi-Channel Speech Enhancement on Real and Simulation Conditions
Wangyou Zhang
Jing Shi
Chenda Li
Shinji Watanabe
Y. Qian
36
22
0
27 Oct 2021
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
SSL
147
1,736
0
26 Oct 2021
Synt++: Utilizing Imperfect Synthetic Data to Improve Speech Recognition
Ting-Yao Hu
Mohammadreza Armandpour
A. Shrivastava
Jen-Hao Rick Chang
H. Koppula
Oncel Tuzel
SyDa
60
42
0
21 Oct 2021
RCT: Random Consistency Training for Semi-supervised Sound Event Detection
Nian Shao
Erfan Loweimi
Xiaofei Li
29
13
0
21 Oct 2021
SSAST: Self-Supervised Audio Spectrogram Transformer
Yuan Gong
Cheng-I Jeff Lai
Yu-An Chung
James R. Glass
ViT
38
268
0
19 Oct 2021
Efficient Sequence Training of Attention Models using Approximative Recombination
Nils-Philipp Wynands
Wilfried Michel
Jan Rosendahl
Ralf Schluter
Hermann Ney
16
3
0
18 Oct 2021
Learning Models for Query by Vocal Percussion: A Comparative Study
Alejandro Delgado
SKoT McDonald
Ning Xu
C. Saitis
Mark Sandler
36
1
0
18 Oct 2021
DECAR: Deep Clustering for learning general-purpose Audio Representations
Sreyan Ghosh
Sandesh V Katta
Ashish Seth
S. Umesh
SSL
36
12
0
17 Oct 2021
Omni-sparsity DNN: Fast Sparsity Optimization for On-Device Streaming E2E ASR via Supernet
Haichuan Yang
Yuan Shangguan
Dilin Wang
Meng Li
P. Chuang
Xiaohui Zhang
Ganesh Venkatesh
Ozlem Kalinli
Vikas Chandra
37
14
0
15 Oct 2021
Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks
Sangeeta Srivastava
Yun Wang
Andros Tjandra
Anurag Kumar
Chunxi Liu
Kritika Singh
Yatharth Saraf
SSL
38
24
0
14 Oct 2021
Multi-ACCDOA: Localizing and Detecting Overlapping Sounds from the Same Class with Auxiliary Duplicating Permutation Invariant Training
Kazuki Shimada
Yuichiro Koyama
Shusuke Takahashi
Naoya Takahashi
E. Tsunoo
Yuki Mitsufuji
13
63
0
14 Oct 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
284
1,026
0
13 Oct 2021
Study of positional encoding approaches for Audio Spectrogram Transformers
L. Pepino
Pablo Riera
Luciana Ferrer
ViT
28
6
0
13 Oct 2021
Decision Attentive Regularization to Improve Simultaneous Speech Translation Systems
Mohd Abbas Zaidi
Beomseok Lee
Sangha Kim
Chanwoo Kim
34
5
0
13 Oct 2021
Duality Temporal-channel-frequency Attention Enhanced Speaker Representation Learning
Li Lyna Zhang
Qing Wang
Lei Xie
47
17
0
13 Oct 2021
Exploring Wav2vec 2.0 fine-tuning for improved speech emotion recognition
Li-Wei Chen
Alexander I. Rudnicky
VLM
33
122
0
12 Oct 2021
Multi-Modal Pre-Training for Automated Speech Recognition
David M. Chan
Shalini Ghosh
D. Chakrabarty
Björn Hoffmeister
SSL
30
16
0
12 Oct 2021
Word Order Does Not Matter For Speech Recognition
Vineel Pratap
Qiantong Xu
Tatiana Likhomanenko
Gabriel Synnaeve
R. Collobert
43
4
0
12 Oct 2021
Investigation on Data Adaptation Techniques for Neural Named Entity Recognition
Evgeniia Tokarchuk
David Thulke
Weiyue Wang
Christian Dugast
Hermann Ney
21
2
0
12 Oct 2021
SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition
Jing Pan
Tao Lei
Kwangyoun Kim
Kyu Jeong Han
Shinji Watanabe
VLM
34
9
0
11 Oct 2021
K-Wav2vec 2.0: Automatic Speech Recognition based on Joint Decoding of Graphemes and Syllables
Jounghee Kim
Pilsung Kang
VLM
29
6
0
11 Oct 2021
Personalized Automatic Speech Recognition Trained on Small Disordered Speech Datasets
Jimmy Tobin
Katrin Tomanek
27
27
0
09 Oct 2021
Wav2vec-S: Semi-Supervised Pre-Training for Low-Resource ASR
Hanjing Zhu
Li Wang
Jindong Wang
Gaofeng Cheng
Pengyuan Zhang
Yonghong Yan
SSL
VLM
33
9
0
09 Oct 2021
TitaNet: Neural Model for speaker representation with 1D Depth-wise separable convolutions and global context
Nithin Rao Koluguri
Taejin Park
Boris Ginsburg
ViT
36
94
0
08 Oct 2021
Hierarchical Conditional End-to-End ASR with CTC and Multi-Granular Subword Units
Yosuke Higuchi
Keita Karube
Tetsuji Ogawa
Tetsunori Kobayashi
18
23
0
08 Oct 2021
Neural Model Reprogramming with Similarity Based Mapping for Low-Resource Spoken Command Recognition
Hao Yen
Pin-Jui Ku
Chao-Han Huck Yang
Hu Hu
Sabato Marco Siniscalchi
Pin-Yu Chen
Yu Tsao
42
4
0
08 Oct 2021
Streaming Transformer Transducer Based Speech Recognition Using Non-Causal Convolution
Yangyang Shi
Chunyang Wu
Dilin Wang
Alex Xiao
Jay Mahadeokar
...
Ke Li
Yuan Shangguan
Varun K. Nagaraja
Ozlem Kalinli
M. Seltzer
36
15
0
07 Oct 2021
Enabling On-Device Training of Speech Recognition Models with Federated Dropout
Dhruv Guliani
Lillian Zhou
Changwan Ryu
Tien-Ju Yang
Harry Zhang
Yong Xiao
F. Beaufays
Giovanni Motta
FedML
33
16
0
07 Oct 2021
Mandarin-English Code-switching Speech Recognition with Self-supervised Speech Representation Models
Liang-Hsuan Tseng
Yu-Kuan Fu
Heng-Jui Chang
Hung-yi Lee
SSL
28
14
0
07 Oct 2021
Back from the future: bidirectional CTC decoding using future information in speech recognition
Namkyu Jung
Geon-min Kim
Han-Gyu Kim
35
3
0
07 Oct 2021
Integrating Categorical Features in End-to-End ASR
Rongqing Huang
26
1
0
06 Oct 2021
An Investigation of the Effectiveness of Phase for Audio Classification
Shunsuke Hidaka
Kohei Wakamiya
T. Kaburagi
23
4
0
06 Oct 2021
Spell my name: keyword boosted speech recognition
Namkyu Jung
Geon-min Kim
Joon Son Chung
56
13
0
06 Oct 2021
Significance of Data Augmentation for Improving Cleft Lip and Palate Speech Recognition
Protima Nomo Sudro
Rohan Kumar Das
R. Sinha
S. M. I. S. R. Mahadeva Prasanna
24
5
0
02 Oct 2021
SpliceOut: A Simple and Efficient Audio Augmentation Method
Arjit Jain
Pranay Reddy Samala
Deepak Mittal
Preethi Jyothi
M. Singh
35
10
0
30 Sep 2021
Fine-tuning wav2vec2 for speaker recognition
Nik Vaessen
David A. van Leeuwen
49
107
0
30 Sep 2021
FastCorrect 2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition
Yichong Leng
Xu Tan
Rui Wang
Linchen Zhu
Jin Xu
...
Linquan Liu
Tao Qin
Xiang-Yang Li
Ed Lin
Tie-Yan Liu
42
40
0
29 Sep 2021
BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition
Yu Zhang
Daniel S. Park
Wei Han
James Qin
Anmol Gulati
...
Zhifeng Chen
Quoc V. Le
Chung-Cheng Chiu
Ruoming Pang
Yonghui Wu
SSL
34
175
0
27 Sep 2021
Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates
Hirofumi Inaguma
Siddharth Dalmia
Brian Yan
Shinji Watanabe
65
11
0
27 Sep 2021
ChannelAugment: Improving generalization of multi-channel ASR by training with input channel randomization
M. Gaudesi
F. Weninger
D. Sharma
P. Zhan
AAML
35
1
0
23 Sep 2021
Multi-view Contrastive Self-Supervised Learning of Accounting Data Representations for Downstream Audit Tasks
Marco Schreyer
Timur Sattarov
Damian Borth
MLAU
37
15
0
23 Sep 2021
Tied & Reduced RNN-T Decoder
Rami Botros
Tara N. Sainath
R. David
Emmanuel Guzman
Wei Li
Yanzhang He
40
55
0
15 Sep 2021
Residual Adapters for Parameter-Efficient ASR Adaptation to Atypical and Accented Speech
Katrin Tomanek
Vicky Zayats
Dirk Padfield
K. Vaillancourt
Fadi Biadsy
59
57
0
14 Sep 2021
Unsupervised Domain Adaptation Schemes for Building ASR in Low-resource Languages
A. C. S.
Prathosh A P
A. G. Ramakrishnan
43
12
0
12 Sep 2021