ResearchTrend.AI
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
    VLM

Papers citing "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"

50 / 1,048 papers shown
A Study On Data Augmentation In Voice Anti-Spoofing
Ariel Cohen
Inbal Rimon
Eran Aflalo
Haim Permuter
78
46
0
20 Oct 2021
SSAST: Self-Supervised Audio Spectrogram Transformer
Yuan Gong
Cheng-I Jeff Lai
Yu-An Chung
James R. Glass
ViT
100
277
0
19 Oct 2021
Efficient Sequence Training of Attention Models using Approximative Recombination
Nils-Philipp Wynands
Wilfried Michel
Jan Rosendahl
Ralf Schluter
Hermann Ney
45
3
0
18 Oct 2021
Improving End-To-End Modeling for Mispronunciation Detection with Effective Augmentation Mechanisms
Tien-Hong Lo
Y. Sung
Berlin Chen
39
7
0
17 Oct 2021
Omni-sparsity DNN: Fast Sparsity Optimization for On-Device Streaming E2E ASR via Supernet
Haichuan Yang
Yuan Shangguan
Dilin Wang
Meng Li
P. Chuang
Xiaohui Zhang
Ganesh Venkatesh
Ozlem Kalinli
Vikas Chandra
84
14
0
15 Oct 2021
Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks
Sangeeta Srivastava
Yun Wang
Andros Tjandra
Anurag Kumar
Chunxi Liu
Kritika Singh
Yatharth Saraf
SSL
99
25
0
14 Oct 2021
Multi-ACCDOA: Localizing and Detecting Overlapping Sounds from the Same Class with Auxiliary Duplicating Permutation Invariant Training
Kazuki Shimada
Yuichiro Koyama
Shusuke Takahashi
Naoya Takahashi
E. Tsunoo
Yuki Mitsufuji
72
66
0
14 Oct 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
431
1,115
0
13 Oct 2021
Study of positional encoding approaches for Audio Spectrogram Transformers
L. Pepino
Pablo Riera
Luciana Ferrer
ViT
53
7
0
13 Oct 2021
Decision Attentive Regularization to Improve Simultaneous Speech Translation Systems
Mohd Abbas Zaidi
Beomseok Lee
Sangha Kim
Chanwoo Kim
66
5
0
13 Oct 2021
Duality Temporal-channel-frequency Attention Enhanced Speaker Representation Learning
Li Zhang
Qing Wang
Lei Xie
114
17
0
13 Oct 2021
Exploring Wav2vec 2.0 fine-tuning for improved speech emotion recognition
Li-Wei Chen
Alexander I. Rudnicky
VLM
105
130
0
12 Oct 2021
Multi-Modal Pre-Training for Automated Speech Recognition
David M. Chan
Shalini Ghosh
D. Chakrabarty
Björn Hoffmeister
SSL
92
16
0
12 Oct 2021
Spatial mixup: Directional loudness modification as data augmentation for sound event localization and detection
Ricardo Falcón Pérez
Kazuki Shimada
Yuichiro Koyama
Shusuke Takahashi
Yuki Mitsufuji
130
5
0
12 Oct 2021
Improving the Performance of Automated Audio Captioning via Integrating the Acoustic and Semantic Information
Zhongjie Ye
Helin Wang
Dongchao Yang
Yuexian Zou
101
28
0
12 Oct 2021
Word Order Does Not Matter For Speech Recognition
Vineel Pratap
Qiantong Xu
Tatiana Likhomanenko
Gabriel Synnaeve
R. Collobert
79
4
0
12 Oct 2021
SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition
Jing Pan
Tao Lei
Kwangyoun Kim
Kyu Jeong Han
Shinji Watanabe
VLM
57
10
0
11 Oct 2021
A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation
Yosuke Higuchi
Nanxin Chen
Yuya Fujita
Hirofumi Inaguma
Tatsuya Komatsu
Jaesong Lee
Jumon Nozaki
Tianzi Wang
Shinji Watanabe
49
43
0
11 Oct 2021
K-Wav2vec 2.0: Automatic Speech Recognition based on Joint Decoding of Graphemes and Syllables
Jounghee Kim
Pilsung Kang
VLM
44
6
0
11 Oct 2021
Efficient Training of Audio Transformers with Patchout
Khaled Koutini
Jan Schluter
Hamid Eghbalzadeh
Gerhard Widmer
ViT
176
263
0
11 Oct 2021
Data Augmentation with Locally-time Reversed Speech for Automatic Speech Recognition
Si-Ioi Ng
Tan Lee
34
2
0
09 Oct 2021
TitaNet: Neural Model for speaker representation with 1D Depth-wise separable convolutions and global context
Nithin Rao Koluguri
Taejin Park
Boris Ginsburg
ViT
121
104
0
08 Oct 2021
Improving Pseudo-label Training For End-to-end Speech Recognition Using Gradient Mask
Shaoshi Ling
Chen Shen
Meng Cai
Zejun Ma
VLM, SSL
76
10
0
08 Oct 2021
Neural Model Reprogramming with Similarity Based Mapping for Low-Resource Spoken Command Recognition
Hao Yen
Pin-Jui Ku
Chao-Han Huck Yang
Hu Hu
Sabato Marco Siniscalchi
Pin-Yu Chen
Yu Tsao
114
5
0
08 Oct 2021
Phone-to-audio alignment without text: A Semi-supervised Approach
Jian Zhu
Cong Zhang
David Jurgens
63
38
0
08 Oct 2021
PEAF: Learnable Power Efficient Analog Acoustic Features for Audio Recognition
Boris Bergsma
Minhao Yang
Milos Cernak
60
4
0
07 Oct 2021
Enabling On-Device Training of Speech Recognition Models with Federated Dropout
Dhruv Guliani
Lillian Zhou
Changwan Ryu
Tien-Ju Yang
Harry Zhang
Yong Xiao
F. Beaufays
Giovanni Motta
FedML
58
16
0
07 Oct 2021
Peer Collaborative Learning for Polyphonic Sound Event Detection
Hayato Endo
Hiromitsu Nishizaki
39
4
0
07 Oct 2021
Mandarin-English Code-switching Speech Recognition with Self-supervised Speech Representation Models
Liang-Hsuan Tseng
Yu-Kuan Fu
Heng-Jui Chang
Hung-yi Lee
SSL
47
14
0
07 Oct 2021
WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition
Binbin Zhang
Hang Lv
Pengcheng Guo
Qijie Shao
Chao Yang
...
Hui Bu
Xiaoyu Chen
Chenchen Zeng
Di Wu
Zhendong Peng
138
231
0
07 Oct 2021
Transferring Voice Knowledge for Acoustic Event Detection: An Empirical Study
Dawei Liang
Yangyang Shi
Yun Wang
Nayan Singhal
Alex Xiao
Jonathan Shaw
Edison Thomaz
Ozlem Kalinli
M. Seltzer
50
4
0
07 Oct 2021
An Investigation of the Effectiveness of Phase for Audio Classification
Shunsuke Hidaka
Kohei Wakamiya
T. Kaburagi
28
4
0
06 Oct 2021
ASR Rescoring and Confidence Estimation with ELECTRA
Hayato Futami
Hirofumi Inaguma
Masato Mimura
S. Sakai
Tatsuya Kawahara
KELM
104
21
0
05 Oct 2021
SALSA: Spatial Cue-Augmented Log-Spectrogram Features for Polyphonic Sound Event Localization and Detection
Thi Ngoc Tho Nguyen
Karn N. Watcharasupat
Ngoc Khanh Nguyen
Douglas L. Jones
W. Gan
80
49
0
01 Oct 2021
SpliceOut: A Simple and Efficient Audio Augmentation Method
Arjit Jain
Pranay Reddy Samala
Deepak Mittal
Preethi Jyothi
M. Singh
132
11
0
30 Sep 2021
Fine-tuning wav2vec2 for speaker recognition
Nik Vaessen
David A. van Leeuwen
116
109
0
30 Sep 2021
FastCorrect 2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition
Yichong Leng
Xu Tan
Rui Wang
Linchen Zhu
Jin Xu
...
Linquan Liu
Tao Qin
Xiang-Yang Li
Ed Lin
Tie-Yan Liu
129
42
0
29 Sep 2021
BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition
Yu Zhang
Daniel S. Park
Wei Han
James Qin
Anmol Gulati
...
Zhifeng Chen
Quoc V. Le
Chung-Cheng Chiu
Ruoming Pang
Yonghui Wu
SSL
86
176
0
27 Sep 2021
Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates
Hirofumi Inaguma
Siddharth Dalmia
Brian Yan
Shinji Watanabe
99
11
0
27 Sep 2021
ChannelAugment: Improving generalization of multi-channel ASR by training with input channel randomization
M. Gaudesi
F. Weninger
D. Sharma
P. Zhan
AAML
73
1
0
23 Sep 2021
Multi-view Contrastive Self-Supervised Learning of Accounting Data Representations for Downstream Audit Tasks
Marco Schreyer
Timur Sattarov
Damian Borth
MLAU
76
15
0
23 Sep 2021
Hybrid Data Augmentation and Deep Attention-based Dilated Convolutional-Recurrent Neural Networks for Speech Emotion Recognition
Nhat Truong Pham
Duc Ngoc Minh Dang
Sy Dzung Nguyen
27
38
0
18 Sep 2021
Dual-Encoder Architecture with Encoder Selection for Joint Close-Talk and Far-Talk Speech Recognition
F. Weninger
M. Gaudesi
Ralf Leibold
R. Gemello
P. Zhan
46
4
0
17 Sep 2021
Tied & Reduced RNN-T Decoder
Rami Botros
Tara N. Sainath
R. David
Emmanuel Guzman
Wei Li
Yanzhang He
86
55
0
15 Sep 2021
Dialog speech sentiment classification for imbalanced datasets
Sergis Nicolaou
Lambros Mavrides
G. Tryfou
Kyriakos Tolias
Konstantinos P. Panousis
S. Chatzis
Sergios Theodoridis
41
0
0
15 Sep 2021
Residual Adapters for Parameter-Efficient ASR Adaptation to Atypical and Accented Speech
Katrin Tomanek
Vicky Zayats
Dirk Padfield
K. Vaillancourt
Fadi Biadsy
128
58
0
14 Sep 2021
Non-autoregressive Transformer with Unified Bidirectional Decoder for Automatic Speech Recognition
Chuan-Fei Zhang
Yang Liu
Tianren Zhang
Songlu Chen
Feng Chen
Xu-Cheng Yin
56
8
0
14 Sep 2021
Unsupervised Domain Adaptation Schemes for Building ASR in Low-resource Languages
A. C. S.
Prathosh A P
A. G. Ramakrishnan
91
13
0
12 Sep 2021
Self-Attention Channel Combinator Frontend for End-to-End Multichannel Far-field Speech Recognition
Rong Gong
Carl Quillen
D. Sharma
Andrew Goderre
José Laínez
Ljubomir Milanović
94
14
0
10 Sep 2021
Speechformer: Reducing Information Loss in Direct Speech Translation
Sara Papi
Marco Gaido
Matteo Negri
Marco Turchi
129
24
0
09 Sep 2021