ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

v1, v2, v3 (latest)

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
    VLM

Papers citing "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"

50 / 1,049 papers shown
Training speaker recognition systems with limited data
Nik Vaessen
David A. van Leeuwen
45
6
0
28 Mar 2022
Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation for Automatic Speech Recognition
Guan-Ting Lin
Shang-Wen Li
Hung-yi Lee
TTA, VLM
66
14
0
27 Mar 2022
Data Augmentation Strategies for Improving Sequential Recommender Systems
Jooeun Song
B. Suh
28
9
0
26 Mar 2022
Speech-enhanced and Noise-aware Networks for Robust Speech Recognition
Hung-Shin Lee
Pin-Yuan Chen
Yao-Fei Cheng
Yu Tsao
Hsin-Min Wang
46
1
0
25 Mar 2022
AudioTagging Done Right: 2nd comparison of deep learning methods for environmental sound classification
Juncheng Billy Li
Shuhui Qu
Po-Yao (Bernie) Huang
Florian Metze
VLM
102
9
0
25 Mar 2022
Automatic Speech Recognition for Speech Assessment of Persian Preschool Children
Amirhossein Abaskohi
Fatemeh Mortazavi
Hadi Moradi
65
7
0
24 Mar 2022
Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech Translation
Chih-Chiang Chang
Hung-yi Lee
90
13
0
22 Mar 2022
Conditional Generative Data Augmentation for Clinical Audio Datasets
Matthias Seibold
A. Hoch
Mazda Farshad
Nassir Navab
Philipp Fürnstahl
MedIm
67
13
0
22 Mar 2022
A Track-Wise Ensemble Event Independent Network for Polyphonic Sound Event Localization and Detection
Jinbo Hu
Yin Cao
Ming Wu
Qiuqiang Kong
Feiran Yang
Mark D. Plumbley
J. Yang
64
23
0
19 Mar 2022
Under the Morphosyntactic Lens: A Multifaceted Evaluation of Gender Bias in Speech Translation
Beatrice Savoldi
Marco Gaido
L. Bentivogli
Matteo Negri
Marco Turchi
75
27
0
18 Mar 2022
SepTr: Separable Transformer for Audio Spectrogram Processing
Nicolae-Cătălin Ristea
Radu Tudor Ionescu
Fahad Shahbaz Khan
ViT
96
32
0
17 Mar 2022
Sample, Translate, Recombine: Leveraging Audio Alignments for Data Augmentation in End-to-end Speech Translation
Tsz Kin Lam
Shigehiko Schamoni
Stefan Riezler
74
34
0
16 Mar 2022
CMKD: CNN/Transformer-Based Cross-Model Knowledge Distillation for Audio Classification
Yuan Gong
Sameer Khurana
Andrew Rouditchenko
James R. Glass
VLM
73
29
0
13 Mar 2022
Spatial Consistency Loss for Training Multi-Label Classifiers from Single-Label Annotations
Thomas Verelst
Paul Kishan Rubenstein
M. Eichner
Tinne Tuytelaars
Maxim Berman
85
20
0
11 Mar 2022
A study on joint modeling and data augmentation of multi-modalities for audio-visual scene classification
Qing Wang
Jun Du
Siyuan Zheng
Yunqing Li
Yajian Wang
...
Hu Hu
Chao-Han Huck Yang
Sabato Marco Siniscalchi
Yannan Wang
Chin-Hui Lee
48
2
0
07 Mar 2022
Leveraging Pre-trained BERT for Audio Captioning
Xubo Liu
Xinhao Mei
Qiushi Huang
Jianyuan Sun
Jinzheng Zhao
Haohe Liu
Mark D. Plumbley
Volkan Kılıç
Wenwu Wang
115
30
0
06 Mar 2022
A Brief Overview of Unsupervised Neural Speech Representation Learning
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
Lars Maaløe
Christian Igel
BDL, AI4TS, SSL
96
11
0
01 Mar 2022
Explainable deepfake and spoofing detection: an attack analysis using SHapley Additive exPlanations
W. Ge
Massimiliano Todisco
Nicholas W. D. Evans
AAML
52
9
0
28 Feb 2022
Visual Speech Recognition for Multiple Languages in the Wild
Pingchuan Ma
Stavros Petridis
Maja Pantic
VLM
230
152
0
26 Feb 2022
GenéLive! Generating Rhythm Actions in Love Live!
Atsushi Takada
Daichi Yamazaki
Likun Liu
Yudai Yoshida
Nyamkhuu Ganbat
T. Shimotomai
Taiga Yamamoto
Daisuke Sakurai
Naoki Hamada
VLM
69
4
0
25 Feb 2022
Towards Better Meta-Initialization with Task Augmentation for Kindergarten-aged Speech Recognition
Yunzheng Zhu
Ruchao Fan
Abeer Alwan
CLL
80
4
0
24 Feb 2022
Attentive Temporal Pooling for Conformer-based Streaming Language Identification in Long-form Speech
Quan Wang
Yang Yu
Jason W. Pelecanos
Yiling Huang
Ignacio López Moreno
86
15
0
24 Feb 2022
Contrastive-mixup learning for improved speaker verification
Xin Zhang
Minho Jin
R. Cheng
Ruirui Li
Eunjung Han
A. Stolcke
AAML, SSL
60
11
0
22 Feb 2022
VADOI:Voice-Activity-Detection Overlapping Inference For End-to-end Long-form Speech Recognition
Jinhan Wang
Xiaosu Tong
Jinxi Guo
Di He
Roland Maas
71
5
0
22 Feb 2022
S3T: Self-Supervised Pre-training with Swin Transformer for Music Classification
Han Zhao
Chen Zhang
Belei Zhu
Zejun Ma
Ke-jun Zhang
ViT
73
29
0
21 Feb 2022
LPC Augment: An LPC-Based ASR Data Augmentation Algorithm for Low and Zero-Resource Children's Dialects
Alexander Johnson
Ruchao Fan
Robin Morris
Abeer Alwan
45
12
0
19 Feb 2022
Domain Adaptation of low-resource Target-Domain models using well-trained ASR Conformer Models
Vrunda N. Sukhadia
S. Umesh
95
8
0
18 Feb 2022
AISHELL-NER: Named Entity Recognition from Chinese Speech
Boli Chen
Guangwei Xu
Xiaobin Wang
Pengjun Xie
Meishan Zhang
Fei Huang
50
31
0
17 Feb 2022
Non-Autoregressive ASR with Self-Conditioned Folded Encoders
Tatsuya Komatsu
122
8
0
17 Feb 2022
Knowledge Transfer from Large-scale Pretrained Language Models to End-to-end Speech Recognizers
Yotaro Kubo
Shigeki Karita
M. Bacchiani
53
27
0
16 Feb 2022
Conversational Speech Recognition By Learning Conversation-level Characteristics
Kun Wei
Yike Zhang
Sining Sun
Lei Xie
Long Ma
82
9
0
16 Feb 2022
Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models
Sarala Padi
S. O. Sadjadi
Tianyi Zhou
Ram D. Sriram
74
39
0
16 Feb 2022
What Does it Mean for a Language Model to Preserve Privacy?
Hannah Brown
Katherine Lee
Fatemehsadat Mireshghallah
Reza Shokri
Florian Tramèr
PILM
106
243
0
11 Feb 2022
Improving Automatic Speech Recognition for Non-Native English with Transfer Learning and Language Model Decoding
Peter Sullivan
Toshiko Shibano
Muhammad Abdul-Mageed
78
11
0
10 Feb 2022
Neural Architecture Search for Energy Efficient Always-on Audio Models
Daniel T. Speckhard
Karolis Misiunas
Sagi Perel
Tenghui Zhu
S. Carlile
M. Slaney
65
13
0
09 Feb 2022
The Volcspeech system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge
Chen Shen
Yi Y. Liu
Wenzhi Fan
Bin Wang
Shi-Xue Wen
Yao Tian
Jun Zhang
Jingsheng Yang
Zejun Ma
51
4
0
09 Feb 2022
Conversational Agents: Theory and Applications
M. Wahde
M. Virgolin
LLMAG
68
26
0
07 Feb 2022
MFA: TDNN with Multi-scale Frequency-channel Attention for Text-independent Speaker Verification with Short Utterances
Tianchi Liu
Rohan Kumar Das
Kong Aik Lee
Haizhou Li
134
72
0
03 Feb 2022
The RoyalFlush System of Speech Recognition for M2MeT Challenge
Shuaishuai Ye
Peiyao Wang
Shunfei Chen
Xinhui Hu
Xinkang Xu
59
5
0
03 Feb 2022
Keyword localisation in untranscribed speech using visually grounded speech models
Kayode Olaleye
Dan Oneaţă
Herman Kamper
63
7
0
02 Feb 2022
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection
Ke Chen
Xingjian Du
Bilei Zhu
Zejun Ma
Taylor Berg-Kirkpatrick
Shlomo Dubnov
ViT
171
278
0
02 Feb 2022
ColloSSL: Collaborative Self-Supervised Learning for Human Activity Recognition
Yash Jain
Chi Ian Tang
Chulhong Min
F. Kawsar
Akhil Mathur
SSL
100
54
0
01 Feb 2022
BEA-Base: A Benchmark for ASR of Spontaneous Hungarian
P. Mihajlik
A. Balog
T. E. Gráczi
A. Kohári
Balázs Tarján
K. Mády
51
8
0
01 Feb 2022
Improving End-to-End Contextual Speech Recognition with Fine-Grained Contextual Knowledge Selection
Minglun Han
Linhao Dong
Zhenlin Liang
Meng Cai
Shiyu Zhou
Zejun Ma
Bo Xu
80
46
0
30 Jan 2022
Sentiment-Aware Automatic Speech Recognition pre-training for enhanced Speech Emotion Recognition
Ayoub Ghriss
Bo Yang
Viktor Rozgic
Elizabeth Shriberg
Chao Wang
87
21
0
27 Jan 2022
Recency Dropout for Recurrent Recommender Systems
Bo-Yu Chang
Can Xu
Matt Le
Jingchen Feng
Ya Le
Sriraj Badam
Ed H. Chi
Minmin Chen
57
3
0
26 Jan 2022
Run-and-back stitch search: novel block synchronous decoding for streaming encoder-decoder ASR
E. Tsunoo
Chaitanya Narisetty
Michael Hentschel
Yosuke Kashiwagi
Shinji Watanabe
48
2
0
25 Jan 2022
NAS-VAD: Neural Architecture Search for Voice Activity Detection
Daniel Rho
Jinhyeok Park
J. Ko
80
6
0
22 Jan 2022
Supervised and Self-supervised Pretraining Based COVID-19 Detection Using Acoustic Breathing/Cough/Speech Signals
Xing-Yu Chen
Qiu-shi Zhu
Jie Zhang
Lirong Dai
65
15
0
22 Jan 2022
Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection
A. Haliassos
Rodrigo Mira
Stavros Petridis
Maja Pantic
CVBM
121
133
0
18 Jan 2022