SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
    VLM

Papers citing "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"

50 / 754 papers shown
Title
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?
Sanyuan Chen
Yu Wu
Chengyi Wang
Shujie Liu
Zhuo Chen
...
Gang Liu
Jinyu Li
Jian Wu
Xiangzhan Yu
Furu Wei
SSL
20
40
0
27 Apr 2022
Mask scalar prediction for improving robust automatic speech recognition
A. Narayanan
James Walker
S. Panchapagesan
N. Howard
Yuma Koizumi
24
4
0
26 Apr 2022
E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR
Wenjie Huang
Shuo-yiin Chang
David Rybach
Rohit Prabhavalkar
Tara N. Sainath
Cyril Allauzen
Cal Peyser
Zhiyun Lu
VLM
44
24
0
22 Apr 2022
The 2021 NIST Speaker Recognition Evaluation
S. O. Sadjadi
Craig S. Greenberg
E. Singer
Lisa P. Mason
D. A. Reynolds
20
74
0
21 Apr 2022
Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation
Keqi Deng
Shinji Watanabe
Jiatong Shi
Siddhant Arora
33
15
0
19 Apr 2022
Small Footprint Multi-channel ConvMixer for Keyword Spotting with Centroid Based Awareness
Dianwen Ng
Jing Pang
Yanghua Xiao
Biao Tian
Qiang Fu
Eng Siong Chng
27
2
0
11 Apr 2022
Production federated keyword spotting via distillation, filtering, and joint federated-centralized training
Andrew Straiton Hard
Kurt Partridge
Neng Chen
S. Augenstein
Aishanee Shah
...
Sara Ng
Jessica Nguyen
Ignacio López Moreno
Rajiv Mathews
F. Beaufays
FedML
29
14
0
11 Apr 2022
Tokenwise Contrastive Pretraining for Finer Speech-to-BERT Alignment in End-to-End Speech-to-Intent Systems
Vishal Sunder
Eric Fosler-Lussier
Samuel Thomas
H. Kuo
Brian Kingsbury
23
7
0
11 Apr 2022
Auditory-Based Data Augmentation for End-to-End Automatic Speech Recognition
Zehai Tu
Jack Deadman
Ning Ma
Jon Barker
35
4
0
08 Apr 2022
GigaST: A 10,000-hour Pseudo Speech Translation Corpus
Rong Ye
Chengqi Zhao
Tom Ko
Chutong Meng
Tao Wang
Mingxuan Wang
Jun Cao
11
23
0
08 Apr 2022
Transducer-based language embedding for spoken language identification
Peng Shen
Xugang Lu
Hisashi Kawai
56
6
0
08 Apr 2022
Frequency Selective Augmentation for Video Representation Learning
Jinhyung Kim
Taeoh Kim
Minho Shim
Dongyoon Han
Dongyoon Wee
Junmo Kim
AI4TS
56
3
0
08 Apr 2022
Personal VAD 2.0: Optimizing Personal Voice Activity Detection for On-Device Speech Recognition
Shaojin Ding
R. Rikhye
Qiao Liang
Yanzhang He
Quan Wang
A. Narayanan
Tom O'Malley
Ian McGraw
29
27
0
08 Apr 2022
Detecting Vocal Fatigue with Neural Embeddings
Sebastian P. Bayerl
Dominik Wagner
Ilja Baumann
Korbinian Riedhammer
Tobias Bocklet
32
11
0
07 Apr 2022
DDOS: A MOS Prediction Framework utilizing Domain Adaptive Pre-training and Distribution of Opinion Scores
Wei-Cheng Tseng
Wei-Tsung Kao
Hung-yi Lee
24
21
0
07 Apr 2022
A Wav2vec2-Based Experimental Study on Self-Supervised Learning Methods to Improve Child Speech Recognition
Rishabh Jain
Andrei Barcovschi
Mariam Yiwere
Dan Bigioi
Peter Corcoran
H. Cucu
30
31
0
06 Apr 2022
Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation
Dan Berrebbi
Jiatong Shi
Brian Yan
Osbel López-Francisco
Jonathan D. Amith
Shinji Watanabe
21
26
0
05 Apr 2022
A Complementary Joint Training Approach Using Unpaired Speech and Text for Low-Resource Automatic Speech Recognition
Ye Du
Jie Zhang
Qiu-shi Zhu
Lirong Dai
Ming Wu
Xin Fang
Zhouwang Yang
34
2
0
05 Apr 2022
A Novel Capsule Neural Network Based Model for Drowsiness Detection Using Electroencephalography Signals
Luis Guarda
Juan Tapia
E. Droguett
M. Ramos
24
27
0
04 Apr 2022
An Analysis of Semantically-Aligned Speech-Text Embeddings
M. Huzaifah
Ivan Kukanov
35
7
0
04 Apr 2022
Leveraging Phone Mask Training for Phonetic-Reduction-Robust E2E Uyghur Speech Recognition
Guodong Ma
Pengfei Hu
Jian Kang
Shen Huang
Hao-Ming Huang
35
9
0
02 Apr 2022
End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation
Xuankai Chang
Takashi Maekaku
Yuya Fujita
Shinji Watanabe
VLM
59
45
0
01 Apr 2022
Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems
Takuma Udagawa
Masayuki Suzuki
Gakuto Kurata
N. Itoh
G. Saon
49
23
0
01 Apr 2022
Improved Relation Networks for End-to-End Speaker Verification and Identification
Ashutosh Chaubey
Sparsh Sinha
Susmita Ghose
27
3
0
31 Mar 2022
How Does Pre-trained Wav2Vec 2.0 Perform on Domain Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications
Juan Pablo Zuluaga
Amrutha Prasad
Iuliia Nigmatulina
Seyyed Saeed Sarfjoo
P. Motlícek
Matthias Kleinert
H. Helmke
Oliver Ohneiser
Qingran Zhan
34
44
0
31 Mar 2022
Streaming parallel transducer beam search with fast-slow cascaded encoders
Jay Mahadeokar
Yangyang Shi
Ke Li
Duc Le
Jiedan Zhu
Vikas Chandra
Ozlem Kalinli
M. Seltzer
42
15
0
29 Mar 2022
Integrating Lattice-Free MMI into End-to-End Speech Recognition
Jinchuan Tian
Jianwei Yu
Chao Weng
Yuexian Zou
Dong Yu
37
8
0
29 Mar 2022
Dynamic Latency for CTC-Based Streaming Automatic Speech Recognition With Emformer
J. Sun
Guiping Zhong
Dinghao Zhou
Baoxiang Li
21
0
0
29 Mar 2022
WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit
Binbin Zhang
Di Wu
Zhendong Peng
Xingcheng Song
Zhuoyuan Yao
Hang Lv
Linfu Xie
Chao Yang
Fuping Pan
Jianwei Niu
VLM
34
94
0
29 Mar 2022
Noise-robust Speech Recognition with 10 Minutes Unparalleled In-domain Data
Chen Chen
Nana Hou
Yuchen Hu
Shashank Shirol
Chng Eng Siong
NoLa
30
43
0
29 Mar 2022
Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation for Automatic Speech Recognition
Guan-Ting Lin
Shang-Wen Li
Hung-yi Lee
TTA
VLM
21
10
0
27 Mar 2022
Speech-enhanced and Noise-aware Networks for Robust Speech Recognition
Hung-Shin Lee
Pin-Yuan Chen
Yao-Fei Cheng
Yu Tsao
Hsin-Min Wang
27
1
0
25 Mar 2022
AudioTagging Done Right: 2nd comparison of deep learning methods for environmental sound classification
Juncheng Billy Li
Shuhui Qu
Po-Yao (Bernie) Huang
Florian Metze
VLM
41
9
0
25 Mar 2022
Automatic Speech Recognition for Speech Assessment of Persian Preschool Children
Amirhossein Abaskohi
Fatemeh Mortazavi
Hadi Moradi
39
6
0
24 Mar 2022
Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech Translation
Chih-Chiang Chang
Hung-yi Lee
32
13
0
22 Mar 2022
A Track-Wise Ensemble Event Independent Network for Polyphonic Sound Event Localization and Detection
Jinbo Hu
Yin Cao
Ming Wu
Qiuqiang Kong
Feiran Yang
Mark D. Plumbley
J. Yang
24
23
0
19 Mar 2022
Transformer-based Streaming ASR with Cumulative Attention
Mohan Li
Shucong Zhang
Catalin Zorila
R. Doddipatla
27
9
0
11 Mar 2022
Leveraging Pre-trained BERT for Audio Captioning
Xubo Liu
Xinhao Mei
Qiushi Huang
Jianyuan Sun
Jinzheng Zhao
Haohe Liu
Mark D. Plumbley
Volkan Kılıç
Wenwu Wang
38
30
0
06 Mar 2022
A Brief Overview of Unsupervised Neural Speech Representation Learning
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
Lars Maaløe
Christian Igel
BDL
AI4TS
SSL
24
11
0
01 Mar 2022
Explainable deepfake and spoofing detection: an attack analysis using SHapley Additive exPlanations
W. Ge
Massimiliano Todisco
Nicholas W. D. Evans
AAML
34
8
0
28 Feb 2022
Integrating Text Inputs For Training and Adapting RNN Transducer ASR Models
Samuel Thomas
Brian Kingsbury
G. Saon
H. Kuo
36
25
0
26 Feb 2022
Visual Speech Recognition for Multiple Languages in the Wild
Pingchuan Ma
Stavros Petridis
Maja Pantic
VLM
132
147
0
26 Feb 2022
GenéLive! Generating Rhythm Actions in Love Live!
Atsushi Takada
Daichi Yamazaki
Likun Liu
Yudai Yoshida
Nyamkhuu Ganbat
T. Shimotomai
Taiga Yamamoto
Daisuke Sakurai
Naoki Hamada
VLM
33
4
0
25 Feb 2022
Towards Better Meta-Initialization with Task Augmentation for Kindergarten-aged Speech Recognition
Yunzheng Zhu
Ruchao Fan
Abeer Alwan
CLL
38
4
0
24 Feb 2022
Contrastive-mixup learning for improved speaker verification
Xin Zhang
Minho Jin
R. Cheng
Ruirui Li
Eunjung Han
A. Stolcke
AAML
SSL
25
10
0
22 Feb 2022
VADOI: Voice-Activity-Detection Overlapping Inference For End-to-end Long-form Speech Recognition
Jinhan Wang
Xiaosu Tong
Jinxi Guo
Di He
Roland Maas
31
5
0
22 Feb 2022
Spanish and English Phoneme Recognition by Training on Simulated Classroom Audio Recordings of Collaborative Learning Environments
Mario Esparza
35
0
0
21 Feb 2022
S3T: Self-Supervised Pre-training with Swin Transformer for Music Classification
Han Zhao
Chen Zhang
Belei Zhu
Zejun Ma
Ke-jun Zhang
ViT
21
28
0
21 Feb 2022
LPC Augment: An LPC-Based ASR Data Augmentation Algorithm for Low and Zero-Resource Children's Dialects
Alexander Johnson
Ruchao Fan
Robin Morris
Abeer Alwan
16
12
0
19 Feb 2022
Domain Adaptation of low-resource Target-Domain models using well-trained ASR Conformer Models
Vrunda N. Sukhadia
S. Umesh
43
8
0
18 Feb 2022