ResearchTrend.AI
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
VLM

Papers citing "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"

50 / 1,049 papers shown
Title
Vocalsound: A Dataset for Improving Human Vocal Sounds Recognition
Yuan Gong
Jingbo Yu
James R. Glass
89
42
0
06 May 2022
ON-TRAC Consortium Systems for the IWSLT 2022 Dialect and Low-resource Speech Translation Tasks
Marcely Zanon Boito
John E. Ortega
Hugo Riguidel
Antoine Laurent
Loïc Barrault
...
Firas Chaabani
H. Nguyen
Florentin Barbier
Souhir Gahbiche
Yannick Esteve
64
16
0
04 May 2022
On monoaural speech enhancement for automatic recognition of real noisy speech using mixture invariant training
Jisi Zhang
Catalin Zorila
R. Doddipatla
Jon Barker
61
4
0
03 May 2022
Efficient dynamic filter for robust and low computational feature extraction
Donghyeon Kim
Gwantae Kim
Bokyeung Lee
Jeong-gi Kwak
D. Han
Hanseok Ko
60
3
0
03 May 2022
Pseudo strong labels for large scale weakly supervised audio tagging
Heinrich Dinkel
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Yujun Wang
63
6
0
28 Apr 2022
Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations
Dan Oneaţă
H. Cucu
51
19
0
27 Apr 2022
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?
Sanyuan Chen
Yu Wu
Chengyi Wang
Shujie Liu
Zhuo Chen
...
Gang Liu
Jinyu Li
Jian Wu
Xiangzhan Yu
Furu Wei
SSL
102
42
0
27 Apr 2022
Improving the Naturalness of Simulated Conversations for End-to-End Neural Diarization
Natsuo Yamashita
Shota Horiguchi
Takeshi Homma
74
18
0
24 Apr 2022
E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR
Wenjie Huang
Shuo-yiin Chang
David Rybach
Rohit Prabhavalkar
Tara N. Sainath
Cyril Allauzen
Cal Peyser
Zhiyun Lu
VLM
93
24
0
22 Apr 2022
The 2021 NIST Speaker Recognition Evaluation
S. O. Sadjadi
Craig S. Greenberg
E. Singer
Lisa P. Mason
D. A. Reynolds
94
74
0
21 Apr 2022
The NIST CTS Speaker Recognition Challenge
S. O. Sadjadi
Craig S. Greenberg
E. Singer
Lisa P. Mason
D. Reynolds
ELM
133
0
0
21 Apr 2022
Layer-wise Fast Adaptation for End-to-End Multi-Accent Speech Recognition
Xun Gong
Y. Qian
Houjun Huang
Yanmin Qian
81
46
0
21 Apr 2022
Detecting Unintended Memorization in Language-Model-Fused ASR
Wenjie Huang
Steve Chien
Om Thakkar
Rajiv Mathews
87
11
0
20 Apr 2022
Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation
Keqi Deng
Shinji Watanabe
Jiatong Shi
Siddhant Arora
75
15
0
19 Apr 2022
Audio Deep Fake Detection System with Neural Stitching for ADD 2022
Rui Yan
Cheng Wen
Shuran Zhou
Tingwei Guo
Wei Zou
Xiangang Li
49
24
0
19 Apr 2022
Caption Feature Space Regularization for Audio Captioning
Yiming Zhang
Hong Yu
Ruoyi Du
Zhanyu Ma
Yuan Dong
122
1
0
18 Apr 2022
Small Footprint Multi-channel ConvMixer for Keyword Spotting with Centroid Based Awareness
Dianwen Ng
Jing Pang
Yanghua Xiao
Biao Tian
Qiang Fu
Eng Siong Chng
64
2
0
11 Apr 2022
Tokenwise Contrastive Pretraining for Finer Speech-to-BERT Alignment in End-to-End Speech-to-Intent Systems
Vishal Sunder
Eric Fosler-Lussier
Samuel Thomas
H. Kuo
Brian Kingsbury
78
7
0
11 Apr 2022
Towards End-to-End Integration of Dialog History for Improved Spoken Language Understanding
Vishal Sunder
Samuel Thomas
H. Kuo
Jatin Ganhotra
Brian Kingsbury
Eric Fosler-Lussier
VLM
96
10
0
11 Apr 2022
Auditory-Based Data Augmentation for End-to-End Automatic Speech Recognition
Zehai Tu
Jack Deadman
Ning Ma
Jon Barker
66
4
0
08 Apr 2022
Automatic Data Augmentation Selection and Parametrization in Contrastive Self-Supervised Speech Representation Learning
Salah Zaiem
Titouan Parcollet
S. Essid
SSL
41
6
0
08 Apr 2022
Scoring of Large-Margin Embeddings for Speaker Verification: Cosine or PLDA?
Qiongqiong Wang
Kong Aik Lee
Tianchi Liu
67
16
0
08 Apr 2022
GigaST: A 10,000-hour Pseudo Speech Translation Corpus
Rong Ye
Chengqi Zhao
Tom Ko
Chutong Meng
Tao Wang
Mingxuan Wang
Jun Cao
89
23
0
08 Apr 2022
Transducer-based language embedding for spoken language identification
Peng Shen
Xugang Lu
Hisashi Kawai
84
6
0
08 Apr 2022
Frequency Selective Augmentation for Video Representation Learning
Jinhyung Kim
Taeoh Kim
Minho Shim
Dongyoon Han
Dongyoon Wee
Junmo Kim
AI4TS
101
4
0
08 Apr 2022
Personal VAD 2.0: Optimizing Personal Voice Activity Detection for On-Device Speech Recognition
Shaojin Ding
R. Rikhye
Qiao Liang
Yanzhang He
Quan Wang
A. Narayanan
Tom O'Malley
Ian McGraw
92
28
0
08 Apr 2022
Detecting Vocal Fatigue with Neural Embeddings
Sebastian P. Bayerl
Dominik Wagner
Ilja Baumann
Korbinian Riedhammer
Tobias Bocklet
64
11
0
07 Apr 2022
MAESTRO: Matched Speech Text Representations through Modality Matching
Zhehuai Chen
Yu Zhang
Andrew Rosenberg
Bhuvana Ramabhadran
Pedro J. Moreno
Ankur Bapna
Heiga Zen
98
108
0
07 Apr 2022
DDOS: A MOS Prediction Framework utilizing Domain Adaptive Pre-training and Distribution of Opinion Scores
Wei-Cheng Tseng
Wei-Tsung Kao
Hung-yi Lee
80
21
0
07 Apr 2022
A Wav2vec2-Based Experimental Study on Self-Supervised Learning Methods to Improve Child Speech Recognition
Rishabh Jain
Andrei Barcovschi
Mariam Yiwere
Dan Bigioi
Peter Corcoran
H. Cucu
54
35
0
06 Apr 2022
Successes and critical failures of neural networks in capturing human-like speech recognition
Federico Adolfi
J. Bowers
David Poeppel
UQCV
86
22
0
06 Apr 2022
Towards End-to-end Unsupervised Speech Recognition
Alexander H. Liu
Wei-Ning Hsu
Michael Auli
Alexei Baevski
SSL
83
74
0
05 Apr 2022
Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation
Dan Berrebbi
Jiatong Shi
Brian Yan
Osbel López-Francisco
Jonathan D. Amith
Shinji Watanabe
68
27
0
05 Apr 2022
A Novel Capsule Neural Network Based Model for Drowsiness Detection Using Electroencephalography Signals
Luis Guarda
Juan Tapia
E. Droguett
M. Ramos
33
27
0
04 Apr 2022
An Analysis of Semantically-Aligned Speech-Text Embeddings
M. Huzaifah
Ivan Kukanov
90
8
0
04 Apr 2022
Leveraging Phone Mask Training for Phonetic-Reduction-Robust E2E Uyghur Speech Recognition
Guodong Ma
Pengfei Hu
Jian Kang
Shen Huang
Hao-Ming Huang
78
9
0
02 Apr 2022
End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation
Xuankai Chang
Takashi Maekaku
Yuya Fujita
Shinji Watanabe
VLM
111
46
0
01 Apr 2022
Text-To-Speech Data Augmentation for Low Resource Speech Recognition
Rodolfo Zevallos
50
4
0
01 Apr 2022
Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems
Takuma Udagawa
Masayuki Suzuki
Gakuto Kurata
N. Itoh
G. Saon
115
24
0
01 Apr 2022
Improved Relation Networks for End-to-End Speaker Verification and Identification
Ashutosh Chaubey
Sparsh Sinha
Susmita Ghose
58
3
0
31 Mar 2022
Memory-Efficient Training of RNN-Transducer with Sampled Softmax
Jaesong Lee
Lukas Lee
Shinji Watanabe
105
8
0
31 Mar 2022
Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational(RAMC) Speech Dataset
Zehui Yang
Yifan Chen
Lei Luo
Runyan Yang
Lingxuan Ye
...
Yaohui Jin
Qingqing Zhang
Pengyuan Zhang
Lei Xie
Yonghong Yan
69
51
0
31 Mar 2022
How Does Pre-trained Wav2Vec 2.0 Perform on Domain Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications
Juan Pablo Zuluaga
Amrutha Prasad
Iuliia Nigmatulina
Seyyed Saeed Sarfjoo
P. Motlícek
Matthias Kleinert
H. Helmke
Oliver Ohneiser
Qingran Zhan
78
44
0
31 Mar 2022
CUSIDE: Chunking, Simulating Future Context and Decoding for Streaming ASR
Keyu An
Huahuan Zheng
Zhijian Ou
Hongyu Xiang
Ke Ding
Guanglu Wan
AI4TS
52
19
0
31 Mar 2022
Streaming parallel transducer beam search with fast-slow cascaded encoders
Jay Mahadeokar
Yangyang Shi
Ke Li
Duc Le
Jiedan Zhu
Vikas Chandra
Ozlem Kalinli
M. Seltzer
75
16
0
29 Mar 2022
Integrating Lattice-Free MMI into End-to-End Speech Recognition
Jinchuan Tian
Jianwei Yu
Chao Weng
Yuexian Zou
Dong Yu
106
8
0
29 Mar 2022
Dynamic Latency for CTC-Based Streaming Automatic Speech Recognition With Emformer
J. Sun
Guiping Zhong
Dinghao Zhou
Baoxiang Li
108
0
0
29 Mar 2022
WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit
Binbin Zhang
Di Wu
Zhendong Peng
Xingcheng Song
Zhuoyuan Yao
Hang Lv
Linfu Xie
Chao Yang
Fuping Pan
Jianwei Niu
VLM
104
99
0
29 Mar 2022
Noise-robust Speech Recognition with 10 Minutes Unparalleled In-domain Data
Chen Chen
Nana Hou
Yuchen Hu
Shashank Shirol
Chng Eng Siong
NoLa
103
43
0
29 Mar 2022
Filler Word Detection and Classification: A Dataset and Benchmark
Ge Zhu
Juan-Pablo Caceres
Justin Salamon
39
9
0
28 Mar 2022