SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
    VLM

Papers citing "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"

50 / 741 papers shown
An Overview of Indian Spoken Language Recognition from Machine Learning Perspective
Spandan Dey
Md. Sahidullah
G. Saha
33
20
0
30 Nov 2022
MSV Challenge 2022: NPU-HC Speaker Verification System for Low-resource Indian Languages
Yue Li
Li Zhang
Na Wang
Jie Liu
Linfu Xie
41
0
0
30 Nov 2022
Interpretability Analysis of Deep Models for COVID-19 Detection
Daniel Peixoto Pinto da Silva
Edresson Casanova
L. Gris
A. Júnior
Marcelo Finger
...
Beatriz Raposo
Marcus Martins
S. Aluísio
L. Berti
João Paulo Teixeira
21
3
0
25 Nov 2022
Self-Transcriber: Few-shot Lyrics Transcription with Self-training
Xiaoxue Gao
Xianghu Yue
Haizhou Li
30
7
0
18 Nov 2022
Speaker Adaptation for End-To-End Speech Recognition Systems in Noisy Environments
Dominik Wagner
Ilja Baumann
Sebastian P. Bayerl
Korbinian Riedhammer
Tobias Bocklet
47
2
0
16 Nov 2022
Improving Children's Speech Recognition by Fine-tuning Self-supervised Adult Speech Representations
Renée Lu
M. Shahin
Beena Ahmed
35
4
0
14 Nov 2022
Low Pass Filtering and Bandwidth Extension for Robust Anti-spoofing Countermeasure Against Codec Variabilities
Yikang Wang
Xingming Wang
Hiromitsu Nishizaki
Ming Li
27
6
0
12 Nov 2022
Continuous Soft Pseudo-Labeling in ASR
Tatiana Likhomanenko
R. Collobert
Navdeep Jaitly
Samy Bengio
VLM
29
3
0
11 Nov 2022
A Study on the Integration of Pre-trained SSL, ASR, LM and SLU Models for Spoken Language Understanding
Yifan Peng
Siddhant Arora
Yosuke Higuchi
Yushi Ueda
Sujay S. Kumar
Karthik Ganesan
Siddharth Dalmia
Xuankai Chang
Shinji Watanabe
32
20
0
10 Nov 2022
Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities
Andros Tjandra
Nayan Singhal
David C. Zhang
Ozlem Kalinli
Abdel-rahman Mohamed
Duc Le
M. Seltzer
42
12
0
10 Nov 2022
Self-supervised learning of audio representations using angular contrastive loss
Shanshan Wang
S. Tripathy
A. Mesaros
SSL
29
4
0
10 Nov 2022
Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation
Florian Schmid
Khaled Koutini
Gerhard Widmer
ViT
28
58
0
09 Nov 2022
Improving Noisy Student Training on Non-target Domain Data for Automatic Speech Recognition
Yu Chen
Wen Ding
Junjie Lai
37
8
0
09 Nov 2022
Pushing the limits of self-supervised speaker verification using regularized distillation framework
Yafeng Chen
Siqi Zheng
Haibo Wang
Luyao Cheng
Qian Chen
25
25
0
08 Nov 2022
High-resolution embedding extractor for speaker diarisation
Hee-Soo Heo
Youngki Kwon
Bong-Jin Lee
You Jin Kim
Jee-weon Jung
36
5
0
08 Nov 2022
Breaking the trade-off in personalized speech enhancement with cross-task knowledge distillation
H. Taherian
Sefik Emre Eskimez
Takuya Yoshioka
29
1
0
05 Nov 2022
Improved Techniques for the Conditional Generative Augmentation of Clinical Audio Data
Mane Margaryan
Matthias Seibold
Indu Joshi
Mazda Farshad
Philipp Fürnstahl
Nassir Navab
MedIm
24
2
0
05 Nov 2022
Biased Self-supervised learning for ASR
Florian Kreyssig
Yangyang Shi
Jinxi Guo
Leda Sari
Abdel-rahman Mohamed
P. Woodland
SSL
37
2
0
04 Nov 2022
Dynamic Kernels and Channel Attention for Low Resource Speaker Verification
A. Ollerenshaw
Md. Asif Jalal
Thomas Hain
19
0
0
03 Nov 2022
Probing Statistical Representations For End-To-End ASR
A. Ollerenshaw
Md. Asif Jalal
Thomas Hain
35
2
0
03 Nov 2022
The ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge (ICSRC): Dataset, Tracks, Baseline and Results
Ao Zhang
F. Yu
Kaixun Huang
Linfu Xie
Longbiao Wang
Eng Siong Chng
Hui Bu
Binbin Zhang
Wei Chen
Xin Xu
34
4
0
03 Nov 2022
Monolingual Recognizers Fusion for Code-switching Speech Recognition
Tongtong Song
Qiang Xu
Haoyu Lu
Longbiao Wang
Hao Shi
Yuqin Lin
Yanbing Yang
J. Dang
27
4
0
02 Nov 2022
Factorized Blank Thresholding for Improved Runtime Efficiency of Neural Transducers
Duc Le
Frank Seide
Yuhao Wang
Heng Chang
Kjell Schubert
Ozlem Kalinli
M. Seltzer
19
6
0
02 Nov 2022
Build a SRE Challenge System: Lessons from VoxSRC 2022 and CNSRC 2022
Zhengyang Chen
Bing Han
Xu Xiang
Houjun Huang
Bei Liu
Y. Qian
32
13
0
02 Nov 2022
InterMPL: Momentum Pseudo-Labeling with Intermediate CTC Loss
Yosuke Higuchi
Tetsuji Ogawa
Tetsunori Kobayashi
Shinji Watanabe
32
0
0
02 Nov 2022
Unified End-to-End Speech Recognition and Endpointing for Fast and Efficient Speech Systems
Shaan Bijwadia
Shuo-yiin Chang
Yue Liu
Tara N. Sainath
Chaoyang Zhang
Yanzhang He
47
7
0
01 Nov 2022
Joint Audio/Text Training for Transformer Rescorer of Streaming Speech Recognition
Suyoun Kim
Ke Li
Lucas Kabela
Rongqing Huang
Jiedan Zhu
Ozlem Kalinli
Duc Le
35
8
0
31 Oct 2022
Fast and parallel decoding for transducer
Wei Kang
Liyong Guo
Fangjun Kuang
Long Lin
Mingshuang Luo
Zengwei Yao
Xiaoyu Yang
Piotr Żelasko
Daniel Povey
AI4TS
29
15
0
31 Oct 2022
Delay-penalized transducer for low-latency streaming ASR
Wei Kang
Zengwei Yao
Fangjun Kuang
Liyong Guo
Xiaoyu Yang
Long Lin
Piotr Żelasko
Daniel Povey
30
6
0
31 Oct 2022
Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation
Liyong Guo
Xiaoyu Yang
Quandong Wang
Yuxiang Kong
Zengwei Yao
...
Wei Kang
Long Lin
Mingshuang Luo
Piotr Żelasko
Daniel Povey
VLM
43
7
0
31 Oct 2022
Structured State Space Decoder for Speech Recognition and Synthesis
Koichi Miyazaki
Masato Murata
Tomoki Koriyama
39
12
0
31 Oct 2022
Wespeaker: A Research and Production oriented Speaker Embedding Learning Toolkit
Hongji Wang
Che-Yuan Liang
Shuai Wang
Zhengyang Chen
Binbin Zhang
Xu Xiang
Yan Deng
Y. Qian
35
118
0
31 Oct 2022
Speaker Representation Learning via Contrastive Loss with Maximal Speaker Separability
Zhe Li
Man-Wai Mak
SSL
31
6
0
29 Oct 2022
Discriminative Speaker Representation via Contrastive Learning with Class-Aware Attention in Angular Space
Zhe Li
Man-Wai Mak
Helen M. Meng
39
9
0
29 Oct 2022
End-to-end Spoken Language Understanding with Tree-constrained Pointer Generator
Guangzhi Sun
Chuxu Zhang
P. Woodland
35
8
0
29 Oct 2022
Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention
Xubo Liu
Qiushi Huang
Xinhao Mei
Haohe Liu
Qiuqiang Kong
...
Yu Zhang
Lilian H. Y. Tang
Mark D. Plumbley
Volkan Kılıç
Wenwu Wang
53
18
0
28 Oct 2022
SG-VAD: Stochastic Gates Based Speech Activity Detection
Jonathan Svirsky
Ofir Lindenbaum
49
4
0
28 Oct 2022
On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis
Atsushi Ando
Ryo Masumura
Akihiko Takashima
Satoshi Suzuki
Naoki Makishima
Keita Suzuki
Takafumi Moriya
Takanori Ashihara
Hiroshi Sato
44
9
0
28 Oct 2022
A Compact End-to-End Model with Local and Global Context for Spoken Language Identification
Fei Jia
Nithin Rao Koluguri
Jagadeesh Balam
Boris Ginsburg
33
3
0
27 Oct 2022
Make More of Your Data: Minimal Effort Data Augmentation for Automatic Speech Recognition and Translation
Tsz Kin Lam
Shigehiko Schamoni
Stefan Riezler
VLM
48
8
0
27 Oct 2022
A knowledge-driven vowel-based approach of depression classification from speech using data augmentation
Kexin Feng
Theodora Chaspari
19
6
0
27 Oct 2022
Iterative pseudo-forced alignment by acoustic CTC loss for self-supervised ASR domain adaptation
F. López
Jordi Luque
14
6
0
27 Oct 2022
Training Autoregressive Speech Recognition Models with Limited in-domain Supervision
Chak-Fai Li
Francis Keith
William Hartmann
M. Snover
24
0
0
27 Oct 2022
Monotonic segmental attention for automatic speech recognition
Albert Zeyer
Robin Schmitt
Wei Zhou
Ralf Schluter
Hermann Ney
27
8
0
26 Oct 2022
TSUP Speaker Diarization System for Conversational Short-phrase Speaker Diarization Challenge
Bowen Pang
Huan Zhao
Gaosheng Zhang
Xiaoyue Yang
Yanguo Sun
Li Zhang
Qing Wang
Linfu Xie
BDL
28
2
0
26 Oct 2022
Reducing Language confusion for Code-switching Speech Recognition with Token-level Language Diarization
Hexin Liu
Haihua Xu
Leibny Paola García
Andy W. H. Khong
Yi He
Sanjeev Khudanpur
27
24
0
26 Oct 2022
UFO2: A unified pre-training framework for online and offline speech recognition
Li Fu
Siqi Li
Qingtao Li
L. Deng
Fangzhu Li
Lu Fan
Meng Chen
Xiaodong He
OffRL
37
8
0
26 Oct 2022
AVES: Animal Vocalization Encoder based on Self-Supervision
Masato Hagiwara
CLIP
VLM
AI4TS
19
24
0
26 Oct 2022
G-Augment: Searching for the Meta-Structure of Data Augmentation Policies for ASR
Gary Wang
Ekin D. Cubuk
Andrew Rosenberg
Shuyang Cheng
Ron J. Weiss
Bhuvana Ramabhadran
Pedro J. Moreno
Quoc V. Le
Daniel S. Park
35
1
0
19 Oct 2022
Maestro-U: Leveraging joint speech-text representation learning for zero supervised speech ASR
Zhehuai Chen
Ankur Bapna
Andrew Rosenberg
Yu Zhang
Bhuvana Ramabhadran
Pedro J. Moreno
Nanxin Chen
51
17
0
18 Oct 2022