SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
    VLM

Papers citing "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"

50 / 1,048 papers shown
Title
MFA-KWS: Effective Keyword Spotting with Multi-head Frame-asynchronous Decoding
Yu Xi
Haoyu Li
Xiaoyu Gu
Yidi Jiang
Kai Yu
82
1
0
01 Jul 2025
State-Space Models in Efficient Whispered and Multi-dialect Speech Recognition
Aref Farhadipour
Homayoon Beigi
Volker Dellwo
H. Veisi
Mamba
24
0
0
20 Jun 2025
AsyncSwitch: Asynchronous Text-Speech Adaptation for Code-Switched ASR
Tuan Nguyen
Huy-Dat Tran
26
0
0
17 Jun 2025
Qwen vs. Gemma Integration with Whisper: A Comparative Study in Multilingual SpeechLLM Systems
Tuan Nguyen
Long-Vu Hoang
Huy-Dat Tran
24
0
0
16 Jun 2025
SplashNet: Split-and-Share Encoders for Accurate and Efficient Typing with Surface Electromyography
Nima Hadidi
Jason Chan
Ebrahim Feghhi
Jonathan C. Kao
29
0
0
14 Jun 2025
SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes
Tony Alex
S. Ahmed
A. Mustafa
Muhammad Awais
Philip J. B. Jackson
26
1
0
13 Jun 2025
From Sharpness to Better Generalization for Speech Deepfake Detection
Wen-Chin Huang
Xuechen Liu
Xin Eric Wang
Junichi Yamagishi
Yanmin Qian
33
0
0
13 Jun 2025
Regularizing Learnable Feature Extraction for Automatic Speech Recognition
Peter Vieting
Maximilian Kannen
Benedikt Hilmes
Ralf Schlüter
Hermann Ney
AAML
84
0
0
11 Jun 2025
Speaker-Distinguishable CTC: Learning Speaker Distinction Using CTC for Multi-Talker Speech Recognition
Asahi Sakuma
Hiroaki Sato
Ryuga Sugano
Tadashi Kumano
Yoshihiko Kawai
Tetsuji Ogawa
25
0
0
09 Jun 2025
Curvature Enhanced Data Augmentation for Regression
Ilya Kaufman Sirot
Omri Azencot
31
0
0
07 Jun 2025
Acoustically Precise Hesitation Tagging Is Essential for End-to-End Verbatim Transcription Systems
Jhen-Ke Lin
Hao-Chien Lu
Chung-Chun Wang
Hong-Yun Lin
Berlin Chen
48
0
0
04 Jun 2025
HENT-SRT: Hierarchical Efficient Neural Transducer with Self-Distillation for Joint Speech Recognition and Translation
A. Hussein
Cihan Xiao
Matthew Wiesner
Dan Povey
Leibny Paola García
Sanjeev Khudanpur
24
0
0
02 Jun 2025
WCTC-Biasing: Retraining-free Contextual Biasing ASR with Wildcard CTC-based Keyword Spotting and Inter-layer Biasing
Yu Nakagome
Michael Hentschel
60
0
0
02 Jun 2025
Learning More with Less: Self-Supervised Approaches for Low-Resource Speech Emotion Recognition
Ziwei Gong
Pengyuan Shi
Kaan Donbekci
Lin Ai
Run Chen
David Sasu
Zehui Wu
Julia Hirschberg
SSL
32
0
0
01 Jun 2025
The iNaturalist Sounds Dataset
Mustafa Chasmai
Alexander Shepard
Subhransu Maji
Grant Van Horn
42
2
0
31 May 2025
Voice Conversion Improves Cross-Domain Robustness for Spoken Arabic Dialect Identification
Badr M. Abdullah
Matthew Baas
Bernd Möbius
Dietrich Klakow
17
0
0
30 May 2025
MSDA: Combining Pseudo-labeling and Self-Supervision for Unsupervised Domain Adaptation in ASR
Dimitrios Damianos
Georgios Paraskevopoulos
Alexandros Potamianos
69
0
0
30 May 2025
Improving Multilingual Speech Models on ML-SUPERB 2.0: Fine-tuning with Data Augmentation and LID-Aware CTC
Qingzheng Wang
Jiancheng Sun
Yifan Peng
Shinji Watanabe
89
0
0
30 May 2025
Masked Self-distilled Transducer-based Keyword Spotting with Semi-autoregressive Decoding
Yu Xi
Xiaoyu Gu
Haoyu Li
Jun Song
Bo Zheng
Kai Yu
33
0
0
30 May 2025
Contextualized Automatic Speech Recognition with Dynamic Vocabulary Prediction and Activation
Zhennan Lin
Kaixun Huang
Wei Ren
Linju Yang
Lei Xie
AI4CE
52
0
0
29 May 2025
ZIPA: A family of efficient models for multilingual phone recognition
Jian Zhu
Farhan Samir
Eleanor Chodroff
David R. Mortensen
56
0
0
29 May 2025
Evaluation of LLMs in Speech is Often Flawed: Test Set Contamination in Large Language Models for Speech Recognition
Yuan Tseng
Titouan Parcollet
Rogier van Dalen
Shucong Zhang
Sourav Bhattacharya
64
0
0
28 May 2025
FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian
Sara Papi
Marco Gaido
L. Bentivogli
Alessio Brutti
Mauro Cettolo
R. Gretter
M. Matassoni
Mohamed Nabih
Matteo Negri
44
0
0
28 May 2025
An Effective Training Framework for Light-Weight Automatic Speech Recognition Models
Abdul Hannan
Alessio Brutti
Shah Nawaz
Mubashir Noman
71
0
0
22 May 2025
MSVIT: Improving Spiking Vision Transformer Using Multi-scale Attention Fusion
Wei Hua
Chenlin Zhou
Jibin Wu
Yansong Chua
Yangyang Shu
116
0
0
19 May 2025
Beyond Identity: A Generalizable Approach for Deepfake Audio Detection
Yasaman Ahmadiadli
Xiao-Ping Zhang
Naimul Khan
158
0
0
10 May 2025
SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer
Young-Hu Park
R.-H. Park
Hyung-Min Park
146
0
0
07 May 2025
Dysarthria Normalization via Local Lie Group Transformations for Robust ASR
Mikhail Osipov
134
1
0
16 Apr 2025
Formula-Supervised Sound Event Detection: Pre-Training Without Real Data
Yuto Shibata
Keitaro Tanaka
Yoshiaki Bando
Keisuke Imoto
Hirokatsu Kataoka
Yoshimitsu Aoki
68
0
0
06 Apr 2025
Improving Acoustic Scene Classification with City Features
Yiqiang Cai
Yizhou Tan
Peihong Zhang
Yuxuan Liu
Shengchen Li
68
0
0
21 Mar 2025
CBW: Towards Dataset Ownership Verification for Speaker Verification via Clustering-based Backdoor Watermarking
Yiming Li
Kaiying Yan
Shuo Shao
Tongqing Zhai
Shu-Tao Xia
Zhan Qin
D. Tao
AAML
368
0
0
02 Mar 2025
DIN-CTS: Low-Complexity Depthwise-Inception Neural Network with Contrastive Training Strategy for Deepfake Speech Detection
L. D. Pham
Dat Tran
Florian Skopik
Alexander Schindler
Silvia Poletti
David Fischinger
Martin Boyer
95
1
0
27 Feb 2025
SegAug: CTC-Aligned Segmented Augmentation For Robust RNN-Transducer Based Speech Recognition
Khanh Le
Tuan Vu Ho
Dung Tran
Duc Thanh Chau
109
0
0
20 Feb 2025
Soundwave: Less is More for Speech-Text Alignment in LLMs
Yunke Zhang
Zhiheng Liu
Fan Bu
Ruiyu Zhang
Benyou Wang
Haoyang Li
AuLLM
SyDa
VLM
172
1
0
18 Feb 2025
CR-CTC: Consistency regularization on CTC for improved speech recognition
Zengwei Yao
Wei Kang
Xiaoyu Yang
Fangjun Kuang
Liyong Guo
Han Zhu
Zengrui Jin
Zhaoqing Li
Long Lin
Daniel Povey
134
4
0
17 Feb 2025
Measuring Diversity in Synthetic Datasets
Yuchang Zhu
Huizhe Zhang
Bingzhe Wu
Jintang Li
Zibin Zheng
Peilin Zhao
Liang Chen
Yatao Bian
138
0
0
12 Feb 2025
Privacy-Preserving Dataset Combination
Keren Fuentes
Mimee Xu
Irene Chen
116
0
0
09 Feb 2025
Sagalee: an Open Source Automatic Speech Recognition Dataset for Oromo Language
Turi Abu
Ying Shi
Tianshi Zheng
D. Wang
93
0
0
01 Feb 2025
Variational Bayesian Adaptive Learning of Deep Latent Variables for Acoustic Knowledge Transfer
Hu Hu
Sabato Marco Siniscalchi
Chao-Han Huck Yang
Chin-Hui Lee
123
0
0
28 Jan 2025
Unifying Prediction and Explanation in Time-Series Transformers via Shapley-based Pretraining
Qisen Cheng
Jinming Xing
Chang Xue
Xiaoran Yang
AI4TS
108
6
0
28 Jan 2025
FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration
Kai-Tuo Xu
Feng-Long Xie
Xu Tang
162
5
0
24 Jan 2025
Noise-Agnostic Multitask Whisper Training for Reducing False Alarm Errors in Call-for-Help Detection
Myeonghoon Ryu
June-Woo Kim
Minseok Oh
Suji Lee
Han Park
125
0
0
20 Jan 2025
A Non-autoregressive Model for Joint STT and TTS
Vishal Sunder
Brian Kingsbury
G. Saon
Samuel Thomas
Slava Shechtman
Hagai Aronowitz
Eric Fosler-Lussier
Luis A. Lastras
157
0
0
15 Jan 2025
Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison
Tsz Kin Lam
Marco Gaido
Sara Papi
L. Bentivogli
Barry Haddow
128
0
0
04 Jan 2025
Fotheidil: an Automatic Transcription System for the Irish Language
Liam Lonergan
Ibon Saratxaga
John Sloan
Oscar Maharg
Mengjie Qian
Neasa Ní Chiaráin
Christer Gobl
A. N. Chasaide
82
0
0
03 Jan 2025
autrainer: A Modular and Extensible Deep Learning Toolkit for Computer Audition Tasks
Simon Rampp
Andreas Triantafyllopoulos
M. Milling
Björn Schuller
270
0
0
16 Dec 2024
PSELDNets: Pre-trained Neural Networks on a Large-scale Synthetic Dataset for Sound Event Localization and Detection
Jinbo Hu
Yin Cao
Ming Wu
Fang Kang
Feiran Yang
Wenwu Wang
Mark D. Plumbley
J. Yang
72
1
0
10 Nov 2024
SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation
Dennis Fucci
Marco Gaido
Beatrice Savoldi
Matteo Negri
Mauro Cettolo
L. Bentivogli
274
3
0
03 Nov 2024
Music Foundation Model as Generic Booster for Music Downstream Tasks
Weihsiang Liao
Yuhta Takida
Yukara Ikemiya
Zhi-Wei Zhong
Chieh-Hsin Lai
...
Stefan Uhlich
Taketo Akama
Woosung Choi
Yuichiro Koyama
Yuki Mitsufuji
237
1
0
02 Nov 2024
emg2qwerty: A Large Dataset with Baselines for Touch Typing using Surface Electromyography
Viswanath Sivakumar
Jeffrey Seely
Alan Du
Sean R Bittner
Adam Berenzweig
Anuoluwapo Bolarinwa
Alexandre Gramfort
Michael I Mandel
84
6
0
26 Oct 2024