1904.08779
Cited By
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
VLM
Papers citing
"SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"
50 / 1,048 papers shown
Title
MFA-KWS: Effective Keyword Spotting with Multi-head Frame-asynchronous Decoding
Yu Xi
Haoyu Li
Xiaoyu Gu
Yidi Jiang
Kai Yu
82
1
0
01 Jul 2025
State-Space Models in Efficient Whispered and Multi-dialect Speech Recognition
Aref Farhadipour
Homayoon Beigi
Volker Dellwo
H. Veisi
Mamba
24
0
0
20 Jun 2025
AsyncSwitch: Asynchronous Text-Speech Adaptation for Code-Switched ASR
Tuan Nguyen
Huy-Dat Tran
26
0
0
17 Jun 2025
Qwen vs. Gemma Integration with Whisper: A Comparative Study in Multilingual SpeechLLM Systems
Tuan Nguyen
Long-Vu Hoang
Huy-Dat Tran
24
0
0
16 Jun 2025
SplashNet: Split-and-Share Encoders for Accurate and Efficient Typing with Surface Electromyography
Nima Hadidi
Jason Chan
Ebrahim Feghhi
Jonathan C. Kao
29
0
0
14 Jun 2025
SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes
Tony Alex
S. Ahmed
A. Mustafa
Muhammad Awais
Philip J. B. Jackson
26
1
0
13 Jun 2025
From Sharpness to Better Generalization for Speech Deepfake Detection
Wen-Chin Huang
Xuechen Liu
Xin Eric Wang
Junichi Yamagishi
Yanmin Qian
33
0
0
13 Jun 2025
Regularizing Learnable Feature Extraction for Automatic Speech Recognition
Peter Vieting
Maximilian Kannen
Benedikt Hilmes
Ralf Schlüter
Hermann Ney
AAML
84
0
0
11 Jun 2025
Speaker-Distinguishable CTC: Learning Speaker Distinction Using CTC for Multi-Talker Speech Recognition
Asahi Sakuma
Hiroaki Sato
Ryuga Sugano
Tadashi Kumano
Yoshihiko Kawai
Tetsuji Ogawa
25
0
0
09 Jun 2025
Curvature Enhanced Data Augmentation for Regression
Ilya Kaufman Sirot
Omri Azencot
31
0
0
07 Jun 2025
Acoustically Precise Hesitation Tagging Is Essential for End-to-End Verbatim Transcription Systems
Jhen-Ke Lin
Hao-Chien Lu
Chung-Chun Wang
Hong-Yun Lin
Berlin Chen
48
0
0
04 Jun 2025
HENT-SRT: Hierarchical Efficient Neural Transducer with Self-Distillation for Joint Speech Recognition and Translation
A. Hussein
Cihan Xiao
Matthew Wiesner
Dan Povey
Leibny Paola García
Sanjeev Khudanpur
24
0
0
02 Jun 2025
WCTC-Biasing: Retraining-free Contextual Biasing ASR with Wildcard CTC-based Keyword Spotting and Inter-layer Biasing
Yu Nakagome
Michael Hentschel
60
0
0
02 Jun 2025
Learning More with Less: Self-Supervised Approaches for Low-Resource Speech Emotion Recognition
Ziwei Gong
Pengyuan Shi
Kaan Donbekci
Lin Ai
Run Chen
David Sasu
Zehui Wu
Julia Hirschberg
SSL
32
0
0
01 Jun 2025
The iNaturalist Sounds Dataset
Mustafa Chasmai
Alexander Shepard
Subhransu Maji
Grant Van Horn
42
2
0
31 May 2025
Voice Conversion Improves Cross-Domain Robustness for Spoken Arabic Dialect Identification
Badr M. Abdullah
Matthew Baas
Bernd Möbius
Dietrich Klakow
17
0
0
30 May 2025
MSDA: Combining Pseudo-labeling and Self-Supervision for Unsupervised Domain Adaptation in ASR
Dimitrios Damianos
Georgios Paraskevopoulos
Alexandros Potamianos
69
0
0
30 May 2025
Improving Multilingual Speech Models on ML-SUPERB 2.0: Fine-tuning with Data Augmentation and LID-Aware CTC
Qingzheng Wang
Jiancheng Sun
Yifan Peng
Shinji Watanabe
89
0
0
30 May 2025
Masked Self-distilled Transducer-based Keyword Spotting with Semi-autoregressive Decoding
Yu Xi
Xiaoyu Gu
Haoyu Li
Jun Song
Bo Zheng
Kai Yu
33
0
0
30 May 2025
Contextualized Automatic Speech Recognition with Dynamic Vocabulary Prediction and Activation
Zhennan Lin
Kaixun Huang
Wei Ren
Linju Yang
Lei Xie
AI4CE
52
0
0
29 May 2025
ZIPA: A family of efficient models for multilingual phone recognition
Jian Zhu
Farhan Samir
Eleanor Chodroff
David R. Mortensen
56
0
0
29 May 2025
Evaluation of LLMs in Speech is Often Flawed: Test Set Contamination in Large Language Models for Speech Recognition
Yuan Tseng
Titouan Parcollet
Rogier van Dalen
Shucong Zhang
Sourav Bhattacharya
64
0
0
28 May 2025
FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian
Sara Papi
Marco Gaido
L. Bentivogli
Alessio Brutti
Mauro Cettolo
R. Gretter
M. Matassoni
Mohamed Nabih
Matteo Negri
44
0
0
28 May 2025
An Effective Training Framework for Light-Weight Automatic Speech Recognition Models
Abdul Hannan
Alessio Brutti
Shah Nawaz
Mubashir Noman
71
0
0
22 May 2025
MSVIT: Improving Spiking Vision Transformer Using Multi-scale Attention Fusion
Wei Hua
Chenlin Zhou
Jibin Wu
Yansong Chua
Yangyang Shu
116
0
0
19 May 2025
Beyond Identity: A Generalizable Approach for Deepfake Audio Detection
Yasaman Ahmadiadli
Xiao-Ping Zhang
Naimul Khan
158
0
0
10 May 2025
SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer
Young-Hu Park
R.-H. Park
Hyung-Min Park
146
0
0
07 May 2025
Dysarthria Normalization via Local Lie Group Transformations for Robust ASR
Mikhail Osipov
134
1
0
16 Apr 2025
Formula-Supervised Sound Event Detection: Pre-Training Without Real Data
Yuto Shibata
Keitaro Tanaka
Yoshiaki Bando
Keisuke Imoto
Hirokatsu Kataoka
Yoshimitsu Aoki
68
0
0
06 Apr 2025
Improving Acoustic Scene Classification with City Features
Yiqiang Cai
Yizhou Tan
Peihong Zhang
Yuxuan Liu
Shengchen Li
68
0
0
21 Mar 2025
CBW: Towards Dataset Ownership Verification for Speaker Verification via Clustering-based Backdoor Watermarking
Yiming Li
Kaiying Yan
Shuo Shao
Tongqing Zhai
Shu-Tao Xia
Zhan Qin
D. Tao
AAML
368
0
0
02 Mar 2025
DIN-CTS: Low-Complexity Depthwise-Inception Neural Network with Contrastive Training Strategy for Deepfake Speech Detection
L. D. Pham
Dat Tran
Florian Skopik
Alexander Schindler
Silvia Poletti
David Fischinger
Martin Boyer
95
1
0
27 Feb 2025
SegAug: CTC-Aligned Segmented Augmentation For Robust RNN-Transducer Based Speech Recognition
Khanh Le
Tuan Vu Ho
Dung Tran
Duc Thanh Chau
109
0
0
20 Feb 2025
Soundwave: Less is More for Speech-Text Alignment in LLMs
Yunke Zhang
Zhiheng Liu
Fan Bu
Ruiyu Zhang
Benyou Wang
Haoyang Li
AuLLM
SyDa
VLM
172
1
0
18 Feb 2025
CR-CTC: Consistency regularization on CTC for improved speech recognition
Zengwei Yao
Wei Kang
Xiaoyu Yang
Fangjun Kuang
Liyong Guo
Han Zhu
Zengrui Jin
Zhaoqing Li
Long Lin
Daniel Povey
134
4
0
17 Feb 2025
Measuring Diversity in Synthetic Datasets
Yuchang Zhu
Huizhe Zhang
Bingzhe Wu
Jintang Li
Zibin Zheng
Peilin Zhao
Liang Chen
Yatao Bian
138
0
0
12 Feb 2025
Privacy-Preserving Dataset Combination
Keren Fuentes
Mimee Xu
Irene Chen
116
0
0
09 Feb 2025
Sagalee: an Open Source Automatic Speech Recognition Dataset for Oromo Language
Turi Abu
Ying Shi
Tianshi Zheng
D. Wang
93
0
0
01 Feb 2025
Variational Bayesian Adaptive Learning of Deep Latent Variables for Acoustic Knowledge Transfer
Hu Hu
Sabato Marco Siniscalchi
Chao-Han Huck Yang
Chin-Hui Lee
123
0
0
28 Jan 2025
Unifying Prediction and Explanation in Time-Series Transformers via Shapley-based Pretraining
Qisen Cheng
Jinming Xing
Chang Xue
Xiaoran Yang
AI4TS
108
6
0
28 Jan 2025
FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration
Kai-Tuo Xu
Feng-Long Xie
Xu Tang
162
5
0
24 Jan 2025
Noise-Agnostic Multitask Whisper Training for Reducing False Alarm Errors in Call-for-Help Detection
Myeonghoon Ryu
June-Woo Kim
Minseok Oh
Suji Lee
Han Park
125
0
0
20 Jan 2025
A Non-autoregressive Model for Joint STT and TTS
Vishal Sunder
Brian Kingsbury
G. Saon
Samuel Thomas
Slava Shechtman
Hagai Aronowitz
Eric Fosler-Lussier
Luis A. Lastras
157
0
0
15 Jan 2025
Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison
Tsz Kin Lam
Marco Gaido
Sara Papi
L. Bentivogli
Barry Haddow
128
0
0
04 Jan 2025
Fotheidil: an Automatic Transcription System for the Irish Language
Liam Lonergan
Ibon Saratxaga
John Sloan
Oscar Maharog
Mengjie Qian
Neasa Ní Chiaráin
Christer Gobl
A. N. Chasaide
82
0
0
03 Jan 2025
autrainer: A Modular and Extensible Deep Learning Toolkit for Computer Audition Tasks
Simon Rampp
Andreas Triantafyllopoulos
M. Milling
Björn Schuller
270
0
0
16 Dec 2024
PSELDNets: Pre-trained Neural Networks on a Large-scale Synthetic Dataset for Sound Event Localization and Detection
Jinbo Hu
Yin Cao
Ming Wu
Fang Kang
Feiran Yang
Wenwu Wang
Mark D. Plumbley
J. Yang
72
1
0
10 Nov 2024
SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation
Dennis Fucci
Marco Gaido
Beatrice Savoldi
Matteo Negri
Mauro Cettolo
L. Bentivogli
274
3
0
03 Nov 2024
Music Foundation Model as Generic Booster for Music Downstream Tasks
Weihsiang Liao
Yuhta Takida
Yukara Ikemiya
Zhi-Wei Zhong
Chieh-Hsin Lai
...
Stefan Uhlich
Taketo Akama
Woosung Choi
Yuichiro Koyama
Yuki Mitsufuji
237
1
0
02 Nov 2024
emg2qwerty: A Large Dataset with Baselines for Touch Typing using Surface Electromyography
Viswanath Sivakumar
Jeffrey Seely
Alan Du
Sean R Bittner
Adam Berenzweig
Anuoluwapo Bolarinwa
Alexandre Gramfort
Michael I Mandel
84
6
0
26 Oct 2024