ResearchTrend.AI

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
    VLM

Papers citing "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"

50 / 1,048 papers shown
Title
Unified Microphone Conversion: Many-to-Many Device Mapping via Feature-wise Linear Modulation
Myeonghoon Ryu
Hongseok Oh
Suji Lee
Han Park
81
0
0
23 Oct 2024
MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation
T. Pham
Tri Ton
Chang D. Yoo
105
3
0
03 Oct 2024
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data
Sreyan Ghosh
Sonal Kumar
Zhifeng Kong
Rafael Valle
Bryan Catanzaro
Dinesh Manocha
DiffM
129
3
0
02 Oct 2024
The Conformer Encoder May Reverse the Time Dimension
Robin Schmitt
Albert Zeyer
Mohammad Zeineldeen
Ralf Schlüter
Hermann Ney
91
0
0
01 Oct 2024
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection
Lam Pham
Phat Lam
Dat Tran
Hieu Tang
Tin Nguyen
Alexander Schindler
Alexander Polonsky
Canh Vu
131
5
0
23 Sep 2024
MultiMed: Multilingual Medical Speech Recognition via Attention Encoder Decoder
Khai-Nguyen Nguyen
Phuc Phan
Tan-Hanh Pham
Bach Phan Tat
Minh-Huong Ngo
Chris Ngo
Thanh Nguyen-Tang
Truong-Son Hy
LM&MA
101
0
0
21 Sep 2024
ASR Error Correction using Large Language Models
Rao Ma
Mengjie Qian
Mark Gales
Kate Knill
KELM
138
6
0
14 Sep 2024
MoWE-Audio: Multitask AudioLLMs with Mixture of Weak Encoders
Wentao Zhang
Shuo Sun
Bin Wang
Xunlong Zou
Zhuohan Liu
Yingxu He
Geyu Lin
Nancy F. Chen
Ai Ti Aw
AuLLM
121
1
0
10 Sep 2024
SONICS: Synthetic Or Not -- Identifying Counterfeit Songs
Md Awsafur Rahman
Zaber Ibn Abdul Hakim
Najibul Haque Sarker
Bishmoy Paul
S. Fattah
175
11
0
26 Aug 2024
Sentence-wise Speech Summarization: Task, Datasets, and End-to-End Modeling with LM Knowledge Distillation
Kohei Matsuura
Takanori Ashihara
Takafumi Moriya
Masato Mimura
Takatomo Kano
A. Ogawa
Marc Delcroix
66
2
0
01 Aug 2024
Beat this! Accurate beat tracking without DBN postprocessing
Francesco Foscarin
Jan Schlüter
Gerhard Widmer
74
7
0
31 Jul 2024
dMel: Speech Tokenization made Simple
Richard He Bai
Tatiana Likhomanenko
Ruixiang Zhang
Zijin Gu
Zakaria Aldeneh
Navdeep Jaitly
113
6
0
22 Jul 2024
Improving Neural Biasing for Contextual Speech Recognition by Early Context Injection and Text Perturbation
Ruizhe Huang
M. Yarmohammadi
Sanjeev Khudanpur
Dan Povey
109
3
0
14 Jul 2024
Multitaper mel-spectrograms for keyword spotting
Douglas Baptista de Souza
Khaled Jamal Bakri
Fernanda Ferreira
Juliana Inacio
45
1
0
05 Jul 2024
Qifusion-Net: Layer-adapted Stream/Non-stream Model for End-to-End Multi-Accent Speech Recognition
Jinming Chen
Jingyi Fang
Yuanzhong Zheng
Yaoxuan Wang
Haojun Fei
73
1
0
03 Jul 2024
AnoPatch: Towards Better Consistency in Machine Anomalous Sound Detection
Anbai Jiang
Bing Han
Zhiqiang Lv
Yufeng Deng
Wei-Qiang Zhang
Xie Chen
Yanmin Qian
Jia Liu
Pingyi Fan
66
3
0
17 Jun 2024
Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision
Yafeng Chen
Siqi Zheng
Hui Wang
Luyao Cheng
Qian Chen
Shiliang Zhang
Wen Wang
SSL
61
4
0
17 Jun 2024
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement
Yifan Yang
Zheshu Song
Jianheng Zhuo
Mingyu Cui
Jinpeng Li
...
Shuai Fan
Kai Yu
Wei Zhang
Guoguo Chen
Xie Chen
142
12
0
17 Jun 2024
CoSTA: Code-Switched Speech Translation using Aligned Speech-Text Interleaving
Bhavani Shankar
Preethi Jyothi
Pushpak Bhattacharyya
95
1
0
16 Jun 2024
Contrastive Learning from Synthetic Audio Doppelgängers
Manuel Cherep
Nikhil Singh
116
1
0
09 Jun 2024
Whistle: Data-Efficient Multilingual and Crosslingual Speech Recognition via Weakly Phonetic Supervision
Saierdaer Yusuyin
Te Ma
Hao Huang
Wenbo Zhao
Zhijian Ou
121
4
0
04 Jun 2024
A Comprehensive Survey on Data Augmentation
Zaitian Wang
Pengfei Wang
Kunpeng Liu
Pengyang Wang
Yanjie Fu
Chang-Tien Lu
Charu Aggarwal
Jian Pei
Yuanchun Zhou
ViT
175
28
0
15 May 2024
Sonos Voice Control Bias Assessment Dataset: A Methodology for Demographic Bias Assessment in Voice Assistants
Chloe Sekkat
Fanny Leroy
Salima Mdhaffar
Blake Perry Smith
Yannick Estève
Joseph Dureau
A. Coucke
49
1
0
14 May 2024
Audio Anti-Spoofing Detection: A Survey
Menglu Li
Yasaman Ahmadiadli
Xiao-Ping Zhang
104
25
0
22 Apr 2024
AudioRepInceptionNeXt: A lightweight single-stream architecture for efficient audio recognition
Kin Wai Lau
Yasar Abbas Ur Rehman
L. Po
85
1
0
21 Apr 2024
A Large-Scale Evaluation of Speech Foundation Models
Shu-Wen Yang
Heng-Jui Chang
Zili Huang
Andy T. Liu
Cheng-I Jeff Lai
...
Kushal Lakhotia
Shang-Wen Li
Abdelrahman Mohamed
Shinji Watanabe
Hung-yi Lee
104
27
0
15 Apr 2024
Guided Masked Self-Distillation Modeling for Distributed Multimedia Sensor Event Analysis
Masahiro Yasuda
Noboru Harada
Yasunori Ohishi
Shoichiro Saito
Akira Nakayama
Nobutaka Ono
94
4
0
12 Apr 2024
VietMed: A Dataset and Benchmark for Automatic Speech Recognition of Vietnamese in the Medical Domain
Khai-Nguyen Nguyen
LM&MA
92
10
0
08 Apr 2024
Exploration of Adapter for Noise Robust Automatic Speech Recognition
Hao Shi
Tatsuya Kawahara
84
5
0
28 Feb 2024
The Effect of Batch Size on Contrastive Self-Supervised Speech Representation Learning
Nik Vaessen
David A. van Leeuwen
96
3
0
21 Feb 2024
Understanding Test-Time Augmentation
Masanari Kimura
ViT
78
30
0
10 Feb 2024
Exploring Missing Modality in Multimodal Egocentric Datasets
Merey Ramazanova
Alejandro Pardo
Humam Alwassel
Guohao Li
EgoV
80
4
0
21 Jan 2024
Can Synthetic Data Boost the Training of Deep Acoustic Vehicle Counting Networks?
Stefano Damiano
Luca Bondi
Shabnam Ghaffarzadegan
Andre Guntoro
Toon van Waterschoot
45
7
0
17 Jan 2024
Efficient Adapter Finetuning for Tail Languages in Streaming Multilingual ASR
Junwen Bai
Yue Liu
Qiujia Li
Tara N. Sainath
Trevor Strohman
108
3
0
17 Jan 2024
Promptformer: Prompted Conformer Transducer for ASR
Sergio Duarte Torres
Arunasish Sen
Aman Rana
Lukas Drude
Alejandro Gomez-Alanis
Andreas Schwarz
Leif Rädel
Volker Leutnant
76
3
0
14 Jan 2024
Microphone Conversion: Mitigating Device Variability in Sound Event Classification
Myeonghoon Ryu
Hongseok Oh
Suji Lee
Han Park
72
4
0
12 Jan 2024
AugSumm: towards generalizable speech summarization using synthetic labels from large language model
Jee-weon Jung
Roshan S. Sharma
William Chen
Bhiksha Raj
Shinji Watanabe
77
4
0
10 Jan 2024
Generative linguistic representation for spoken language identification
Peng Shen
Xuguang Lu
Hisashi Kawai
39
0
0
18 Dec 2023
NeXt-TDNN: Modernizing Multi-Scale Temporal Convolution Backbone for Speaker Verification
Hyunjun Heo
U.H Shin
Ran Lee
YoungJu Cheon
Hyung-Min Park
55
12
0
14 Dec 2023
Graph Convolutions Enrich the Self-Attention in Transformers!
Jeongwhan Choi
Hyowon Wi
Jayoung Kim
Yehjin Shin
Kookjin Lee
Nathaniel Trask
Noseong Park
112
5
0
07 Dec 2023
A-JEPA: Joint-Embedding Predictive Architecture Can Listen
Zhengcong Fei
Mingyuan Fan
Junshi Huang
153
20
0
27 Nov 2023
End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation
Juan Pablo Zuluaga
Zhaocheng Huang
Xing Niu
Rohit Paturi
S. Srinivasan
Prashant Mathur
Brian Thompson
Marcello Federico
BDL
76
2
0
01 Nov 2023
Deep Neural Networks for Automatic Speaker Recognition Do Not Learn Supra-Segmental Temporal Features
Daniel Neururer
Volker Dellwo
Thilo Stadelmann
68
2
0
01 Nov 2023
MixRep: Hidden Representation Mixup for Low-Resource Speech Recognition
Jiamin Xie
John H. L. Hansen
39
3
0
27 Oct 2023
Data Augmentation for Time-Series Classification: An Extensive Empirical Study and Comprehensive Survey
Zijun Gao
Lingbo Li
AI4TS
105
9
0
16 Oct 2023
SD-HuBERT: Sentence-Level Self-Distillation Induces Syllabic Organization in HuBERT
Cheol Jun Cho
Abdelrahman Mohamed
Shang-Wen Li
Alan W. Black
Gopala K. Anumanchipalli
112
9
0
16 Oct 2023
Audio compression-assisted feature extraction for voice replay attack detection
Xiangyu Shi
Yuhao Luo
Li Wang
Haorui He
Hao Li
Lei Wang
Zhizheng Wu
61
0
0
09 Oct 2023
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Dongchao Yang
Jinchuan Tian
Xuejiao Tan
Rongjie Huang
Songxiang Liu
...
Jiang Bian
Xixin Wu
Zhou Zhao
Shinji Watanabe
Helen M. Meng
CVBM
AuLLM
135
128
0
01 Oct 2023
Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation
Shih-Lun Wu
Xuankai Chang
Gordon Wichern
Jee-weon Jung
François G. Germain
Jonathan Le Roux
Shinji Watanabe
83
20
0
29 Sep 2023
Enabling Differentially Private Federated Learning for Speech Recognition: Benchmarks, Adaptive Optimizers and Gradient Clipping
Martin Pelikan
Sheikh Shams Azam
Vitaly Feldman
Jan Honza Silovsky
Kunal Talwar
Christopher G. Brinton
Tatiana Likhomanenko
113
8
0
29 Sep 2023