SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

arXiv:1904.08779 (v1, v2, v3; v3 is the latest)

18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
    VLM

Papers citing "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"

50 / 1,048 papers shown
Learning to Detect Novel and Fine-Grained Acoustic Sequences Using Pretrained Audio Representations
Vasudha Kowtha
Miquel Espi Marques
Jonathan Huang
Yichi Zhang
C. Avendaño
AI4TS
77
0
0
03 May 2023
Self-supervised learning for infant cry analysis
Arsenii Gorin
Cem Subakan
Sajjad Abdoli
Junhao Wang
Samantha Latremouille
Charles C. Onu
62
10
0
02 May 2023
When Deep Learning Meets Polyhedral Theory: A Survey
Joey Huchette
Gonzalo Muñoz
Thiago Serra
Calvin Tsay
AI4CE
162
37
0
29 Apr 2023
Non-autoregressive End-to-end Approaches for Joint Automatic Speech Recognition and Spoken Language Understanding
Mohan Li
R. Doddipatla
70
7
0
21 Apr 2023
DropDim: A Regularization Method for Transformer Networks
Hao Zhang
Dan Qu
Kejia Shao
Xu Yang
79
12
0
20 Apr 2023
Improving Speech Translation by Cross-Modal Multi-Grained Contrastive Learning
Hao Zhang
Nianwen Si
Yaqi Chen
Wenlin Zhang
Xukui Yang
Dan Qu
Weiqiang Zhang
81
10
0
20 Apr 2023
Affective social anthropomorphic intelligent system
Md. Adyelullahil Mamun
Hasnat Md. Abdullah
Md. Golam Rabiul Alam
Muhammad Mehedi Hassan
Md. Zia Uddin
54
1
0
19 Apr 2023
Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR
Yuchen Hu
Cheng Chen
Qiu-shi Zhu
Eng Siong Chng
130
16
0
11 Apr 2023
Efficient Audio Captioning Transformer with Patchout and Text Guidance
Thodoris Kouzelis
Grigoris Bastas
Athanasios Katsamanis
Alexandros Potamianos
ViT
88
6
0
06 Apr 2023
Lego-Features: Exporting modular encoder features for streaming and deliberation ASR
Rami Botros
Rohit Prabhavalkar
J. Schalkwyk
Ciprian Chelba
Tara N. Sainath
Françoise Beaufays
AuLLM
60
3
0
31 Mar 2023
Practical Conformer: Optimizing size, speed and flops of Conformer for on-Device and cloud ASR
Rami Botros
Anmol Gulati
Tara N. Sainath
K. Choromanski
Ruoming Pang
Trevor Strohman
Weiran Wang
Jiahui Yu
MQ
83
3
0
31 Mar 2023
Unsupervised Pre-Training For Data-Efficient Text-to-Speech On Low Resource Languages
Seong-Hyun Park
Myungseo Song
Bohyung Kim
Tae-Hyun Oh
40
1
0
28 Mar 2023
Towards Diverse and Coherent Augmentation for Time-Series Forecasting
Xiyuan Zhang
Ranak Roy Chowdhury
Jingbo Shang
Rajesh K. Gupta
Dezhi Hong
AI4TS
64
5
0
24 Mar 2023
Beyond Universal Transformer: block reusing with adaptor in Transformer for automatic speech recognition
Haoyu Tang
Zhaoyi Liu
Chang Zeng
Xinfeng Li
58
1
0
23 Mar 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Muhammad Usama
Junaid Qadir
171
48
0
21 Mar 2023
Exploring Representation Learning for Small-Footprint Keyword Spotting
Fan Cui
Liyong Guo
Quandong Wang
Peng Gao
Yujun Wang
SSL
96
3
0
20 Mar 2023
Enhancing Unsupervised Audio Representation Learning via Adversarial Sample Generation
Yulin Pan
Xiangteng He
Biao Gong
Yuxin Peng
Yiliang Lv
SSL
51
0
0
15 Mar 2023
Improving Accented Speech Recognition with Multi-Domain Training
Lucas Maison
Yannick Esteve
79
9
0
14 Mar 2023
Dynamic Alignment Mask CTC: Improved Mask-CTC with Aligned Cross Entropy
Xulong Zhang
Haobin Tang
Jianzong Wang
Ning Cheng
Jian Luo
Jing Xiao
61
2
0
14 Mar 2023
Robust Knowledge Distillation from RNN-T Models With Noisy Training Labels Using Full-Sum Loss
Mohammad Zeineldeen
Kartik Audhkhasi
M. Baskar
Bhuvana Ramabhadran
79
3
0
10 Mar 2023
An Inception-Residual-Based Architecture with Multi-Objective Loss for Detecting Respiratory Anomalies
Dat Ngo
L. D. Pham
Huy P Phan
Minh Tran
D. Jarchi
Ş. Kolozali
60
3
0
07 Mar 2023
Unified Keyword Spotting and Audio Tagging on Mobile Devices with Transformers
Heinrich Dinkel
Yongqing Wang
Zhiyong Yan
Junbo Zhang
Yujun Wang
69
4
0
03 Mar 2023
N-best T5: Robust ASR Error Correction using Multiple Input Hypotheses and Constrained Decoding Space
Rao Ma
Mark Gales
Kate Knill
Mengjie Qian
82
33
0
01 Mar 2023
HalluAudio: Hallucinating Frequency as Concepts for Few-Shot Audio Classification
Zhongjie Yu
Shuyang Wang
Lin Chen
Zhongwei Cheng
VLM
60
3
0
27 Feb 2023
Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator
Vladimir Bataev
Roman Korostik
Evgeny Shabalin
Vitaly Lavrukhin
Boris Ginsburg
VLM
89
15
0
27 Feb 2023
Out-of-Domain Robustness via Targeted Augmentations
Irena Gao
Shiori Sagawa
Pang Wei Koh
Tatsunori Hashimoto
Percy Liang
OODD, OOD
75
23
0
23 Feb 2023
Cross-modal Audio-visual Co-learning for Text-independent Speaker Verification
Meng Liu
Kong Aik Lee
Longbiao Wang
Hanyi Zhang
Chang Zeng
Jianwu Dang
83
10
0
22 Feb 2023
Front-End Adapter: Adapting Front-End Input of Speech based Self-Supervised Learning for Speech Recognition
Xie Chen
Ziyang Ma
Changli Tang
Yujin Wang
Zhi-shen Zheng
57
4
0
18 Feb 2023
FrAug: Frequency Domain Augmentation for Time Series Forecasting
Mu-Hwa Chen
Zhijian Xu
Ailing Zeng
Qiang Xu
AI4TS
82
10
0
18 Feb 2023
JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition
Zhong Meng
Weiran Wang
Rohit Prabhavalkar
Tara N. Sainath
Tongzhou Chen
Ehsan Variani
Yu Zhang
Yue Liu
Andrew Rosenberg
Bhuvana Ramabhadran
AuLLM, VLM
96
11
0
16 Feb 2023
Improving Spoken Language Identification with Map-Mix
Shangeth Rajaa
K. Anandan
Swaraj Dalmia
Tarun Gupta
Chng Eng Siong
46
1
0
16 Feb 2023
Personalized Audio Quality Preference Prediction
Chung-Che Wang
Yu-Chun Lin
Yu-Teng Hsu
J. Jang
56
1
0
16 Feb 2023
A dataset for Audio-Visual Sound Event Detection in Movies
Rajat Hebbar
Digbalay Bose
Krishna Somandepalli
Veena Vijai
Shrikanth Narayanan
58
9
0
14 Feb 2023
Complex Dynamic Neurons Improved Spiking Transformer Network for Efficient Automatic Speech Recognition
Minglun Han
Qingyu Wang
Tielin Zhang
Yi Wang
Duzhen Zhang
Bo Xu
80
29
0
02 Feb 2023
Epic-Sounds: A Large-scale Dataset of Actions That Sound
Jaesung Huh
Jacob Chalk
Evangelos Kazakos
Dima Damen
Andrew Zisserman
EgoV
100
43
0
01 Feb 2023
Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation
Minglun Han
Feilong Chen
Jing Shi
Shuang Xu
Bo Xu
VLM
100
13
0
30 Jan 2023
Pre-training for Speech Translation: CTC Meets Optimal Transport
Hang Le
Hongyu Gong
Changhan Wang
J. Pino
Benjamin Lecouteux
D. Schwab
OT
104
26
0
27 Jan 2023
Zorro: the masked multimodal transformer
Adrià Recasens
Jason Lin
João Carreira
Drew Jaegle
Luyu Wang
...
Pauline Luc
Antoine Miech
Lucas Smaira
Ross Hemsley
Andrew Zisserman
92
21
0
23 Jan 2023
Leveraging Speaker Embeddings with Adversarial Multi-task Learning for Age Group Classification
Kwangje Baeg
Yeong-Gwan Kim
Youngsub Han
Byoung-Ki Jeon
59
0
0
22 Jan 2023
Training one model to detect heart and lung sound events from single point auscultations
Leander Melms
Robert R. Ilesan
Ulrich Köhler
O. Hildebrandt
R. Conradt
...
Jürgen R. Schaefer
Tobias Müller
J. Obergassel
Nadine Schlicker
M. Hirsch
88
2
0
15 Jan 2023
Learning Audio-Driven Viseme Dynamics for 3D Face Animation
Linchao Bao
Haoxian Zhang
Yue Qian
Tangli Xue
Changan Chen
Xuefei Zhe
Di Kang
3DH
63
12
0
15 Jan 2023
Automated speech- and text-based classification of neuropsychiatric conditions in a multidiagnostic setting
L. Hansen
R. Rocca
A. Simonsen
A. Parola
V. Bliksted
...
Dan Bang
Kristian Tylén
Ethan Weed
S. Ostergaard
Riccardo Fusaroli
102
3
0
13 Jan 2023
Dual Learning for Large Vocabulary On-Device ASR
Cal Peyser
Ronny Huang
Tara N. Sainath
Rohit Prabhavalkar
M. Picheny
K. Cho
SSL
58
1
0
11 Jan 2023
Generative Emotional AI for Speech Emotion Recognition: The Case for Synthetic Emotional Speech Augmentation
Abdullah Shahid
S. Latif
Junaid Qadir
64
23
0
10 Jan 2023
Sample-Efficient Unsupervised Domain Adaptation of Speech Recognition Systems A case study for Modern Greek
Georgios Paraskevopoulos
Theodoros Kouzelis
Georgios Rouvalis
Athanasios Katsamanis
Vassilis Katsouros
Alexandros Potamianos
VLM
72
9
0
31 Dec 2022
Macro-block dropout for improved regularization in training end-to-end speech recognition models
Chanwoo Kim
Sathish Indurti
Jinhwan Park
Wonyong Sung
61
0
0
29 Dec 2022
Pushing the performances of ASR models on English and Spanish accents
Pooja Chitkara
M. Rivière
Jade Copet
Frank Zhang
Yatharth Saraf
53
0
0
22 Dec 2022
ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement
Wei-Ning Hsu
Tal Remez
Bowen Shi
Jacob Donley
Yossi Adi
DiffM
93
12
0
21 Dec 2022
4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders
Yui Sudo
Muhammad Shakeel
Brian Yan
Jiatong Shi
Shinji Watanabe
49
10
0
21 Dec 2022
Joint Speech Transcription and Translation: Pseudo-Labeling with Out-of-Distribution Data
Mozhdeh Gheini
Tatiana Likhomanenko
Matthias Sperber
Hendra Setiawan
90
5
0
20 Dec 2022