ResearchTrend.AI
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
    VLM

Papers citing "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"

50 / 1,048 papers shown
Title
YFACC: A Yorùbá speech-image dataset for cross-lingual keyword localisation through visual grounding
Kayode Olaleye
Dan Oneaţă
Herman Kamper
ObjD
82
8
0
10 Oct 2022
ASVspoof 2021: Towards Spoofed and Deepfake Speech Detection in the Wild
Xuechen Liu
Xin Wang
Md. Sahidullah
J. Patino
Héctor Delgado
...
Massimiliano Todisco
Junichi Yamagishi
Nicholas W. D. Evans
A. Nautsch
Kong Aik Lee
116
194
0
05 Oct 2022
Learning Temporal Resolution in Spectrogram for Audio Classification
Haohe Liu
Xubo Liu
Qiuqiang Kong
Wenwu Wang
Mark D. Plumbley
84
7
0
04 Oct 2022
E-Branchformer: Branchformer with Enhanced merging for speech recognition
Kwangyoun Kim
Felix Wu
Yifan Peng
Jing Pan
Prashant Sridhar
Kyu Jeong Han
Shinji Watanabe
181
117
0
30 Sep 2022
ConvRNN-T: Convolutional Augmented Recurrent Neural Network Transducers for Streaming Speech Recognition
Martin H. Radfar
Rohit Barnwal
Rupak Vignesh Swaminathan
Feng-Ju Chang
Grant P. Strimel
Nathan Susanj
Athanasios Mouchtaris
109
14
0
29 Sep 2022
Audio Retrieval with WavText5K and CLAP Training
Soham Deshmukh
Benjamin Elizalde
Huaming Wang
3DV
CLIP
181
53
0
28 Sep 2022
Direct Speech Translation for Automatic Subtitling
Sara Papi
Marco Gaido
Alina Karakanta
Mauro Cettolo
Matteo Negri
Marco Turchi
108
11
0
27 Sep 2022
Unsupervised domain adaptation for speech recognition with unsupervised error correction
Long Mai
Julie Carson-Berndsen
105
8
0
24 Sep 2022
Relaxed Attention for Transformer Models
Timo Lohrenz
Björn Möller
Zhengyang Li
Tim Fingscheidt
KELM
59
12
0
20 Sep 2022
Parameter-Efficient Conformers via Sharing Sparsely-Gated Experts for End-to-End Speech Recognition
Ye Bai
Jie Li
W. Han
Hao Ni
Kaituo Xu
Zhuo Zhang
Cheng Yi
Xiaorui Wang
MoE
61
2
0
17 Sep 2022
Self-Supervised Attention Networks and Uncertainty Loss Weighting for Multi-Task Emotion Recognition on Vocal Bursts
Vincent Karas
Andreas Triantafyllopoulos
Meishu Song
Björn W. Schuller
67
4
0
15 Sep 2022
I2CR: Improving Noise Robustness on Keyword Spotting Using Inter-Intra Contrastive Regularization
Dianwen Ng
J. Yip
Tanmay Surana
Zhao Yang
Chong Zhang
Yukun Ma
Chongjia Ni
Chng Eng Siong
B. Ma
91
6
0
14 Sep 2022
Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification
Chuxu Zhang
Yue Liu
Tara N. Sainath
Trevor Strohman
S. Mavandadi
Shuo-yiin Chang
Parisa Haghani
205
30
0
13 Sep 2022
Learning ASR pathways: A sparse multilingual ASR model
Mu Yang
Andros Tjandra
Chunxi Liu
David C. Zhang
Duc Le
Ozlem Kalinli
94
14
0
13 Sep 2022
Non-autoregressive Error Correction for CTC-based ASR with Phone-conditioned Masked LM
Hayato Futami
Hirofumi Inaguma
Sei Ueno
Masato Mimura
S. Sakai
Tatsuya Kawahara
KELM
130
13
0
08 Sep 2022
Sound Event Localization and Detection for Real Spatial Sound Scenes: Event-Independent Network and Data Augmentation Chains
Jinbo Hu
Yin Cao
Ming Wu
Qiuqiang Kong
Feiran Yang
Mark D. Plumbley
J. Yang
96
10
0
05 Sep 2022
Predict-and-Update Network: Audio-Visual Speech Recognition Inspired by Human Speech Perception
Jiadong Wang
Xinyuan Qian
Haizhou Li
68
14
0
05 Sep 2022
Equivariant Self-Supervision for Musical Tempo Estimation
Elio Quinton
92
9
0
03 Sep 2022
Training Strategies for Improved Lip-reading
Pingchuan Ma
Yujiang Wang
Stavros Petridis
Jie Shen
Maja Pantic
133
49
0
03 Sep 2022
Random Text Perturbations Work, but not Always
Zhengxiang Wang
DeLMO
47
1
0
02 Sep 2022
Attention Enhanced Citrinet for Speech Recognition
Xianchao Wu
82
1
0
01 Sep 2022
Deep Sparse Conformer for Speech Recognition
Xianchao Wu
43
2
0
01 Sep 2022
Robust Sound-Guided Image Manipulation
Seung Hyun Lee
Gyeongrok Oh
Wonmin Byeon
Sang Ho Yoon
Jinkyu Kim
Sangpil Kim
DiffM
97
7
0
30 Aug 2022
Improved Zero-Shot Audio Tagging & Classification with Patchout Spectrogram Transformers
Paul Primus
Gerhard Widmer
VLM
112
5
0
24 Aug 2022
A differentiable short-time Fourier transform with respect to the window length
Maxime Leiber
Axel Barrau
Y. Marnissi
D. Abboud
52
9
0
23 Aug 2022
A Unified Analysis of Mixed Sample Data Augmentation: A Loss Function Perspective
Chanwoo Park
Sangdoo Yun
Sanghyuk Chun
AAML
83
32
0
21 Aug 2022
Disentangled Speaker Representation Learning via Mutual Information Minimization
Sung Hwan Mun
Mingrui Han
Minchan Kim
Dongjune Lee
N. Kim
DRL
97
11
0
17 Aug 2022
Uconv-Conformer: High Reduction of Input Sequence Length for End-to-End Speech Recognition
A. Andrusenko
R. Nasretdinov
A. Romanenko
85
18
0
16 Aug 2022
An investigation on selecting audio pre-trained models for audio captioning
Peiran Yan
Sheng-Wei Li
58
0
0
12 Aug 2022
A High-Quality and Large-Scale Dataset for English-Vietnamese Speech Translation
L. T. Nguyen
Nguyen Luong Tran
Long Doan
Manh Luong
Dat Quoc Nguyen
62
4
0
08 Aug 2022
SSDPT: Self-Supervised Dual-Path Transformer for Anomalous Sound Detection in Machine Condition Monitoring
Jisheng Bai
Jianfeng Chen
Mou Wang
Muhammad Saad Ayub
Qingli Yan
88
16
0
06 Aug 2022
Pronunciation-aware unique character encoding for RNN Transducer-based Mandarin speech recognition
Peng Shen
Xugang Lu
Hisashi Kawai
39
2
0
29 Jul 2022
Learning a Dual-Mode Speech Recognition Model via Self-Pruning
Chunxi Liu
Yuan Shangguan
Haichuan Yang
Yangyang Shi
Raghuraman Krishnamoorthi
Ozlem Kalinli
SSL
87
7
0
25 Jul 2022
Improving Mandarin Speech Recogntion with Block-augmented Transformer
Xiaoming Ren
Huifeng Zhu
Liuwei Wei
Minghui Wu
Jie Hao
108
10
0
24 Jul 2022
Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset
Grant Van Horn
Rui Qian
Kimberly Wilber
Hartwig Adam
Oisin Mac Aodha
Serge Belongie
103
10
0
21 Jul 2022
When Is TTS Augmentation Through a Pivot Language Useful?
Nathaniel R. Robinson
Perez Ogayo
Swetha Gangu
David R. Mortensen
Shinji Watanabe
84
10
0
20 Jul 2022
Transfer Learning of wav2vec 2.0 for Automatic Lyric Transcription
Longshen Ou
Xiangming Gu
Ye Wang
77
24
0
20 Jul 2022
ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale
Gopinath Chennupati
Milind Rao
Gurpreet Chadha
Aaron Eakin
A. Raju
...
Andrew Oberlin
Buddha Nandanoor
Prahalad Venkataramanan
Zheng Wu
Pankaj Sitpure
CLL
95
8
0
19 Jul 2022
Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models
Rui Qian
Yeqing Li
Zheng Xu
Ming-Hsuan Yang
Serge Belongie
Huayu Chen
VLM
74
22
0
15 Jul 2022
Knowledge Transfer and Distillation from Autoregressive to Non-Autoregressive Speech Recognition
Xun Gong
Zhikai Zhou
Y. Qian
138
4
0
15 Jul 2022
Two-Pass Low Latency End-to-End Spoken Language Understanding
Siddhant Arora
Siddharth Dalmia
Xuankai Chang
Brian Yan
A. Black
Shinji Watanabe
VLM
109
19
0
14 Jul 2022
Masked Autoencoders that Listen
Po-Yao (Bernie) Huang
Hu Xu
Juncheng Billy Li
Alexei Baevski
Michael Auli
Wojciech Galuba
Florian Metze
Christoph Feichtenhofer
148
290
0
13 Jul 2022
MM-ALT: A Multimodal Automatic Lyric Transcription System
Xiangming Gu
Longshen Ou
Danielle Ong
Ye Wang
83
13
0
13 Jul 2022
Multitask Learning from Augmented Auxiliary Data for Improving Speech Emotion Recognition
S. Latif
R. Rana
Sara Khalifa
Raja Jurdak
Björn W. Schuller
72
23
0
12 Jul 2022
pMCT: Patched Multi-Condition Training for Robust Speech Recognition
Pablo Peso Parada
A. Dobrowolska
Karthikeyan P. Saravanan
Mete Ozay
97
6
0
11 Jul 2022
Intermediate-layer output Regularization for Attention-based Speech Recognition with Shared Decoder
Jicheng Zhang
Yizhou Peng
Haihua Xu
Yi He
Chng Eng Siong
Hao-Ming Huang
AuLLM
73
6
0
09 Jul 2022
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding
Yifan Peng
Siddharth Dalmia
Ian Lane
Shinji Watanabe
96
151
0
06 Jul 2022
DEFORMER: Coupling Deformed Localized Patterns with Global Context for Robust End-to-end Speech Recognition
Jiamin Xie
John H. L. Hansen
36
1
0
04 Jul 2022
Leveraging Acoustic Contextual Representation by Audio-textual Cross-modal Learning for Conversational ASR
Kun Wei
Yike Zhang
Sining Sun
Lei Xie
Long Ma
62
9
0
03 Jul 2022
Improving Transformer-based Conversational ASR by Inter-Sentential Attention Mechanism
Kun Wei
Pengcheng Guo
Ning Jiang
84
11
0
02 Jul 2022