ResearchTrend.AI
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
    VLM

Papers citing "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"

50 / 734 papers shown
Affective social anthropomorphic intelligent system
Md. Adyelullahil Mamun
Hasnat Md. Abdullah
Md. Golam Rabiul Alam
Muhammad Mehedi Hassan
Md. Zia Uddin
19
1
0
19 Apr 2023
Dynamic Chunk Convolution for Unified Streaming and Non-Streaming Conformer ASR
Xilai Li
Goeric Huybrechts
S. Ronanki
Jeffrey J. Farris
S. Bodapati
40
6
0
18 Apr 2023
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Nikhil Singh
Chih-Wei Wu
Iroro Orife
Mahdi M. Kalayeh
30
2
0
12 Apr 2023
Margin-Mixup: A Method for Robust Speaker Verification in Multi-Speaker Audio
Jenthe Thienpondt
N. Madhu
Kris Demuynck
32
4
0
07 Apr 2023
Efficient Audio Captioning Transformer with Patchout and Text Guidance
Thodoris Kouzelis
Grigoris Bastas
Athanasios Katsamanis
Alexandros Potamianos
ViT
32
6
0
06 Apr 2023
Practical Conformer: Optimizing size, speed and flops of Conformer for on-Device and cloud ASR
Rami Botros
Anmol Gulati
Tara N. Sainath
K. Choromanski
Ruoming Pang
Trevor Strohman
Weiran Wang
Jiahui Yu
MQ
28
3
0
31 Mar 2023
When Good and Reproducible Results are a Giant with Feet of Clay: The Importance of Software Quality in NLP
Sara Papi
Marco Gaido
Andrea Pilzer
Matteo Negri
59
10
0
28 Mar 2023
Towards Diverse and Coherent Augmentation for Time-Series Forecasting
Xiyuan Zhang
Ranak Roy Chowdhury
Jingbo Shang
Rajesh K. Gupta
Dezhi Hong
AI4TS
32
4
0
24 Mar 2023
Beyond Universal Transformer: block reusing with adaptor in Transformer for automatic speech recognition
Haoyu Tang
Zhaoyi Liu
Chang Zeng
Xinfeng Li
34
1
0
23 Mar 2023
Exploring Turkish Speech Recognition via Hybrid CTC/Attention Architecture and Multi-feature Fusion Network
Zeyu Ren
Nurmemet Yolwas
Huiru Wang
Wushour Slamu
31
0
0
22 Mar 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Junaid Qadir
46
47
0
21 Mar 2023
Exploring Representation Learning for Small-Footprint Keyword Spotting
Fan Cui
Liyong Guo
Quandong Wang
Peng Gao
Yujun Wang
SSL
22
3
0
20 Mar 2023
Relate auditory speech to EEG by shallow-deep attention-based network
Fan Cui
Liyong Guo
Lang He
Jiyao Liu
Ercheng Pei
Yujun Wang
Dengyang Jiang
23
3
0
20 Mar 2023
Enhancing Unsupervised Audio Representation Learning via Adversarial Sample Generation
Yulin Pan
Xiangteng He
Biao Gong
Yuxin Peng
Yiliang Lv
SSL
24
0
0
15 Mar 2023
Improving Accented Speech Recognition with Multi-Domain Training
Lucas Maison
Yannick Esteve
26
7
0
14 Mar 2023
Dynamic Alignment Mask CTC: Improved Mask-CTC with Aligned Cross Entropy
Xulong Zhang
Haobin Tang
Jianzong Wang
Ning Cheng
Jian Luo
Jing Xiao
30
2
0
14 Mar 2023
Robust Knowledge Distillation from RNN-T Models With Noisy Training Labels Using Full-Sum Loss
Mohammad Zeineldeen
Kartik Audhkhasi
M. Baskar
Bhuvana Ramabhadran
24
2
0
10 Mar 2023
An Inception-Residual-Based Architecture with Multi-Objective Loss for Detecting Respiratory Anomalies
Dat Ngo
L. D. Pham
Huy P Phan
Minh Tran
D. Jarchi
Ş. Kolozali
32
3
0
07 Mar 2023
The DKU Post-Challenge Audio-Visual Wake Word Spotting System for the 2021 MISP Challenge: Deep Analysis
Haoxu Wang
Ming Cheng
Qiang Fu
Ming Li
41
8
0
04 Mar 2023
Unified Keyword Spotting and Audio Tagging on Mobile Devices with Transformers
Heinrich Dinkel
Yongqing Wang
Zhiyong Yan
Junbo Zhang
Yujun Wang
41
4
0
03 Mar 2023
N-best T5: Robust ASR Error Correction using Multiple Input Hypotheses and Constrained Decoding Space
Rao Ma
Mark Gales
Kate Knill
Mengjie Qian
11
32
0
01 Mar 2023
Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator
Vladimir Bataev
Roman Korostik
Evgeny Shabalin
Vitaly Lavrukhin
Boris Ginsburg
VLM
38
14
0
27 Feb 2023
Cross-modal Audio-visual Co-learning for Text-independent Speaker Verification
Meng Liu
Kong Aik Lee
Longbiao Wang
Hanyi Zhang
Chang Zeng
J. Dang
23
10
0
22 Feb 2023
Advancing Stuttering Detection via Data Augmentation, Class-Balanced Loss and Multi-Contextual Deep Learning
S. A. Sheikh
Md. Sahidullah
F. Hirsch
Slim Ouni
24
16
0
21 Feb 2023
Front-End Adapter: Adapting Front-End Input of Speech based Self-Supervised Learning for Speech Recognition
Xie Chen
Ziyang Ma
Changli Tang
Yujin Wang
Zhi-shen Zheng
13
4
0
18 Feb 2023
JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition
Zhong Meng
Weiran Wang
Rohit Prabhavalkar
Tara N. Sainath
Tongzhou Chen
Ehsan Variani
Yu Zhang
Bo-wen Li
Andrew Rosenberg
Bhuvana Ramabhadran
AuLLM
VLM
36
11
0
16 Feb 2023
Hardware-aware training for large-scale and diverse deep learning inference workloads using in-memory computing-based accelerators
Malte J. Rasch
C. Mackin
Manuel Le Gallo
An Chen
A. Fasoli
...
P. Narayanan
H. Tsai
G. Burr
Abu Sebastian
Vijay Narayanan
15
86
0
16 Feb 2023
An Attention-based Approach to Hierarchical Multi-label Music Instrument Classification
Zhi-Wei Zhong
M. Hirano
Kazuki Shimada
Kazuya Tateishi
Shusuke Takahashi
Yuki Mitsufuji
23
12
0
16 Feb 2023
Personalized Audio Quality Preference Prediction
Chung-Che Wang
Yu-Chun Lin
Yu-Teng Hsu
J. Jang
22
1
0
16 Feb 2023
Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems
Jiajun Deng
Xurong Xie
Tianzi Wang
Mingyu Cui
Boyang Xue
Zengrui Jin
Guinan Li
Shujie Hu
Xunying Liu
26
5
0
15 Feb 2023
Multi-Source Contrastive Learning from Musical Audio
C. Garoufis
Athanasia Zlatintsi
Petros Maragos
34
6
0
14 Feb 2023
Cross-Corpora Spoken Language Identification with Domain Diversification and Generalization
Spandan Dey
Md. Sahidullah
G. Saha
21
11
0
10 Feb 2023
Efficient Domain Adaptation for Speech Foundation Models
Bo-wen Li
DongSeon Hwang
Zhouyuan Huo
Junwen Bai
Guru Prakash
...
K. Sim
Yu Zhang
Wei Han
Trevor Strohman
F. Beaufays
AI4CE
46
23
0
03 Feb 2023
Complex Dynamic Neurons Improved Spiking Transformer Network for Efficient Automatic Speech Recognition
Minglun Han
Qingyu Wang
Tielin Zhang
Yi Wang
Duzhen Zhang
Bo Xu
28
29
0
02 Feb 2023
Epic-Sounds: A Large-scale Dataset of Actions That Sound
Jaesung Huh
Jacob Chalk
Evangelos Kazakos
Dima Damen
Andrew Zisserman
EgoV
29
41
0
01 Feb 2023
Open Problems in Applied Deep Learning
M. Raissi
AI4CE
50
2
0
26 Jan 2023
Leveraging Speaker Embeddings with Adversarial Multi-task Learning for Age Group Classification
Kwangje Baeg
Yeong-Gwan Kim
Youngsub Han
Byoung-Ki Jeon
24
0
0
22 Jan 2023
BayesSpeech: A Bayesian Transformer Network for Automatic Speech Recognition
Will Rieger
BDL
UQCV
21
0
0
16 Jan 2023
Training one model to detect heart and lung sound events from single point auscultations
Leander Melms
Robert R. Ilesan
Ulrich Köhler
O. Hildebrandt
R. Conradt
...
Jürgen R. Schaefer
Tobias Müller
J. Obergassel
Nadine Schlicker
M. Hirsch
26
2
0
15 Jan 2023
Automated speech- and text-based classification of neuropsychiatric conditions in a multidiagnostic setting
L. Hansen
R. Rocca
A. Simonsen
A. Parola
V. Bliksted
...
Dan Bang
Kristian Tylén
Ethan Weed
S. Ostergaard
Riccardo Fusaroli
51
3
0
13 Jan 2023
Sample-Efficient Unsupervised Domain Adaptation of Speech Recognition Systems A case study for Modern Greek
Georgios Paraskevopoulos
Theodoros Kouzelis
Georgios Rouvalis
Athanasios Katsamanis
Vassilis Katsouros
Alexandros Potamianos
VLM
30
7
0
31 Dec 2022
ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement
Wei-Ning Hsu
Tal Remez
Bowen Shi
Jacob Donley
Yossi Adi
DiffM
27
12
0
21 Dec 2022
Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning
Huimin Wu
Chenyang Lei
Xiao Sun
Pengju Wang
Qifeng Chen
Kwang-Ting Cheng
Stephen Lin
Zhirong Wu
MQ
38
5
0
19 Dec 2022
SegAugment: Maximizing the Utility of Speech Translation Data with Segmentation-based Augmentations
Ioannis Tsiamas
José A. R. Fonollosa
Marta R. Costa-jussá
43
6
0
19 Dec 2022
WACO: Word-Aligned Contrastive Learning for Speech Translation
Siqi Ouyang
Rong Ye
Lei Li
32
25
0
19 Dec 2022
Audiovisual Masked Autoencoders
Mariana-Iuliana Georgescu
Eduardo Fonseca
Radu Tudor Ionescu
Mario Lucic
Cordelia Schmid
Anurag Arnab
SSL
39
43
0
09 Dec 2022
OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models
Jinze Bai
Rui Men
Han Yang
Xuancheng Ren
Kai Dang
...
Wenhang Ge
Jianxin Ma
Junyang Lin
Jingren Zhou
Chang Zhou
37
15
0
08 Dec 2022
LMEC: Learnable Multiplicative Absolute Position Embedding Based Conformer for Speech Recognition
Yuguang Yang
Yu Pan
Jingjing Yin
Heng Lu
32
3
0
05 Dec 2022
Improving End-to-end Speech Translation by Leveraging Auxiliary Speech and Text Data
Yuhao Zhang
Chen Xu
Bojie Hu
Chunliang Zhang
Tong Xiao
Jingbo Zhu
29
15
0
04 Dec 2022
SoftCorrect: Error Correction with Soft Detection for Automatic Speech Recognition
Yichong Leng
Xu Tan
Wenjie Liu
Kaitao Song
Rui Wang
Xiang-Yang Li
Tao Qin
Ed Lin
Tie-Yan Liu
34
15
0
02 Dec 2022