ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1904.08779
  4. Cited By
SpecAugment: A Simple Data Augmentation Method for Automatic Speech
  Recognition

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
    VLM
ArXivPDFHTML

Papers citing "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"

50 / 751 papers shown
Title
Knowledge Transfer and Distillation from Autoregressive to
  Non-Autoregressive Speech Recognition
Knowledge Transfer and Distillation from Autoregressive to Non-Autoregressive Speech Recognition
Xun Gong
Zhikai Zhou
Y. Qian
20
3
0
15 Jul 2022
Data Augmentation for Low-Resource Quechua ASR Improvement
Data Augmentation for Low-Resource Quechua ASR Improvement
Rodolfo Zevallos
Núria Bel
Guillermo Cámbara
Mireia Farrús
Jordi Luque
VLM
SyDa
19
6
0
14 Jul 2022
Two-Pass Low Latency End-to-End Spoken Language Understanding
Two-Pass Low Latency End-to-End Spoken Language Understanding
Siddhant Arora
Siddharth Dalmia
Xuankai Chang
Brian Yan
A. Black
Shinji Watanabe
VLM
32
19
0
14 Jul 2022
Masked Autoencoders that Listen
Masked Autoencoders that Listen
Po-Yao (Bernie) Huang
Hu Xu
Juncheng Billy Li
Alexei Baevski
Michael Auli
Wojciech Galuba
Florian Metze
Christoph Feichtenhofer
28
270
0
13 Jul 2022
Visual Context-driven Audio Feature Enhancement for Robust End-to-End
  Audio-Visual Speech Recognition
Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition
Joanna Hong
Minsu Kim
Daehun Yoo
Y. Ro
26
21
0
13 Jul 2022
Multitask Learning from Augmented Auxiliary Data for Improving Speech
  Emotion Recognition
Multitask Learning from Augmented Auxiliary Data for Improving Speech Emotion Recognition
S. Latif
R. Rana
Sara Khalifa
Raja Jurdak
Björn W. Schuller
37
22
0
12 Jul 2022
Speaker consistency loss and step-wise optimization for semi-supervised
  joint training of TTS and ASR using unpaired text data
Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data
Naoki Makishima
Satoshi Suzuki
Atsushi Ando
Ryo Masumura
146
4
0
11 Jul 2022
Branchformer: Parallel MLP-Attention Architectures to Capture Local and
  Global Context for Speech Recognition and Understanding
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding
Yifan Peng
Siddharth Dalmia
Ian Lane
Shinji Watanabe
32
143
0
06 Jul 2022
Improving Transformer-based Conversational ASR by Inter-Sentential
  Attention Mechanism
Improving Transformer-based Conversational ASR by Inter-Sentential Attention Mechanism
Kun Wei
Pengcheng Guo
Ning Jiang
61
11
0
02 Jul 2022
Tree-constrained Pointer Generator with Graph Neural Network Encodings
  for Contextual Speech Recognition
Tree-constrained Pointer Generator with Graph Neural Network Encodings for Contextual Speech Recognition
Guangzhi Sun
Chuxu Zhang
P. Woodland
27
12
0
02 Jul 2022
UserLibri: A Dataset for ASR Personalization Using Only Text
UserLibri: A Dataset for ASR Personalization Using Only Text
Theresa Breiner
Swaroop Indra Ramaswamy
Ehsan Variani
Shefali Garg
Rajiv Mathews
K. Sim
Kilol Gupta
Mingqing Chen
Lara McConnaughey
35
16
0
02 Jul 2022
Improving Deliberation by Text-Only and Semi-Supervised Training
Improving Deliberation by Text-Only and Semi-Supervised Training
Ke Hu
Tara N. Sainath
Yanzhang He
Rohit Prabhavalkar
Trevor Strohman
S. Mavandadi
Weiran Wang
41
12
0
29 Jun 2022
Contextual Density Ratio for Language Model Biasing of Sequence to
  Sequence ASR Systems
Contextual Density Ratio for Language Model Biasing of Sequence to Sequence ASR Systems
Jesús Andrés-Ferrer
Dario Albesano
P. Zhan
Paul Vozila
18
6
0
29 Jun 2022
Data augmentation for learning predictive models on EEG: a systematic
  comparison
Data augmentation for learning predictive models on EEG: a systematic comparison
Cédric Rommel
Joseph Paillard
Thomas Moreau
Alexandre Gramfort
66
54
0
29 Jun 2022
STOP: A dataset for Spoken Task Oriented Semantic Parsing
STOP: A dataset for Spoken Task Oriented Semantic Parsing
Paden Tomasello
Akshat Shrivastava
Daniel Lazar
Po-Chun Hsu
Duc Le
...
Robin Algayres
Tu Nguyen
Emmanuel Dupoux
Luke Zettlemoyer
Abdel-rahman Mohamed
30
35
0
29 Jun 2022
QTI Submission to DCASE 2021: residual normalization for
  device-imbalanced acoustic scene classification with efficient design
QTI Submission to DCASE 2021: residual normalization for device-imbalanced acoustic scene classification with efficient design
Byeonggeun Kim
Seunghan Yang
Jangho Kim
Simyung Chang
32
57
0
28 Jun 2022
Data Augmentation for Dementia Detection in Spoken Language
Data Augmentation for Dementia Detection in Spoken Language
Anna Hlédiková
Dominika Woszczyk
Alican Acman
Soteris Demetriou
Björn Schuller
36
12
0
26 Jun 2022
On Comparison of Encoders for Attention based End to End Speech
  Recognition in Standalone and Rescoring Mode
On Comparison of Encoders for Attention based End to End Speech Recognition in Standalone and Rescoring Mode
Raviraj Joshi
Subodh Kumar
38
2
0
26 Jun 2022
Domain Generalization with Relaxed Instance Frequency-wise Normalization
  for Multi-device Acoustic Scene Classification
Domain Generalization with Relaxed Instance Frequency-wise Normalization for Multi-device Acoustic Scene Classification
Byeonggeun Kim
Seunghan Yang
Jangho Kim
Hyunsin Park
Juntae Lee
Simyung Chang
45
28
0
24 Jun 2022
Confidence Score Based Conformer Speaker Adaptation for Speech
  Recognition
Confidence Score Based Conformer Speaker Adaptation for Speech Recognition
Jiajun Deng
Xurong Xie
Tianzi Wang
Mingyu Cui
Boyang Xue
Zengrui Jin
Mengzhe Geng
Guinan Li
Xunying Liu
Helen M. Meng
22
13
0
24 Jun 2022
Pruned RNN-T for fast, memory-efficient ASR training
Pruned RNN-T for fast, memory-efficient ASR training
Fangjun Kuang
Liyong Guo
Wei Kang
Long Lin
Mingshuang Luo
Zengwei Yao
Daniel Povey
31
64
0
23 Jun 2022
UniCon+: ICTCAS-UCAS Submission to the AVA-ActiveSpeaker Task at
  ActivityNet Challenge 2022
UniCon+: ICTCAS-UCAS Submission to the AVA-ActiveSpeaker Task at ActivityNet Challenge 2022
Yuanhang Zhang
Susan Liang
Shuang Yang
Shiguang Shan
15
4
0
22 Jun 2022
Boosting Cross-Domain Speech Recognition with Self-Supervision
Boosting Cross-Domain Speech Recognition with Self-Supervision
Hanjing Zhu
Gaofeng Cheng
Jindong Wang
Wenxin Hou
Pengyuan Zhang
Yonghong Yan
19
13
0
20 Jun 2022
Redundancy Reduction Twins Network: A Training framework for
  Multi-output Emotion Regression
Redundancy Reduction Twins Network: A Training framework for Multi-output Emotion Regression
Xin Jing
Meishu Song
Andreas Triantafyllopoulos
Zijiang Yang
Björn W. Schuller
21
8
0
18 Jun 2022
Event-related data conditioning for acoustic event classification
Event-related data conditioning for acoustic event classification
Yuanbo Hou
Dick Botteldooren
33
3
0
16 Jun 2022
Censer: Curriculum Semi-supervised Learning for Speech Recognition Based
  on Self-supervised Pre-training
Censer: Curriculum Semi-supervised Learning for Speech Recognition Based on Self-supervised Pre-training
Bowen Zhang
Songjun Cao
Xiaoming Zhang
Yike Zhang
Long Ma
T. Shinozaki
SSL
30
4
0
16 Jun 2022
Investigating Multi-Feature Selection and Ensembling for Audio
  Classification
Investigating Multi-Feature Selection and Ensembling for Audio Classification
Muhammad Turab
Teerath Kumar
Malika Bendechache
Takfarinas Saber
46
41
0
15 Jun 2022
COLD Fusion: Calibrated and Ordinal Latent Distribution Fusion for
  Uncertainty-Aware Multimodal Emotion Recognition
COLD Fusion: Calibrated and Ordinal Latent Distribution Fusion for Uncertainty-Aware Multimodal Emotion Recognition
M. Tellamekala
Shahin Amiriparian
Björn W. Schuller
Elisabeth André
T. Giesbrecht
M. Valstar
33
25
0
12 Jun 2022
The Influence of Dataset Partitioning on Dysfluency Detection Systems
The Influence of Dataset Partitioning on Dysfluency Detection Systems
Sebastian P. Bayerl
Dominik Wagner
Elmar Nöth
Tobias Bocklet
Korbinian Riedhammer
44
20
0
07 Jun 2022
LegoNN: Building Modular Encoder-Decoder Models
LegoNN: Building Modular Encoder-Decoder Models
Siddharth Dalmia
Dmytro Okhonko
M. Lewis
Sergey Edunov
Shinji Watanabe
Florian Metze
Luke Zettlemoyer
Abdel-rahman Mohamed
AuLLM
MoE
29
14
0
07 Jun 2022
LAE: Language-Aware Encoder for Monolingual and Multilingual ASR
LAE: Language-Aware Encoder for Monolingual and Multilingual ASR
Jinchuan Tian
Jianwei Yu
Chunlei Zhang
Chao Weng
Yuexian Zou
Dong Yu
AuLLM
22
25
0
05 Jun 2022
Automated Audio Captioning with Epochal Difficult Captions for
  Curriculum Learning
Automated Audio Captioning with Epochal Difficult Captions for Curriculum Learning
Andrew Koh
Soham Dinesh Tiwari
Chng Eng Siong
25
1
0
04 Jun 2022
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition
Sehoon Kim
A. Gholami
Albert Eaton Shaw
Nicholas Lee
K. Mangalam
Jitendra Malik
Michael W. Mahoney
Kurt Keutzer
47
99
0
02 Jun 2022
Do self-supervised speech models develop human-like perception biases?
Do self-supervised speech models develop human-like perception biases?
Juliette Millet
Ewan Dunbar
SSL
24
20
0
31 May 2022
Contrastive Siamese Network for Semi-supervised Speech Recognition
Contrastive Siamese Network for Semi-supervised Speech Recognition
S. Khorram
Jaeyoung Kim
Anshuman Tripathi
Han Lu
Qian Zhang
Hasim Sak
SSL
36
11
0
27 May 2022
Joint Training of Speech Enhancement and Self-supervised Model for
  Noise-robust ASR
Joint Training of Speech Enhancement and Self-supervised Model for Noise-robust ASR
Qiu-shi Zhu
Jie Zhang
Zitian Zhang
Lirong Dai
43
15
0
26 May 2022
DT-SV: A Transformer-based Time-domain Approach for Speaker Verification
DT-SV: A Transformer-based Time-domain Approach for Speaker Verification
Nan Zhang
Jianzong Wang
Zhenhou Hong
Chendong Zhao
Xiaoyang Qu
Jing Xiao
44
5
0
26 May 2022
Improving CTC-based ASR Models with Gated Interlayer Collaboration
Improving CTC-based ASR Models with Gated Interlayer Collaboration
Yuting Yang
Yuke Li
Binbin Du
42
11
0
25 May 2022
Multi-Level Modeling Units for End-to-End Mandarin Speech Recognition
Multi-Level Modeling Units for End-to-End Mandarin Speech Recognition
Yuting Yang
Binbin Du
Yuke Li
31
1
0
24 May 2022
Adaptive Few-Shot Learning Algorithm for Rare Sound Event Detection
Adaptive Few-Shot Learning Algorithm for Rare Sound Event Detection
Chendong Zhao
Jianzong Wang
LeiLai Li
Xiaoyang Qu
Jing Xiao
65
10
0
24 May 2022
Non-Parametric Domain Adaptation for End-to-End Speech Translation
Non-Parametric Domain Adaptation for End-to-End Speech Translation
Yichao Du
Weizhi Wang
Zhirui Zhang
Boxing Chen
Tong Xu
Jun Xie
Enhong Chen
53
18
0
23 May 2022
Minimising Biasing Word Errors for Contextual ASR with the
  Tree-Constrained Pointer Generator
Minimising Biasing Word Errors for Contextual ASR with the Tree-Constrained Pointer Generator
Guangzhi Sun
Chuxu Zhang
P. Woodland
39
14
0
18 May 2022
SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual
  Speech Representation
SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation
Sameer Khurana
Antoine Laurent
James R. Glass
30
36
0
17 May 2022
Pretraining Approaches for Spoken Language Recognition: TalTech
  Submission to the OLR 2021 Challenge
Pretraining Approaches for Spoken Language Recognition: TalTech Submission to the OLR 2021 Challenge
Tanel Alumäe
Kunnar Kukk
15
5
0
14 May 2022
Who Are We Talking About? Handling Person Names in Speech Translation
Who Are We Talking About? Handling Person Names in Speech Translation
Marco Gaido
Matteo Negri
Marco Turchi
23
7
0
13 May 2022
Beyond the Status Quo: A Contemporary Survey of Advances and Challenges
  in Audio Captioning
Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning
Xuenan Xu
Zeyu Xie
Mengyue Wu
K. Yu
44
13
0
11 May 2022
A Conformer-based Waveform-domain Neural Acoustic Echo Canceller
  Optimized for ASR Accuracy
A Conformer-based Waveform-domain Neural Acoustic Echo Canceller Optimized for ASR Accuracy
S. Panchapagesan
A. Narayanan
T. Shabestary
Shuai Shao
N. Howard
Alex Park
James Walker
A. Gruenstein
34
3
0
06 May 2022
On monoaural speech enhancement for automatic recognition of real noisy
  speech using mixture invariant training
On monoaural speech enhancement for automatic recognition of real noisy speech using mixture invariant training
Jisi Zhang
Catalin Zorila
R. Doddipatla
Jon Barker
30
4
0
03 May 2022
Efficient dynamic filter for robust and low computational feature
  extraction
Efficient dynamic filter for robust and low computational feature extraction
Donghyeon Kim
Gwantae Kim
Bokyeung Lee
Jeong-gi Kwak
D. Han
Hanseok Ko
39
3
0
03 May 2022
Improving Multimodal Speech Recognition by Data Augmentation and Speech
  Representations
Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations
Dan Oneaţă
H. Cucu
19
19
0
27 Apr 2022
Previous
123...678...141516
Next