ResearchTrend.AI
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
    VLM

Papers citing "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"

50 / 741 papers shown
A Crowdsourced Open-Source Kazakh Speech Corpus and Initial Speech Recognition Baseline
Yerbolat Khassanov
Saida Mussakhojayeva
A. Mirzakhmetov
A. Adiyev
Mukhamet Nurpeiissov
H. A. Varol
22
30
0
22 Sep 2020
Cough Against COVID: Evidence of COVID-19 Signature in Cough Sounds
Piyush Bagad
Aman Dalmia
Jigar Doshi
Arsha Nagrani
Parag Bhamare
A. Mahale
S. Rane
N. Agarwal
R. Panicker
34
112
0
17 Sep 2020
On Multitask Loss Function for Audio Event Detection and Localization
Huy P Phan
L. D. Pham
P. Koch
Ngoc Q. K. Duong
Ian Mcloughlin
Alfred Mertins
21
14
0
11 Sep 2020
On Target Segmentation for Direct Speech Translation
Mattia Antonino Di Gangi
Marco Gaido
Matteo Negri
Marco Turchi
37
14
0
10 Sep 2020
VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device Speech Recognition
Quan Wang
Ignacio López Moreno
Mert Saglam
K. Wilson
Alan Chiao
...
Yanzhang He
Wei Li
Jason W. Pelecanos
M. Nika
A. Gruenstein
VLM
39
82
0
09 Sep 2020
Overview and Evaluation of Sound Event Localization and Detection in DCASE 2019
A. Politis
A. Mesaros
Sharath Adavanne
Toni Heittola
Tuomas Virtanen
19
126
0
06 Sep 2020
CRNNs for Urban Sound Tagging with spatiotemporal context
Augustin Arnault
Nicolas Riche
25
7
0
24 Aug 2020
Speech To Semantics: Improve ASR and NLU Jointly via All-Neural Interfaces
Milind Rao
A. Raju
Pranav Dheram
Bach Bui
Ariya Rastrow
21
43
0
14 Aug 2020
Distilling the Knowledge of BERT for Sequence-to-Sequence ASR
Hayato Futami
Hirofumi Inaguma
Sei Ueno
Masato Mimura
S. Sakai
Tatsuya Kawahara
24
50
0
09 Aug 2020
LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition
Jin Xu
Xu Tan
Yi Ren
Tao Qin
Jian Li
Sheng Zhao
Tie-Yan Liu
VLM
18
90
0
09 Aug 2020
Contextualized Translation of Automatically Segmented Speech
Marco Gaido
Mattia Antonino Di Gangi
Matteo Negri
Mauro Cettolo
Marco Turchi
25
18
0
05 Aug 2020
Land Cover Classification from Remote Sensing Images Based on Multi-Scale Fully Convolutional Network
Rui Li
Shunyi Zheng
Chenxi Duan
Ce Zhang
26
98
0
01 Aug 2020
Semi-Supervised Learning with Data Augmentation for End-to-End ASR
F. Weninger
F. Mana
R. Gemello
Jesús Andrés-Ferrer
P. Zhan
25
30
0
27 Jul 2020
Efficient minimum word error rate training of RNN-Transducer for end-to-end speech recognition
Jinxi Guo
Gautam Tiwari
J. Droppo
Maarten Van Segbroeck
Che-Wei Huang
A. Stolcke
Roland Maas
21
55
0
27 Jul 2020
CoVoST 2 and Massively Multilingual Speech-to-Text Translation
Changhan Wang
Anne Wu
J. Pino
SLR
27
72
0
20 Jul 2020
Cross-Lingual Speaker Verification with Domain-Balanced Hard Prototype Mining and Language-Dependent Score Normalization
Jenthe Thienpondt
Brecht Desplanques
Kris Demuynck
17
24
0
15 Jul 2020
TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech
Andy T. Liu
Shang-Wen Li
Hung-yi Lee
SSL
62
356
0
12 Jul 2020
Class LM and word mapping for contextual biasing in End-to-End ASR
Rongqing Huang
Ossama Abdel-Hamid
Xinwei Li
G. Evermann
31
47
0
10 Jul 2020
Data Augmenting Contrastive Learning of Speech Representations in the Time Domain
Eugene Kharitonov
M. Rivière
Gabriel Synnaeve
Lior Wolf
Pierre-Emmanuel Mazaré
Matthijs Douze
Emmanuel Dupoux
31
117
0
02 Jul 2020
Self-Supervised MultiModal Versatile Networks
Jean-Baptiste Alayrac
Adrià Recasens
R. Schneider
Relja Arandjelović
Jason Ramapuram
J. Fauw
Lucas Smaira
Sander Dieleman
Andrew Zisserman
SSL
40
372
0
29 Jun 2020
Streaming Transformer ASR with Blockwise Synchronous Beam Search
E. Tsunoo
Yosuke Kashiwagi
Shinji Watanabe
22
11
0
25 Jun 2020
Self-Supervised Representations Improve End-to-End Speech Translation
Anne Wu
Changhan Wang
J. Pino
Jiatao Gu
SSL
25
40
0
22 Jun 2020
Sound Event Localization and Detection Using Activity-Coupled Cartesian DOA Vector and RD3net
Kazuki Shimada
Naoya Takahashi
Shusuke Takahashi
Yuki Mitsufuji
16
19
0
22 Jun 2020
MaxVA: Fast Adaptation of Step Sizes by Maximizing Observed Variance of Gradients
Chenfei Zhu
Yu Cheng
Zhe Gan
Furong Huang
Jingjing Liu
Tom Goldstein
ODL
35
2
0
21 Jun 2020
Boosting Active Learning for Speech Recognition with Noisy Pseudo-labeled Samples
Jihwan Bang
Heesu Kim
Y. Yoo
Jung-Woo Ha
9
2
0
19 Jun 2020
Are you wearing a mask? Improving mask detection from speech using augmentation by cycle-consistent GANs
Nicolae-Cătălin Ristea
Radu Tudor Ionescu
CVBM
8
41
0
17 Jun 2020
The JHU Multi-Microphone Multi-Speaker ASR System for the CHiME-6 Challenge
Ashish Arora
Desh Raj
Aswin Shanmugam Subramanian
Ke Li
Bar Ben Yair
Matthew Maciejewski
Piotr Żelasko
Leibny Paola García-Perera
Shinji Watanabe
Sanjeev Khudanpur
39
9
0
14 Jun 2020
End-to-End Speech-Translation with Knowledge Distillation: FBK@IWSLT2020
Marco Gaido
Mattia Antonino Di Gangi
Matteo Negri
Marco Turchi
19
53
0
04 Jun 2020
High-Fidelity Audio Generation and Representation Learning with Guided Adversarial Autoencoder
Kazi Nazmul Haque
R. Rana
Björn W Schuller
DRL
26
12
0
01 Jun 2020
CLOCS: Contrastive Learning of Cardiac Signals Across Space, Time, and Patients
Dani Kiyasseh
T. Zhu
David Clifton
33
186
0
27 May 2020
Multistream CNN for Robust Acoustic Modeling
Kyu Jeong Han
Jing Pan
Venkata Krishna Naveen Tadala
T. Ma
Daniel Povey
19
34
0
21 May 2020
Simplified Self-Attention for Transformer-based End-to-End Speech Recognition
Haoneng Luo
Shiliang Zhang
Ming Lei
Lei Xie
35
33
0
21 May 2020
A Comparison of Label-Synchronous and Frame-Synchronous End-to-End Models for Speech Recognition
Linhao Dong
Cheng Yi
Jianzong Wang
Shiyu Zhou
Shuang Xu
X. Jia
Bo Xu
36
17
0
20 May 2020
Early Stage LM Integration Using Local and Global Log-Linear Combination
Wilfried Michel
Ralf Schlüter
Hermann Ney
19
11
0
20 May 2020
BiQGEMM: Matrix Multiplication with Lookup Table For Binary-Coding-based Quantized DNNs
Yongkweon Jeon
Baeseong Park
S. Kwon
Byeongwook Kim
Jeongin Yun
Dongsoo Lee
MQ
33
30
0
20 May 2020
Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict
Yosuke Higuchi
Shinji Watanabe
Nanxin Chen
Tetsuji Ogawa
Tetsunori Kobayashi
19
137
0
18 May 2020
Attention-based Transducer for Online Speech Recognition
Bin Wang
Yan Yin
Hui-Ching Lin
18
4
0
18 May 2020
Conformer: Convolution-augmented Transformer for Speech Recognition
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
...
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
107
3,044
0
16 May 2020
Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory
Chunyang Wu
Yongqiang Wang
Yangyang Shi
Ching-Feng Yeh
Frank Zhang
RALM
31
60
0
16 May 2020
AccentDB: A Database of Non-Native English Accents to Assist Neural Speech Recognition
Afroz Ahamad
Ankit Anand
Pranesh Bhargava
19
22
0
16 May 2020
Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition
Zhengkun Tian
Jiangyan Yi
J. Tao
Ye Bai
Shuai Zhang
Zhengqi Wen
16
54
0
16 May 2020
Large scale weakly and semi-supervised learning for low-resource video ASR
Kritika Singh
Vimal Manohar
Alex Xiao
Sergey Edunov
Ross B. Girshick
Vitaliy Liptchinsky
Christian Fuegen
Yatharth Saraf
Geoffrey Zweig
Abdel-rahman Mohamed
31
9
0
16 May 2020
You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation
A. Laptev
Roman Korostik
A. Svischev
A. Andrusenko
Ivan Medennikov
S. Rybin
16
61
0
14 May 2020
Streaming keyword spotting on mobile devices
Oleg Rybakov
Natasha Kononenko
Niranjan A. Subrahmanya
Mirkó Visontai
Stella Laurenzo
AI4TS
19
109
0
14 May 2020
ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context
Wei Han
Zhengdong Zhang
Yu Zhang
Jiahui Yu
Chung-Cheng Chiu
James Qin
Anmol Gulati
Ruoming Pang
Yonghui Wu
42
259
0
07 May 2020
MultiQT: Multimodal Learning for Real-Time Question Tracking in Speech
Jakob Drachmann Havtorn
Jan Latko
Joakim Edin
Lasse Borgholt
Lars Maaløe
Lorenzo Belgrano
Nicolai Frost Jakobsen
R. Sdun
Zeljko Agic
19
3
0
02 May 2020
Logic-Guided Data Augmentation and Regularization for Consistent Question Answering
Akari Asai
Hannaneh Hajishirzi
NAI
16
111
0
21 Apr 2020
Curriculum Pre-training for End-to-End Speech Translation
Chengyi Wang
Yu Wu
Shujie Liu
Ming Zhou
Zhenglu Yang
21
108
0
21 Apr 2020
Serialized Output Training for End-to-End Overlapped Speech Recognition
Naoyuki Kanda
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Takuya Yoshioka
19
113
0
28 Mar 2020
Stochastic Frequency Masking to Improve Super-Resolution and Denoising Networks
Majed El Helou
Ruofan Zhou
Sabine Süsstrunk
24
45
0
16 Mar 2020