SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
    VLM

Papers citing "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"

50 / 1,048 papers shown
Title
Confidence Estimation for Attention-based Sequence-to-sequence Models for Speech Recognition
Qiujia Li
David Qiu
Yu Zhang
Yue Liu
Yanzhang He
P. Woodland
Liangliang Cao
Trevor Strohman
50
49
0
22 Oct 2020
A General Multi-Task Learning Framework to Leverage Text Data for Speech to Text Tasks
Yun Tang
J. Pino
Changhan Wang
Xutai Ma
Dmitriy Genzel
81
75
0
21 Oct 2020
The IDLAB VoxSRC-20 Submission: Large Margin Fine-Tuning and Quality-Aware Score Calibration in DNN Based Speaker Verification
Jenthe Thienpondt
Brecht Desplanques
Kris Demuynck
82
84
0
21 Oct 2020
How Data Augmentation affects Optimization for Linear Regression
Boris Hanin
Yi Sun
86
16
0
21 Oct 2020
FastEmit: Low-latency Streaming ASR with Sequence-level Emission Regularization
Jiahui Yu
Chung-Cheng Chiu
Yue Liu
Shuo-yiin Chang
Tara N. Sainath
...
A. Narayanan
Wei Han
Anmol Gulati
Yonghui Wu
Ruoming Pang
86
92
0
21 Oct 2020
Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition
Yangyang Shi
Yongqiang Wang
Chunyang Wu
Ching-Feng Yeh
Julian Chan
Frank Zhang
Duc Le
M. Seltzer
200
172
0
21 Oct 2020
Towards End-to-End Training of Automatic Speech Recognition for Nigerian Pidgin
Amina Mardiyyah Rufai
Afolabi Abeeb
Esther Oduntan
Tayo Arulogun
Oluwabukola Adegboro
Daniel Ajisafe
112
4
0
21 Oct 2020
Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition
Yu Zhang
James Qin
Daniel S. Park
Wei Han
Chung-Cheng Chiu
Ruoming Pang
Quoc V. Le
Yonghui Wu
VLM, SSL
229
310
0
20 Oct 2020
MicAugment: One-shot Microphone Style Transfer
Zalan Borsos
Yunpeng Li
Beat Gfeller
Marco Tagliasacchi
43
4
0
19 Oct 2020
CLAR: Contrastive Learning of Auditory Representations
Haider Al-Tahan
Y. Mohsenzadeh
SSL
195
56
0
19 Oct 2020
Lightweight End-to-End Speech Recognition from Raw Audio Data Using Sinc-Convolutions
Ludwig Kurzinger
Nicolas Lindae
Palle Klewitz
Gerhard Rigoll
67
5
0
15 Oct 2020
Viewmaker Networks: Learning Views for Unsupervised Representation Learning
Alex Tamkin
Mike Wu
Noah D. Goodman
SSL
131
64
0
14 Oct 2020
Exploiting Spectral Augmentation for Code-Switched Spoken Language Identification
P. Rangan
Sundeep Teki
Hemant Misra
36
22
0
14 Oct 2020
Towards Data-efficient Modeling for Wake Word Spotting
Yixin Gao
Yuriy Mishchenko
Anish Shah
Spyros Matsoukas
S. Vitaladevuni
101
32
0
13 Oct 2020
Improving Low Resource Code-switched ASR using Augmented Code-switched TTS
Yash Sharma
Basil Abraham
Karan Taneja
Preethi Jyothi
61
21
0
12 Oct 2020
fairseq S2T: Fast Speech-to-Text Modeling with fairseq
Changhan Wang
Yun Tang
Xutai Ma
Anne Wu
Sravya Popuri
Dmytro Okhonko
J. Pino
VLM, LRM
116
276
0
11 Oct 2020
Contrastive Representation Learning: A Framework and Review
Phúc H. Lê Khắc
Graham Healy
Alan F. Smeaton
SSL, AI4TS
330
722
0
10 Oct 2020
Leveraging Unpaired Text Data for Training End-to-End Speech-to-Intent Systems
Yinghui Huang
H. Kuo
Samuel Thomas
Zvi Kons
Kartik Audhkhasi
Brian Kingsbury
R. Hoory
M. Picheny
VLM
48
63
0
08 Oct 2020
Population Based Training for Data Augmentation and Regularization in Speech Recognition
Daniel Haziza
Jérémy Rapin
Gabriel Synnaeve
35
1
0
08 Oct 2020
Super-Human Performance in Online Low-latency Recognition of Conversational Speech
T. Nguyen
S. Stueker
A. Waibel
BDL
74
38
0
07 Oct 2020
Transformer Transducer: One Model Unifying Streaming and Non-streaming Speech Recognition
Anshuman Tripathi
Jaeyoung Kim
Qian Zhang
Han Lu
Hasim Sak
71
43
0
07 Oct 2020
Representation Learning for Sequence Data with Deep Autoencoding Predictive Components
Junwen Bai
Weiran Wang
Yingbo Zhou
Caiming Xiong
SSL, AI4TS
75
12
0
07 Oct 2020
SPLAT: Speech-Language Joint Pre-Training for Spoken Language Understanding
Yu-An Chung
Chenguang Zhu
Michael Zeng
VLM
72
8
0
05 Oct 2020
Differentiable Weighted Finite-State Transducers
Awni Y. Hannun
Vineel Pratap
Jacob Kahn
Wei-Ning Hsu
118
29
0
02 Oct 2020
Embedded Emotions -- A Data Driven Approach to Learn Transferable Feature Representations from Raw Speech Input for Emotion Recognition
Dominik Schiller
Silvan Mertes
Elisabeth André
15
0
0
30 Sep 2020
A Crowdsourced Open-Source Kazakh Speech Corpus and Initial Speech Recognition Baseline
Yerbolat Khassanov
Saida Mussakhojayeva
A. Mirzakhmetov
A. Adiyev
Mukhamet Nurpeiissov
H. A. Varol
57
31
0
22 Sep 2020
Consecutive Decoding for Speech-to-text Translation
Qianqian Dong
Mingxuan Wang
Hao Zhou
Shuang Xu
Bo Xu
Lei Li
SLR
111
41
0
21 Sep 2020
"Listen, Understand and Translate": Triple Supervision Decouples End-to-end Speech-to-text Translation
Qianqian Dong
Rong Ye
Mingxuan Wang
Hao Zhou
Shuang Xu
Bo Xu
Lei Li
75
3
0
21 Sep 2020
Cough Against COVID: Evidence of COVID-19 Signature in Cough Sounds
Piyush Bagad
Aman Dalmia
Jigar Doshi
Arsha Nagrani
Parag Bhamare
A. Mahale
S. Rane
N. Agarwal
R. Panicker
105
113
0
17 Sep 2020
EasyASR: A Distributed Machine Learning Platform for End-to-end Automatic Speech Recognition
Chengyu Wang
Mengli Cheng
Xu Hu
Jun Huang
VLM
42
6
0
14 Sep 2020
On Multitask Loss Function for Audio Event Detection and Localization
Huy P Phan
L. D. Pham
P. Koch
Ngoc Q. K. Duong
Ian Mcloughlin
Alfred Mertins
87
14
0
11 Sep 2020
On Target Segmentation for Direct Speech Translation
Mattia Antonino Di Gangi
Marco Gaido
Matteo Negri
Marco Turchi
79
14
0
10 Sep 2020
VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device Speech Recognition
Quan Wang
Ignacio López Moreno
Mert Saglam
K. Wilson
Alan Chiao
...
Yanzhang He
Wei Li
Jason W. Pelecanos
M. Nika
A. Gruenstein
VLM
71
86
0
09 Sep 2020
AutoKWS: Keyword Spotting with Differentiable Architecture Search
Bo Zhang
WenFeng Li
Qingyuan Li
Weiji Zhuang
Xiangxiang Chu
Yujun Wang
68
23
0
08 Sep 2020
KoSpeech: Open-Source Toolkit for End-to-End Korean Speech Recognition
Soohwan Kim
Seyoung Bae
Cheolhwang Won
VLM
31
5
0
07 Sep 2020
Overview and Evaluation of Sound Event Localization and Detection in DCASE 2019
Archontis Politis
A. Mesaros
Sharath Adavanne
Toni Heittola
Tuomas Virtanen
133
128
0
06 Sep 2020
Multi-Attention-Network for Semantic Segmentation of Fine Resolution Remote Sensing Images
Rui Li
Shunyi Zheng
Chenxi Duan
Ce Zhang
Jianlin Su
P. M. Atkinson
SSeg
108
388
0
03 Sep 2020
Convolutional Speech Recognition with Pitch and Voice Quality Features
Guillermo Cámbara
Jordi Luque
Mireia Farrús
28
8
0
02 Sep 2020
Parallel Rescoring with Transformer for Streaming On-Device Speech Recognition
Wei Li
James Qin
Chung-Cheng Chiu
Ruoming Pang
Yanzhang He
85
14
0
30 Aug 2020
A Survey of Deep Active Learning
Pengzhen Ren
Yun Xiao
Xiaojun Chang
Po-Yao (Bernie) Huang
Zhihui Li
Brij B. Gupta
Xiaojiang Chen
Xin Wang
160
1,161
0
30 Aug 2020
Data augmentation using prosody and false starts to recognize non-native children's speech
H. Kathania
Mittul Singh
Tamás Grósz
M. Kurimo
19
12
0
29 Aug 2020
CRNNs for Urban Sound Tagging with spatiotemporal context
Augustin Arnault
Nicolas Riche
58
7
0
24 Aug 2020
Howl: A Deployed, Open-Source Wake Word Detection System
Raphael Tang
Jaejun Lee
Afsaneh Razi
Julia Cambre
Ian Bicking
Jofish Kaye
Jimmy J. Lin
VLM
52
17
0
21 Aug 2020
Speech To Semantics: Improve ASR and NLU Jointly via All-Neural Interfaces
Milind Rao
A. Raju
Pranav Dheram
Bach Bui
Ariya Rastrow
58
43
0
14 Aug 2020
Subword Regularization: An Analysis of Scalability and Generalization for End-to-End Automatic Speech Recognition
Egor Lakomkin
Jahn Heymann
Ilya Sklyar
Simon Wiesler
51
8
0
10 Aug 2020
Distilling the Knowledge of BERT for Sequence-to-Sequence ASR
Hayato Futami
Hirofumi Inaguma
Sei Ueno
Masato Mimura
S. Sakai
Tatsuya Kawahara
78
53
0
09 Aug 2020
LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition
Jin Xu
Xu Tan
Yi Ren
Tao Qin
Jian Li
Sheng Zhao
Tie-Yan Liu
VLM
70
91
0
09 Aug 2020
Investigation of Speaker-adaptation methods in Transformer based ASR
Vishwas M. Shetty
J. Metilda Sagaya Mary N.
S. Umesh
87
5
0
07 Aug 2020
Contextualized Translation of Automatically Segmented Speech
Marco Gaido
Mattia Antonino Di Gangi
Matteo Negri
Mauro Cettolo
Marco Turchi
61
19
0
05 Aug 2020
Land Cover Classification from Remote Sensing Images Based on Multi-Scale Fully Convolutional Network
Rui Li
Shunyi Zheng
Chenxi Duan
Ce Zhang
104
102
0
01 Aug 2020