arXiv:1904.08779
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
VLM
Papers citing
"SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"
50 / 736 papers shown
Title
Self-supervised Text-independent Speaker Verification using Prototypical Momentum Contrastive Learning
Wei Xia
Chunlei Zhang
Chao Weng
Meng Yu
Dong Yu
SSL
25
78
0
13 Dec 2020
Less Is More: Improved RNN-T Decoding Using Limited Label Context and Path Merging
Rohit Prabhavalkar
Yanzhang He
David Rybach
S. Campbell
A. Narayanan
Trevor Strohman
Tara N. Sainath
52
35
0
12 Dec 2020
Improving accuracy of rare words for RNN-Transducer through unigram shallow fusion
Vijay Ravi
Yile Gu
Ankur Gandhe
Ariya Rastrow
Linda Liu
Denis Filimonov
Scott Novotney
I. Bulyko
27
9
0
30 Nov 2020
Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training
Sameer Khurana
Niko Moritz
Takaaki Hori
Jonathan Le Roux
24
54
0
26 Nov 2020
Deep Discriminative Feature Learning for Accent Recognition
Wei Wang
Chao Zhang
Xiao-pei Wu
34
2
0
25 Nov 2020
Synth2Aug: Cross-domain speaker recognition with TTS synthesized speech
Yiling Huang
Yutian Chen
Jason W. Pelecanos
Quan Wang
33
11
0
24 Nov 2020
Improving RNN-T ASR Accuracy Using Context Audio
A. Schwarz
Ilya Sklyar
Simon Wiesler
24
9
0
20 Nov 2020
Predicting Rigid Body Dynamics using Dual Quaternion Recurrent Neural Networks with Quaternion Attention
Johannes Pöppelbaum
Andreas Schwung
16
13
0
17 Nov 2020
Unsupervised Contrastive Learning of Sound Event Representations
Eduardo Fonseca
Diego Ortego
Kevin McGuinness
Noel E. O'Connor
Xavier Serra
SSL
27
65
0
15 Nov 2020
Low-resource expressive text-to-speech using data augmentation
Goeric Huybrechts
Thomas Merritt
Giulia Comini
Bartek Perz
Raahil Shah
Jaime Lorenzo-Trueba
26
50
0
11 Nov 2020
Towards Semi-Supervised Semantics Understanding from Speech
Cheng-I Jeff Lai
Jin Cao
S. Bodapati
Shang-Wen Li
SSL
22
7
0
11 Nov 2020
Data Augmentation For Children's Speech Recognition -- The "Ethiopian" System For The SLT 2021 Children Speech Recognition Challenge
Guoguo Chen
Xingyu Na
Yongqing Wang
Zhiyong Yan
Junbo Zhang
Sifan Ma
Yujun Wang
35
19
0
09 Nov 2020
Dual Application of Speech Enhancement for Automatic Speech Recognition
Ashutosh Pandey
Chunxi Liu
Yun Wang
Yatharth Saraf
43
37
0
07 Nov 2020
Improving RNN Transducer Based ASR with Auxiliary Tasks
Chunxi Liu
Frank Zhang
Duc Le
Suyoun Kim
Yatharth Saraf
Geoffrey Zweig
28
49
0
05 Nov 2020
Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation
Hang Le
J. Pino
Changhan Wang
Jiatao Gu
D. Schwab
Laurent Besacier
39
82
0
02 Nov 2020
ACCDOA: Activity-Coupled Cartesian Direction of Arrival Representation for Sound Event Localization and Detection
Kazuki Shimada
Yuichiro Koyama
Naoya Takahashi
Shusuke Takahashi
Yuki Mitsufuji
23
86
0
29 Oct 2020
Cascaded encoders for unifying streaming and non-streaming ASR
A. Narayanan
Tara N. Sainath
Ruoming Pang
Jiahui Yu
Chung-Cheng Chiu
Rohit Prabhavalkar
Ehsan Variani
Trevor Strohman
AuLLM
8
85
0
27 Oct 2020
Speech SIMCLR: Combining Contrastive and Reconstruction Objective for Self-supervised Speech Representation Learning
Dongwei Jiang
Wubo Li
Miao Cao
Wei Zou
Xiangang Li
SSL
27
65
0
27 Oct 2020
Recent Developments on ESPnet Toolkit Boosted by Conformer
Pengcheng Guo
Florian Boyer
Xuankai Chang
Tomoki Hayashi
Yosuke Higuchi
...
Jing Shi
Shinji Watanabe
Kun Wei
Wangyou Zhang
Yuekai Zhang
45
262
0
26 Oct 2020
MarbleNet: Deep 1D Time-Channel Separable Convolutional Neural Network for Voice Activity Detection
Fei Jia
Somshubra Majumdar
Boris Ginsburg
19
48
0
26 Oct 2020
Improved Neural Language Model Fusion for Streaming Recurrent Neural Network Transducer
Suyoun Kim
Shangguan Yuan
Jay Mahadeokar
A. Bruguier
Christian Fuegen
M. Seltzer
Duc Le
23
28
0
26 Oct 2020
Semi-Supervised Spoken Language Understanding via Self-Supervised Speech and Language Model Pretraining
Cheng-I Jeff Lai
Yung-Sung Chuang
Hung-yi Lee
Shang-Wen Li
James R. Glass
VLM
SSL
27
58
0
26 Oct 2020
Two-stage Textual Knowledge Distillation for End-to-End Spoken Language Understanding
Seongbin Kim
Gyuwan Kim
Seongjin Shin
Sangmin Lee
VLM
18
19
0
25 Oct 2020
An Improved Event-Independent Network for Polyphonic Sound Event Localization and Detection
Yin Cao
Turab Iqbal
Qiuqiang Kong
Y. Zhong
Wenwu Wang
Mark D. Plumbley
16
75
0
25 Oct 2020
Align-Refine: Non-Autoregressive Speech Recognition via Iterative Realignment
Ethan A. Chi
Julian Salazar
Katrin Kirchhoff
AI4TS
25
51
0
24 Oct 2020
Transformer-based End-to-End Speech Recognition with Local Dense Synthesizer Attention
Menglong Xu
Shengqiang Li
Xiao-Lei Zhang
27
31
0
23 Oct 2020
Neural Audio Fingerprint for High-specific Audio Retrieval based on Contrastive Learning
Sungkyun Chang
Donmoon Lee
Jeongsoon Park
Hyungui Lim
Kyogu Lee
Karam Ko
Yoonchang Han
25
34
0
22 Oct 2020
Urban Sound Classification : striving towards a fair comparison
Augustin Arnault
Baptiste Hanssens
Nicolas Riche
24
8
0
22 Oct 2020
Rethinking Evaluation in ASR: Are Our Models Robust Enough?
Tatiana Likhomanenko
Qiantong Xu
Vineel Pratap
Paden Tomasello
Jacob Kahn
Gilad Avidov
R. Collobert
Gabriel Synnaeve
39
98
0
22 Oct 2020
Similarity Analysis of Self-Supervised Speech Representations
Yu-An Chung
Yonatan Belinkov
James R. Glass
SSL
36
36
0
22 Oct 2020
Confidence Estimation for Attention-based Sequence-to-sequence Models for Speech Recognition
Qiujia Li
David Qiu
Yu Zhang
Bo-wen Li
Yanzhang He
P. Woodland
Liangliang Cao
Trevor Strohman
12
46
0
22 Oct 2020
A General Multi-Task Learning Framework to Leverage Text Data for Speech to Text Tasks
Yun Tang
J. Pino
Changhan Wang
Xutai Ma
Dmitriy Genzel
26
73
0
21 Oct 2020
Towards End-to-End Training of Automatic Speech Recognition for Nigerian Pidgin
Daniel A. Ajisafe
O. Adegboro
Esther Oduntan
T. Arulogun
18
4
0
21 Oct 2020
Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition
Yangyang Shi
Yongqiang Wang
Chunyang Wu
Ching-Feng Yeh
Julian Chan
Frank Zhang
Duc Le
M. Seltzer
56
168
0
21 Oct 2020
CLAR: Contrastive Learning of Auditory Representations
Haider Al-Tahan
Y. Mohsenzadeh
SSL
118
56
0
19 Oct 2020
Lightweight End-to-End Speech Recognition from Raw Audio Data Using Sinc-Convolutions
Ludwig Kurzinger
Nicolas Lindae
Palle Klewitz
Gerhard Rigoll
27
5
0
15 Oct 2020
Viewmaker Networks: Learning Views for Unsupervised Representation Learning
Alex Tamkin
Mike Wu
Noah D. Goodman
SSL
28
64
0
14 Oct 2020
Exploiting Spectral Augmentation for Code-Switched Spoken Language Identification
P. Rangan
Sundeep Teki
Hemant Misra
11
21
0
14 Oct 2020
Towards Data-efficient Modeling for Wake Word Spotting
Yixin Gao
Yuriy Mishchenko
Anish Shah
Spyros Matsoukas
S. Vitaladevuni
52
30
0
13 Oct 2020
Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling
Jiahui Yu
Wei Han
Anmol Gulati
Chung-Cheng Chiu
Bo-wen Li
Tara N. Sainath
Yonghui Wu
Ruoming Pang
30
18
0
12 Oct 2020
Improving Low Resource Code-switched ASR using Augmented Code-switched TTS
Yash Sharma
Basil Abraham
Karan Taneja
Preethi Jyothi
14
20
0
12 Oct 2020
Contrastive Representation Learning: A Framework and Review
Phúc H. Lê Khắc
Graham Healy
Alan F. Smeaton
SSL
AI4TS
186
687
0
10 Oct 2020
Leveraging Unpaired Text Data for Training End-to-End Speech-to-Intent Systems
Yinghui Huang
H. Kuo
Samuel Thomas
Zvi Kons
Kartik Audhkhasi
Brian Kingsbury
R. Hoory
M. Picheny
VLM
19
63
0
08 Oct 2020
Representation Learning for Sequence Data with Deep Autoencoding Predictive Components
Junwen Bai
Weiran Wang
Yingbo Zhou
Caiming Xiong
SSL
AI4TS
27
12
0
07 Oct 2020
Fine-Grained Grounding for Multimodal Speech Recognition
Tejas Srinivasan
Ramon Sanabria
Florian Metze
Desmond Elliott
25
11
0
05 Oct 2020
Differentiable Weighted Finite-State Transducers
Awni Y. Hannun
Vineel Pratap
Jacob Kahn
Wei-Ning Hsu
33
29
0
02 Oct 2020
A Crowdsourced Open-Source Kazakh Speech Corpus and Initial Speech Recognition Baseline
Yerbolat Khassanov
Saida Mussakhojayeva
A. Mirzakhmetov
A. Adiyev
Mukhamet Nurpeiissov
H. A. Varol
22
30
0
22 Sep 2020
Cough Against COVID: Evidence of COVID-19 Signature in Cough Sounds
Piyush Bagad
Aman Dalmia
Jigar Doshi
Arsha Nagrani
Parag Bhamare
A. Mahale
S. Rane
N. Agarwal
R. Panicker
34
112
0
17 Sep 2020
On Multitask Loss Function for Audio Event Detection and Localization
Huy P Phan
L. D. Pham
P. Koch
Ngoc Q. K. Duong
Ian Mcloughlin
Alfred Mertins
21
14
0
11 Sep 2020
On Target Segmentation for Direct Speech Translation
Mattia Antonino Di Gangi
Marco Gaido
Matteo Negri
Marco Turchi
37
14
0
10 Sep 2020