1904.08779
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
VLM
Papers citing
"SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"
50 / 1,048 papers shown
Title
An Empirical Survey of Data Augmentation for Time Series Classification with Neural Networks
Brian Kenji Iwana
S. Uchida
AI4TS
90
506
0
31 Jul 2020
Semi-Supervised Learning with Data Augmentation for End-to-End ASR
F. Weninger
F. Mana
R. Gemello
Jesús Andrés-Ferrer
P. Zhan
88
30
0
27 Jul 2020
Efficient minimum word error rate training of RNN-Transducer for end-to-end speech recognition
Jinxi Guo
Gautam Tiwari
J. Droppo
Maarten Van Segbroeck
Che-Wei Huang
A. Stolcke
Roland Maas
71
55
0
27 Jul 2020
CoVoST 2 and Massively Multilingual Speech-to-Text Translation
Changhan Wang
Anne Wu
J. Pino
SLR
93
75
0
20 Jul 2020
Cross-Lingual Speaker Verification with Domain-Balanced Hard Prototype Mining and Language-Dependent Score Normalization
Jenthe Thienpondt
Brecht Desplanques
Kris Demuynck
67
24
0
15 Jul 2020
TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech
Andy T. Liu
Shang-Wen Li
Hung-yi Lee
SSL
178
362
0
12 Jul 2020
Data augmentation enhanced speaker enrollment for text-dependent speaker verification
A. K. Sarkar
H. Sarma
Priyanka Dwivedi
Zheng-Hua Tan
24
3
0
12 Jul 2020
Class LM and word mapping for contextual biasing in End-to-End ASR
Rongqing Huang
Ossama Abdel-Hamid
Xinwei Li
G. Evermann
57
49
0
10 Jul 2020
Massively Multilingual ASR: 50 Languages, 1 Model, 1 Billion Parameters
Vineel Pratap
Anuroop Sriram
Paden Tomasello
Awni Y. Hannun
Vitaliy Liptchinsky
Gabriel Synnaeve
R. Collobert
91
143
0
06 Jul 2020
Pretrained Semantic Speech Embeddings for End-to-End Spoken Language Understanding via Cross-Modal Teacher-Student Learning
Pavel Denisov
Ngoc Thang Vu
77
30
0
03 Jul 2020
Data Augmenting Contrastive Learning of Speech Representations in the Time Domain
Eugene Kharitonov
M. Rivière
Gabriel Synnaeve
Lior Wolf
Pierre-Emmanuel Mazaré
Matthijs Douze
Emmanuel Dupoux
137
118
0
02 Jul 2020
Polyphonic sound event detection based on convolutional recurrent neural networks with semi-supervised loss function for DCASE challenge 2020 task 4
Nam Kyun Kim
Hyeongju Kim
60
3
0
02 Jul 2020
Self-Supervised MultiModal Versatile Networks
Jean-Baptiste Alayrac
Adrià Recasens
R. Schneider
Relja Arandjelović
Jason Ramapuram
J. Fauw
Lucas Smaira
Sander Dieleman
Andrew Zisserman
SSL
198
375
0
29 Jun 2020
Real Time Speech Enhancement in the Waveform Domain
Alexandre Défossez
Gabriel Synnaeve
Yossi Adi
109
466
0
23 Jun 2020
Self-Supervised Representations Improve End-to-End Speech Translation
Anne Wu
Changhan Wang
J. Pino
Jiatao Gu
SSL
110
40
0
22 Jun 2020
Sound Event Localization and Detection Using Activity-Coupled Cartesian DOA Vector and RD3net
Kazuki Shimada
Naoya Takahashi
Shusuke Takahashi
Yuki Mitsufuji
121
19
0
22 Jun 2020
MaxVA: Fast Adaptation of Step Sizes by Maximizing Observed Variance of Gradients
Chenfei Zhu
Yu Cheng
Zhe Gan
Furong Huang
Jingjing Liu
Tom Goldstein
ODL
113
2
0
21 Jun 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski
Henry Zhou
Abdel-rahman Mohamed
Michael Auli
SSL
325
5,878
0
20 Jun 2020
Multi-Encoder-Decoder Transformer for Code-Switching Speech Recognition
Xinyuan Zhou
Emre Yilmaz
Yanhua Long
Yijie Li
Haizhou Li
80
52
0
18 Jun 2020
Self-and-Mixed Attention Decoder with Deep Acoustic Structure for Transformer-based LVCSR
Xinyuan Zhou
Grandee Lee
Emre Yilmaz
Yanhua Long
Jiaen Liang
Haizhou Li
65
7
0
18 Jun 2020
Are you wearing a mask? Improving mask detection from speech using augmentation by cycle-consistent GANs
Nicolae-Cătălin Ristea
Radu Tudor Ionescu
CVBM
93
41
0
17 Jun 2020
The JHU Multi-Microphone Multi-Speaker ASR System for the CHiME-6 Challenge
Ashish Arora
Desh Raj
Aswin Shanmugam Subramanian
Ke Li
Bar Ben Yair
Matthew Maciejewski
Piotr Żelasko
Leibny Paola García-Perera
Shinji Watanabe
Sanjeev Khudanpur
147
9
0
14 Jun 2020
Improving Cross-Lingual Transfer Learning for End-to-End Speech Recognition with Speech Translation
Changhan Wang
J. Pino
Jiatao Gu
79
30
0
09 Jun 2020
End-to-End Speech-Translation with Knowledge Distillation: FBK@IWSLT2020
Marco Gaido
Mattia Antonino Di Gangi
Matteo Negri
Marco Turchi
91
54
0
04 Jun 2020
Contextual RNN-T For Open Domain ASR
Mahaveer Jain
Gil Keren
Jay Mahadeokar
Geoffrey Zweig
Florian Metze
Yatharth Saraf
63
104
0
04 Jun 2020
Is 42 the Answer to Everything in Subtitling-oriented Speech Translation?
Alina Karakanta
Matteo Negri
Marco Turchi
89
35
0
01 Jun 2020
High-Fidelity Audio Generation and Representation Learning with Guided Adversarial Autoencoder
Kazi Nazmul Haque
R. Rana
Björn W Schuller
DRL
100
12
0
01 Jun 2020
CLOCS: Contrastive Learning of Cardiac Signals Across Space, Time, and Patients
Dani Kiyasseh
T. Zhu
David Clifton
120
195
0
27 May 2020
Insertion-Based Modeling for End-to-End Automatic Speech Recognition
Yuya Fujita
Shinji Watanabe
Motoi Omachi
Xuankai Chang
80
31
0
27 May 2020
ACGAN-based Data Augmentation Integrated with Long-term Scalogram for Acoustic Scene Classification
Hangting Chen
Zuozhen Liu
Zongming Liu
Pengyuan Zhang
32
8
0
27 May 2020
Multistream CNN for Robust Acoustic Modeling
Kyu Jeong Han
Jing Pan
Venkata Krishna Naveen Tadala
T. Ma
Daniel Povey
71
34
0
21 May 2020
ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition
Jing Pan
Joshua Shapiro
Jeremy Wohlwend
Kyu Jeong Han
Tao Lei
T. Ma
72
22
0
21 May 2020
Simplified Self-Attention for Transformer-based End-to-End Speech Recognition
Haoneng Luo
Shiliang Zhang
Ming Lei
Lei Xie
128
34
0
21 May 2020
Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition
Shiliang Zhang
Zhifu Gao
Haoneng Luo
Ming Lei
Jie Ying Gao
Zhijie Yan
Lei Xie
64
29
0
21 May 2020
SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition
Zhifu Gao
Shiliang Zhang
Ming Lei
Ian McLoughlin
81
35
0
21 May 2020
Training Keyword Spotting Models on Non-IID Data with Federated Learning
Andrew Straiton Hard
Kurt Partridge
Cameron Nguyen
Niranjan A. Subrahmanya
Aishanee Shah
Pai Zhu
Ignacio López Moreno
Rajiv Mathews
OOD
FedML
74
67
0
21 May 2020
A Comparison of Label-Synchronous and Frame-Synchronous End-to-End Models for Speech Recognition
Linhao Dong
Cheng Yi
Jianzong Wang
Shiyu Zhou
Shuang Xu
X. Jia
Bo Xu
68
17
0
20 May 2020
Early Stage LM Integration Using Local and Global Log-Linear Combination
Wilfried Michel
Ralf Schlüter
Hermann Ney
60
11
0
20 May 2020
Relative Positional Encoding for Speech Recognition and Direct Translation
Ngoc-Quan Pham
Thanh-Le Ha
Tuan-Nam Nguyen
T. Nguyen
Elizabeth Salesky
S. Stueker
Jan Niehues
A. Waibel
56
37
0
20 May 2020
BiQGEMM: Matrix Multiplication with Lookup Table For Binary-Coding-based Quantized DNNs
Yongkweon Jeon
Baeseong Park
S. Kwon
Byeongwook Kim
Jeongin Yun
Dongsoo Lee
MQ
63
31
0
20 May 2020
Improved Noisy Student Training for Automatic Speech Recognition
Daniel S. Park
Yu Zhang
Ye Jia
Wei Han
Chung-Cheng Chiu
Yue Liu
Yonghui Wu
Quoc V. Le
119
243
0
19 May 2020
A systematic comparison of grapheme-based vs. phoneme-based label units for encoder-decoder-attention models
Mohammad Zeineldeen
Albert Zeyer
Wei Zhou
T. Ng
Ralf Schlüter
Hermann Ney
71
2
0
19 May 2020
A New Training Pipeline for an Improved Neural Transducer
Albert Zeyer
André Merboldt
Ralf Schlüter
Hermann Ney
AI4TS
MedIm
75
52
0
19 May 2020
Iterative Pseudo-Labeling for Speech Recognition
Qiantong Xu
Tatiana Likhomanenko
Jacob Kahn
Awni Y. Hannun
Gabriel Synnaeve
R. Collobert
VLM
96
134
0
19 May 2020
Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict
Yosuke Higuchi
Shinji Watanabe
Nanxin Chen
Tetsuji Ogawa
Tetsunori Kobayashi
65
139
0
18 May 2020
Attention-based Transducer for Online Speech Recognition
Bin Wang
Yan Yin
Hui-Ching Lin
67
4
0
18 May 2020
The NTNU System at the Interspeech 2020 Non-Native Children's Speech ASR Challenge
Tien-Hong Lo
Fu-An Chao
Shi-Yan Weng
Berlin Chen
55
11
0
18 May 2020
Conformer: Convolution-augmented Transformer for Speech Recognition
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
...
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
231
3,179
0
16 May 2020
Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition
Zhengkun Tian
Jiangyan Yi
J. Tao
Ye Bai
Shuai Zhang
Zhengqi Wen
99
54
0
16 May 2020
Large scale weakly and semi-supervised learning for low-resource video ASR
Kritika Singh
Vimal Manohar
Alex Xiao
Sergey Edunov
Ross B. Girshick
Vitaliy Liptchinsky
Christian Fuegen
Yatharth Saraf
Geoffrey Zweig
Abdel-rahman Mohamed
77
9
0
16 May 2020