1904.08779
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
VLM
Papers citing "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"
50 / 747 papers shown
G-Augment: Searching for the Meta-Structure of Data Augmentation Policies for ASR
Gary Wang
Ekin D. Cubuk
Andrew Rosenberg
Shuyang Cheng
Ron J. Weiss
Bhuvana Ramabhadran
Pedro J. Moreno
Quoc V. Le
Daniel S. Park
35
1
0
19 Oct 2022
Maestro-U: Leveraging joint speech-text representation learning for zero supervised speech ASR
Zhehuai Chen
Ankur Bapna
Andrew Rosenberg
Yu Zhang
Bhuvana Ramabhadran
Pedro J. Moreno
Nanxin Chen
51
17
0
18 Oct 2022
SVLDL: Improved Speaker Age Estimation Using Selective Variance Label Distribution Learning
Zuheng Kang
Jianzong Wang
Junqing Peng
Jing Xiao
26
3
0
18 Oct 2022
Robust, General, and Low Complexity Acoustic Scene Classification Systems and An Effective Visualization for Presenting a Sound Scene Context
L. D. Pham
Dusan Salovic
Anahid N. Jalali
Alexander Schindler
Khoa Tran
H. Vu
Phu X. Nguyen
35
5
0
16 Oct 2022
A Policy-based Approach to the SpecAugment Method for Low Resource E2E ASR
Rui Li
Guodong Ma
Dexin Zhao
Ranran Zeng
Xiaoyu Li
Haolin Huang
34
2
0
16 Oct 2022
LeVoice ASR Systems for the ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge
Yan Jia
Mihee Hong
Jingyu Hou
Kailong Ren
Sifan Ma
Jin Wang
Fangzhen Peng
Yinglin Ji
Lin Yang
Junjie Wang
25
1
0
14 Oct 2022
JOIST: A Joint Speech and Text Streaming Model For ASR
Tara N. Sainath
Rohit Prabhavalkar
Ankur Bapna
Yu Zhang
Zhouyuan Huo
Zhehuai Chen
Yue Liu
Weiran Wang
Trevor Strohman
RALM
AuLLM
53
35
0
13 Oct 2022
Deepfake Detection System for the ADD Challenge Track 3.2 Based on Score Fusion
Yuxiang Zhang
Jingze Lu
Xingming Wang
Zhuo Li
Runqiu Xiao
Wenchao Wang
Ming Li
Pengyuan Zhang
51
5
0
13 Oct 2022
An Ensemble Teacher-Student Learning Approach with Poisson Sub-sampling to Differential Privacy Preserving Speech Recognition
Chao-Han Huck Yang
Jun Qi
Sabato Marco Siniscalchi
Chin-Hui Lee
31
4
0
12 Oct 2022
Cross-dataset COVID-19 Transfer Learning with Cough Detection, Cough Segmentation, and Data Augmentation
Bagus Tris Atmaja
Zanjabila
Suyanto
A. Sasou
32
1
0
12 Oct 2022
Comparison of Soft and Hard Target RNN-T Distillation for Large-scale ASR
DongSeon Hwang
K. Sim
Yu Zhang
Trevor Strohman
27
10
0
11 Oct 2022
Scaling Up Deliberation for Multilingual ASR
Ke Hu
Yue Liu
Tara N. Sainath
LRM
33
9
0
11 Oct 2022
Automated Audio Captioning via Fusion of Low- and High- Dimensional Features
Jianyuan Sun
Xubo Liu
Xinhao Mei
Mark D. Plumbley
V. Kılıç
Wenwu Wang
33
3
0
10 Oct 2022
CCC-wav2vec 2.0: Clustering aided Cross Contrastive Self-supervised learning of speech representations
Vasista Sai Lodagala
Sreyan Ghosh
S. Umesh
SSL
51
18
0
05 Oct 2022
JoeyS2T: Minimalistic Speech-to-Text Modeling with JoeyNMT
Mayumi Ohta
Julia Kreutzer
Stefan Riezler
19
0
0
05 Oct 2022
ASVspoof 2021: Towards Spoofed and Deepfake Speech Detection in the Wild
Xuechen Liu
Xin Wang
Md. Sahidullah
J. Patino
Héctor Delgado
...
Massimiliano Todisco
Junichi Yamagishi
Nicholas W. D. Evans
A. Nautsch
Kong Aik Lee
49
173
0
05 Oct 2022
Learning Temporal Resolution in Spectrogram for Audio Classification
Haohe Liu
Xubo Liu
Qiuqiang Kong
Wenwu Wang
Mark D. Plumbley
36
7
0
04 Oct 2022
E-Branchformer: Branchformer with Enhanced merging for speech recognition
Kwangyoun Kim
Felix Wu
Yifan Peng
Jing Pan
Prashant Sridhar
Kyu Jeong Han
Shinji Watanabe
61
105
0
30 Sep 2022
An empirical study of weakly supervised audio tagging embeddings for general audio representations
Heinrich Dinkel
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Yujun Wang
43
1
0
30 Sep 2022
ConvRNN-T: Convolutional Augmented Recurrent Neural Network Transducers for Streaming Speech Recognition
Martin H. Radfar
Rohit Barnwal
Rupak Vignesh Swaminathan
Feng-Ju Chang
Grant P. Strimel
Nathan Susanj
Athanasios Mouchtaris
42
13
0
29 Sep 2022
Direct Speech Translation for Automatic Subtitling
Sara Papi
Marco Gaido
Alina Karakanta
Mauro Cettolo
Matteo Negri
Marco Turchi
59
11
0
27 Sep 2022
Unsupervised domain adaptation for speech recognition with unsupervised error correction
Long Mai
Julie Carson-Berndsen
46
8
0
24 Sep 2022
Relaxed Attention for Transformer Models
Timo Lohrenz
Björn Möller
Zhengyang Li
Tim Fingscheidt
KELM
29
11
0
20 Sep 2022
Parameter-Efficient Conformers via Sharing Sparsely-Gated Experts for End-to-End Speech Recognition
Ye Bai
Jie Li
W. Han
Hao Ni
Kaituo Xu
Zhuo Zhang
Cheng Yi
Xiaorui Wang
MoE
36
1
0
17 Sep 2022
Adaptive Natural Language Generation for Task-oriented Dialogue via Reinforcement Learning
Atsumoto Ohashi
Ryuichiro Higashinaka
OffRL
36
6
0
16 Sep 2022
Self-Supervised Attention Networks and Uncertainty Loss Weighting for Multi-Task Emotion Recognition on Vocal Bursts
Vincent Karas
Andreas Triantafyllopoulos
Meishu Song
Björn W. Schuller
38
4
0
15 Sep 2022
A Universally-Deployable ASR Frontend for Joint Acoustic Echo Cancellation, Speech Enhancement, and Voice Separation
Tom O'Malley
A. Narayanan
Quan Wang
27
5
0
14 Sep 2022
I2CR: Improving Noise Robustness on Keyword Spotting Using Inter-Intra Contrastive Regularization
Dianwen Ng
J. Yip
Tanmay Surana
Zhao Yang
Chong Zhang
Yukun Ma
Chongjia Ni
Chng Eng Siong
B. Ma
40
6
0
14 Sep 2022
Streaming Target-Speaker ASR with Neural Transducer
Takafumi Moriya
Hiroshi Sato
Tsubasa Ochiai
Marc Delcroix
T. Shinozaki
39
21
0
09 Sep 2022
Equivariant Self-Supervision for Musical Tempo Estimation
Elio Quinton
42
9
0
03 Sep 2022
Random Text Perturbations Work, but not Always
Zhengxiang Wang
DeLMO
12
1
0
02 Sep 2022
Joint Speaker Encoder and Neural Back-end Model for Fully End-to-End Automatic Speaker Verification with Multiple Enrollment Utterances
Chang Zeng
Xiaoxiao Miao
Xin Wang
Erica Cooper
Junichi Yamagishi
37
6
0
01 Sep 2022
Attention Enhanced Citrinet for Speech Recognition
Xianchao Wu
23
1
0
01 Sep 2022
Deep Sparse Conformer for Speech Recognition
Xianchao Wu
30
2
0
01 Sep 2022
Robust Sound-Guided Image Manipulation
Seung Hyun Lee
Gyeongrok Oh
Wonmin Byeon
Sang Ho Yoon
Jinkyu Kim
Sangpil Kim
DiffM
26
7
0
30 Aug 2022
Improving Natural-Language-based Audio Retrieval with Transfer Learning and Audio & Text Augmentations
Paul Primus
Gerhard Widmer
29
6
0
24 Aug 2022
Improved Zero-Shot Audio Tagging & Classification with Patchout Spectrogram Transformers
Paul Primus
Gerhard Widmer
VLM
27
5
0
24 Aug 2022
A differentiable short-time Fourier transform with respect to the window length
Maxime Leiber
Axel Barrau
Y. Marnissi
D. Abboud
17
8
0
23 Aug 2022
A Unified Analysis of Mixed Sample Data Augmentation: A Loss Function Perspective
Chanwoo Park
Sangdoo Yun
Sanghyuk Chun
AAML
25
32
0
21 Aug 2022
Disentangled Speaker Representation Learning via Mutual Information Minimization
Sung Hwan Mun
Mingrui Han
Minchan Kim
Dongjune Lee
N. Kim
DRL
43
9
0
17 Aug 2022
Uconv-Conformer: High Reduction of Input Sequence Length for End-to-End Speech Recognition
A. Andrusenko
R. Nasretdinov
A. Romanenko
22
18
0
16 Aug 2022
An investigation on selecting audio pre-trained models for audio captioning
Peiran Yan
Sheng-Wei Li
26
0
0
12 Aug 2022
A High-Quality and Large-Scale Dataset for English-Vietnamese Speech Translation
L. T. Nguyen
Nguyen Luong Tran
Long Doan
Manh Luong
Dat Quoc Nguyen
29
4
0
08 Aug 2022
SSDPT: Self-Supervised Dual-Path Transformer for Anomalous Sound Detection in Machine Condition Monitoring
Jisheng Bai
Jianfeng Chen
Mou Wang
Muhammad Saad Ayub
Qingli Yan
54
15
0
06 Aug 2022
Learning a Dual-Mode Speech Recognition Model via Self-Pruning
Chunxi Liu
Yuan Shangguan
Haichuan Yang
Yangyang Shi
Raghuraman Krishnamoorthi
Ozlem Kalinli
SSL
34
7
0
25 Jul 2022
Improving Mandarin Speech Recogntion with Block-augmented Transformer
Xiaoming Ren
Huifeng Zhu
Liuwei Wei
Minghui Wu
Jie Hao
40
9
0
24 Jul 2022
Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset
Grant Van Horn
Rui Qian
Kimberly Wilber
Hartwig Adam
Oisin Mac Aodha
Serge Belongie
36
10
0
21 Jul 2022
Transfer Learning of wav2vec 2.0 for Automatic Lyric Transcription
Longshen Ou
Xiangming Gu
Ye Wang
32
21
0
20 Jul 2022
ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale
Gopinath Chennupati
Milind Rao
Gurpreet Chadha
Aaron Eakin
A. Raju
...
Andrew Oberlin
Buddha Nandanoor
Prahalad Venkataramanan
Zheng Wu
Pankaj Sitpure
CLL
29
8
0
19 Jul 2022
Knowledge Transfer and Distillation from Autoregressive to Non-Autoregressive Speech Recognition
Xun Gong
Zhikai Zhou
Y. Qian
20
3
0
15 Jul 2022