1904.08779
Cited By
v1
v2
v3 (latest)
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
VLM
Papers citing
"SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"
50 / 1,049 papers shown
Title
Training speaker recognition systems with limited data
Nik Vaessen
David A. van Leeuwen
45
6
0
28 Mar 2022
Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation for Automatic Speech Recognition
Guan-Ting Lin
Shang-Wen Li
Hung-yi Lee
TTA
VLM
66
14
0
27 Mar 2022
Data Augmentation Strategies for Improving Sequential Recommender Systems
Jooeun Song
B. Suh
28
9
0
26 Mar 2022
Speech-enhanced and Noise-aware Networks for Robust Speech Recognition
Hung-Shin Lee
Pin-Yuan Chen
Yao-Fei Cheng
Yu Tsao
Hsin-Min Wang
46
1
0
25 Mar 2022
AudioTagging Done Right: 2nd comparison of deep learning methods for environmental sound classification
Juncheng Billy Li
Shuhui Qu
Po-Yao (Bernie) Huang
Florian Metze
VLM
102
9
0
25 Mar 2022
Automatic Speech Recognition for Speech Assessment of Persian Preschool Children
Amirhossein Abaskohi
Fatemeh Mortazavi
Hadi Moradi
65
7
0
24 Mar 2022
Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech Translation
Chih-Chiang Chang
Hung-yi Lee
90
13
0
22 Mar 2022
Conditional Generative Data Augmentation for Clinical Audio Datasets
Matthias Seibold
A. Hoch
Mazda Farshad
Nassir Navab
Philipp Fürnstahl
MedIm
67
13
0
22 Mar 2022
A Track-Wise Ensemble Event Independent Network for Polyphonic Sound Event Localization and Detection
Jinbo Hu
Yin Cao
Ming Wu
Qiuqiang Kong
Feiran Yang
Mark D. Plumbley
J. Yang
64
23
0
19 Mar 2022
Under the Morphosyntactic Lens: A Multifaceted Evaluation of Gender Bias in Speech Translation
Beatrice Savoldi
Marco Gaido
L. Bentivogli
Matteo Negri
Marco Turchi
75
27
0
18 Mar 2022
SepTr: Separable Transformer for Audio Spectrogram Processing
Nicolae-Cătălin Ristea
Radu Tudor Ionescu
Fahad Shahbaz Khan
ViT
96
32
0
17 Mar 2022
Sample, Translate, Recombine: Leveraging Audio Alignments for Data Augmentation in End-to-end Speech Translation
Tsz Kin Lam
Shigehiko Schamoni
Stefan Riezler
74
34
0
16 Mar 2022
CMKD: CNN/Transformer-Based Cross-Model Knowledge Distillation for Audio Classification
Yuan Gong
Sameer Khurana
Andrew Rouditchenko
James R. Glass
VLM
73
29
0
13 Mar 2022
Spatial Consistency Loss for Training Multi-Label Classifiers from Single-Label Annotations
Thomas Verelst
Paul Kishan Rubenstein
M. Eichner
Tinne Tuytelaars
Maxim Berman
85
20
0
11 Mar 2022
A study on joint modeling and data augmentation of multi-modalities for audio-visual scene classification
Qing Wang
Jun Du
Siyuan Zheng
Yunqing Li
Yajian Wang
...
Hu Hu
Chao-Han Huck Yang
Sabato Marco Siniscalchi
Yannan Wang
Chin-Hui Lee
48
2
0
07 Mar 2022
Leveraging Pre-trained BERT for Audio Captioning
Xubo Liu
Xinhao Mei
Qiushi Huang
Jianyuan Sun
Jinzheng Zhao
Haohe Liu
Mark D. Plumbley
Volkan Kılıç
Wenwu Wang
115
30
0
06 Mar 2022
A Brief Overview of Unsupervised Neural Speech Representation Learning
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
Lars Maaløe
Christian Igel
BDL
AI4TS
SSL
96
11
0
01 Mar 2022
Explainable deepfake and spoofing detection: an attack analysis using SHapley Additive exPlanations
W. Ge
Massimiliano Todisco
Nicholas W. D. Evans
AAML
52
9
0
28 Feb 2022
Visual Speech Recognition for Multiple Languages in the Wild
Pingchuan Ma
Stavros Petridis
Maja Pantic
VLM
230
152
0
26 Feb 2022
GenéLive! Generating Rhythm Actions in Love Live!
Atsushi Takada
Daichi Yamazaki
Likun Liu
Yudai Yoshida
Nyamkhuu Ganbat
T. Shimotomai
Taiga Yamamoto
Daisuke Sakurai
Naoki Hamada
VLM
69
4
0
25 Feb 2022
Towards Better Meta-Initialization with Task Augmentation for Kindergarten-aged Speech Recognition
Yunzheng Zhu
Ruchao Fan
Abeer Alwan
CLL
80
4
0
24 Feb 2022
Attentive Temporal Pooling for Conformer-based Streaming Language Identification in Long-form Speech
Quan Wang
Yang Yu
Jason W. Pelecanos
Yiling Huang
Ignacio López Moreno
86
15
0
24 Feb 2022
Contrastive-mixup learning for improved speaker verification
Xin Zhang
Minho Jin
R. Cheng
Ruirui Li
Eunjung Han
A. Stolcke
AAML
SSL
60
11
0
22 Feb 2022
VADOI: Voice-Activity-Detection Overlapping Inference For End-to-end Long-form Speech Recognition
Jinhan Wang
Xiaosu Tong
Jinxi Guo
Di He
Roland Maas
71
5
0
22 Feb 2022
S3T: Self-Supervised Pre-training with Swin Transformer for Music Classification
Han Zhao
Chen Zhang
Belei Zhu
Zejun Ma
Ke-jun Zhang
ViT
73
29
0
21 Feb 2022
LPC Augment: An LPC-Based ASR Data Augmentation Algorithm for Low and Zero-Resource Children's Dialects
Alexander Johnson
Ruchao Fan
Robin Morris
Abeer Alwan
45
12
0
19 Feb 2022
Domain Adaptation of low-resource Target-Domain models using well-trained ASR Conformer Models
Vrunda N. Sukhadia
S. Umesh
95
8
0
18 Feb 2022
AISHELL-NER: Named Entity Recognition from Chinese Speech
Boli Chen
Guangwei Xu
Xiaobin Wang
Pengjun Xie
Meishan Zhang
Fei Huang
50
31
0
17 Feb 2022
Non-Autoregressive ASR with Self-Conditioned Folded Encoders
Tatsuya Komatsu
122
8
0
17 Feb 2022
Knowledge Transfer from Large-scale Pretrained Language Models to End-to-end Speech Recognizers
Yotaro Kubo
Shigeki Karita
M. Bacchiani
53
27
0
16 Feb 2022
Conversational Speech Recognition By Learning Conversation-level Characteristics
Kun Wei
Yike Zhang
Sining Sun
Lei Xie
Long Ma
82
9
0
16 Feb 2022
Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models
Sarala Padi
S. O. Sadjadi
Tianyi Zhou
Ram D. Sriram
74
39
0
16 Feb 2022
What Does it Mean for a Language Model to Preserve Privacy?
Hannah Brown
Katherine Lee
Fatemehsadat Mireshghallah
Reza Shokri
Florian Tramèr
PILM
106
243
0
11 Feb 2022
Improving Automatic Speech Recognition for Non-Native English with Transfer Learning and Language Model Decoding
Peter Sullivan
Toshiko Shibano
Muhammad Abdul-Mageed
78
11
0
10 Feb 2022
Neural Architecture Search for Energy Efficient Always-on Audio Models
Daniel T. Speckhard
Karolis Misiunas
Sagi Perel
Tenghui Zhu
S. Carlile
M. Slaney
65
13
0
09 Feb 2022
The Volcspeech system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge
Chen Shen
Yi Y. Liu
Wenzhi Fan
Bin Wang
Shi-Xue Wen
Yao Tian
Jun Zhang
Jingsheng Yang
Zejun Ma
51
4
0
09 Feb 2022
Conversational Agents: Theory and Applications
M. Wahde
M. Virgolin
LLMAG
68
26
0
07 Feb 2022
MFA: TDNN with Multi-scale Frequency-channel Attention for Text-independent Speaker Verification with Short Utterances
Tianchi Liu
Rohan Kumar Das
Kong Aik Lee
Haizhou Li
134
72
0
03 Feb 2022
The RoyalFlush System of Speech Recognition for M2MeT Challenge
Shuaishuai Ye
Peiyao Wang
Shunfei Chen
Xinhui Hu
Xinkang Xu
59
5
0
03 Feb 2022
Keyword localisation in untranscribed speech using visually grounded speech models
Kayode Olaleye
Dan Oneaţă
Herman Kamper
63
7
0
02 Feb 2022
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection
Ke Chen
Xingjian Du
Bilei Zhu
Zejun Ma
Taylor Berg-Kirkpatrick
Shlomo Dubnov
ViT
171
278
0
02 Feb 2022
ColloSSL: Collaborative Self-Supervised Learning for Human Activity Recognition
Yash Jain
Chi Ian Tang
Chulhong Min
F. Kawsar
Akhil Mathur
SSL
100
54
0
01 Feb 2022
BEA-Base: A Benchmark for ASR of Spontaneous Hungarian
P. Mihajlik
A. Balog
T. E. Gráczi
A. Kohári
Balázs Tarján
K. Mády
51
8
0
01 Feb 2022
Improving End-to-End Contextual Speech Recognition with Fine-Grained Contextual Knowledge Selection
Minglun Han
Linhao Dong
Zhenlin Liang
Meng Cai
Shiyu Zhou
Zejun Ma
Bo Xu
80
46
0
30 Jan 2022
Sentiment-Aware Automatic Speech Recognition pre-training for enhanced Speech Emotion Recognition
Ayoub Ghriss
Bo Yang
Viktor Rozgic
Elizabeth Shriberg
Chao Wang
87
21
0
27 Jan 2022
Recency Dropout for Recurrent Recommender Systems
Bo-Yu Chang
Can Xu
Matt Le
Jingchen Feng
Ya Le
Sriraj Badam
Ed H. Chi
Minmin Chen
57
3
0
26 Jan 2022
Run-and-back stitch search: novel block synchronous decoding for streaming encoder-decoder ASR
E. Tsunoo
Chaitanya Narisetty
Michael Hentschel
Yosuke Kashiwagi
Shinji Watanabe
48
2
0
25 Jan 2022
NAS-VAD: Neural Architecture Search for Voice Activity Detection
Daniel Rho
Jinhyeok Park
J. Ko
80
6
0
22 Jan 2022
Supervised and Self-supervised Pretraining Based COVID-19 Detection Using Acoustic Breathing/Cough/Speech Signals
Xing-Yu Chen
Qiu-shi Zhu
Jie Zhang
Lirong Dai
65
15
0
22 Jan 2022
Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection
A. Haliassos
Rodrigo Mira
Stavros Petridis
Maja Pantic
CVBM
121
133
0
18 Jan 2022