1904.08779
Cited By
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
VLM
Papers citing
"SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"
50 / 1,048 papers shown
Title
SegAugment: Maximizing the Utility of Speech Translation Data with Segmentation-based Augmentations
Ioannis Tsiamas
José A. R. Fonollosa
Marta R. Costa-jussá
90
6
0
19 Dec 2022
Mu$^{2}$SLAM: Multitask, Multilingual Speech and Language Models
Yong Cheng
Yu Zhang
Melvin Johnson
Wolfgang Macherey
Ankur Bapna
66
8
0
19 Dec 2022
WACO: Word-Aligned Contrastive Learning for Speech Translation
Siqi Ouyang
Rong Ye
Lei Li
104
28
0
19 Dec 2022
BEATs: Audio Pre-Training with Acoustic Tokenizers
Sanyuan Chen
Yu-Huan Wu
Chengyi Wang
Shujie Liu
Daniel C. Tompkins
Zhuo Chen
Furu Wei
124
299
0
18 Dec 2022
A Review of Speech-centric Trustworthy Machine Learning: Privacy, Safety, and Fairness
Tiantian Feng
Rajat Hebbar
Nicholas Mehlman
Xuan Shi
Aditya Kommineni
Shrikanth Narayanan
108
34
0
18 Dec 2022
Feature Dropout: Revisiting the Role of Augmentations in Contrastive Learning
Alex Tamkin
Margalit Glasgow
Xiluo He
Noah D. Goodman
SSL
118
7
0
16 Dec 2022
MAViL: Masked Audio-Video Learners
Po-Yao (Bernie) Huang
Vasu Sharma
Hu Xu
Chaitanya K. Ryali
Haoqi Fan
Yanghao Li
Shang-Wen Li
Gargi Ghosh
Jitendra Malik
Christoph Feichtenhofer
83
54
0
15 Dec 2022
Attention as a Guide for Simultaneous Speech Translation
Sara Papi
Matteo Negri
Marco Turchi
93
31
0
15 Dec 2022
Disentangling Prosody Representations with Unsupervised Speech Reconstruction
Leyuan Qu
Taiha Li
C. Weber
Theresa Pekarek-Rosin
F. Ren
S. Wermter
87
10
0
14 Dec 2022
Towards trustworthy phoneme boundary detection with autoregressive model and improved evaluation metric
Hyeongju Kim
Hyeong-Seok Choi
43
2
0
13 Dec 2022
Audiovisual Masked Autoencoders
Mariana-Iuliana Georgescu
Eduardo Fonseca
Radu Tudor Ionescu
Mario Lucic
Cordelia Schmid
Anurag Arnab
SSL
118
45
0
09 Dec 2022
OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models
Jinze Bai
Rui Men
Han Yang
Xuancheng Ren
Kai Dang
...
Wenhang Ge
Jianxin Ma
Junyang Lin
Jingren Zhou
Chang Zhou
88
16
0
08 Dec 2022
Robust Speech Recognition via Large-Scale Weak Supervision
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
OffRL
230
3,780
0
06 Dec 2022
Label-free Knowledge Distillation with Contrastive Loss for Light-weight Speaker Recognition
Zhiyuan Peng
Xuanji He
Ke Ding
Tan Lee
Guanglu Wan
62
6
0
06 Dec 2022
Automatic Anomalies Detection in Hydraulic Devices
José A. Solorio
José M. García
Sudip Vhaduri
23
0
0
05 Dec 2022
LMEC: Learnable Multiplicative Absolute Position Embedding Based Conformer for Speech Recognition
Yuguang Yang
Yu Pan
Jingjing Yin
Heng Lu
103
3
0
05 Dec 2022
Towards Generating Diverse Audio Captions via Adversarial Training
Xinhao Mei
Xubo Liu
Jianyuan Sun
Mark D. Plumbley
Wenwu Wang
DiffM
88
2
0
05 Dec 2022
Improving End-to-end Speech Translation by Leveraging Auxiliary Speech and Text Data
Yuhao Zhang
Chen Xu
Bojie Hu
Chunliang Zhang
Tong Xiao
Jingbo Zhu
66
16
0
04 Dec 2022
SoftCorrect: Error Correction with Soft Detection for Automatic Speech Recognition
Yichong Leng
Xu Tan
Wenjie Liu
Kaitao Song
Rui Wang
Xiang-Yang Li
Tao Qin
Ed Lin
Tie-Yan Liu
120
16
0
02 Dec 2022
An Overview of Indian Spoken Language Recognition from Machine Learning Perspective
Spandan Dey
Md. Sahidullah
G. Saha
63
20
0
30 Nov 2022
MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition
Xiaohuan Zhou
Jiaming Wang
Zeyu Cui
Shiliang Zhang
Zhijie Yan
Jingren Zhou
Chang Zhou
93
12
0
29 Nov 2022
Interpretability Analysis of Deep Models for COVID-19 Detection
Daniel Peixoto Pinto da Silva
Edresson Casanova
L. Gris
A. Júnior
Marcelo Finger
...
Beatriz Raposo
Marcus Martins
S. Aluísio
L. Berti
João Paulo Teixeira
60
3
0
25 Nov 2022
Learning General Audio Representations with Large-Scale Training of Patchout Audio Transformers
Khaled Koutini
Shahed Masoudian
Florian Schmid
Hamid Eghbalzadeh
Jan Schluter
Gerhard Widmer
147
6
0
25 Nov 2022
ASiT: Local-Global Audio Spectrogram vIsion Transformer for Event Classification
Sara Atito
Muhammad Awais
Wenwu Wang
Mark D. Plumbley
J. Kittler
ViT
71
11
0
23 Nov 2022
ArzEn-ST: A Three-way Speech Translation Corpus for Code-Switched Egyptian Arabic - English
Injy Hamed
Nizar Habash
Slim Abdennadher
Ngoc Thang Vu
60
13
0
22 Nov 2022
SSCFormer: Push the Limit of Chunk-wise Conformer for Streaming ASR Using Sequentially Sampled Chunks and Chunked Causal Convolution
Fangyuan Wang
Bo Xu
Bo Xu
102
0
0
21 Nov 2022
LISA: Localized Image Stylization with Audio via Implicit Neural Representation
Seung Hyun Lee
Chanyoung Kim
Wonmin Byeon
Sang Ho Yoon
Jinkyu Kim
Sangpil Kim
60
3
0
21 Nov 2022
Impact of visual assistance for automated audio captioning
Wim Boes
Hugo Van hamme
65
1
0
18 Nov 2022
Adaptive Representations of Sound for Automatic Insect Recognition
Marius Faiss
40
10
0
17 Nov 2022
LongFNT: Long-form Speech Recognition with Factorized Neural Transducer
Xun Gong
Yu-Huan Wu
Jinyu Li
Shujie Liu
Rui Zhao
Xie Chen
Y. Qian
RALM
67
11
0
17 Nov 2022
Balanced Deep CCA for Bird Vocalization Detection
Sumit Kumar
B. Anshuman
Linus Ruettimann
Richard Hans Robert Hahnloser
Vipul Arora
15
2
0
17 Nov 2022
Unsupervised Model-based speaker adaptation of end-to-end lattice-free MMI model for speech recognition
Xurong Xie
Xunying Liu
Hui Chen
Hongan Wang
78
1
0
17 Nov 2022
Improving Speech Emotion Recognition with Unsupervised Speaking Style Transfer
Leyuan Qu
Wei Wang
C. Weber
F. Ren
Taiha Li
S. Wermter
44
1
0
16 Nov 2022
Speaker Adaptation for End-To-End Speech Recognition Systems in Noisy Environments
Dominik Wagner
Ilja Baumann
Sebastian P. Bayerl
Korbinian Riedhammer
Tobias Bocklet
77
2
0
16 Nov 2022
On Unsupervised Uncertainty-Driven Speech Pseudo-Label Filtering and Model Calibration
Nauman Dawalatabad
Sameer Khurana
Antoine Laurent
James R. Glass
35
3
0
14 Nov 2022
Improving Children's Speech Recognition by Fine-tuning Self-supervised Adult Speech Representations
Renée Lu
M. Shahin
Beena Ahmed
57
4
0
14 Nov 2022
Continuous Soft Pseudo-Labeling in ASR
Tatiana Likhomanenko
R. Collobert
Navdeep Jaitly
Samy Bengio
VLM
79
3
0
11 Nov 2022
A Study on the Integration of Pre-trained SSL, ASR, LM and SLU Models for Spoken Language Understanding
Yifan Peng
Siddhant Arora
Yosuke Higuchi
Yushi Ueda
Sujay S. Kumar
Karthik Ganesan
Siddharth Dalmia
Xuankai Chang
Shinji Watanabe
80
21
0
10 Nov 2022
Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation
Florian Schmid
Khaled Koutini
Gerhard Widmer
ViT
86
60
0
09 Nov 2022
Pushing the limits of self-supervised speaker verification using regularized distillation framework
Yafeng Chen
Siqi Zheng
Haibo Wang
Luyao Cheng
Qian Chen
75
27
0
08 Nov 2022
High-resolution embedding extractor for speaker diarisation
Hee-Soo Heo
Youngki Kwon
Bong-Jin Lee
You Jin Kim
Jee-weon Jung
70
5
0
08 Nov 2022
Breaking the trade-off in personalized speech enhancement with cross-task knowledge distillation
H. Taherian
Sefik Emre Eskimez
Takuya Yoshioka
55
1
0
05 Nov 2022
Improved Techniques for the Conditional Generative Augmentation of Clinical Audio Data
Mane Margaryan
Matthias Seibold
Indu Joshi
Mazda Farshad
Philipp Fürnstahl
Nassir Navab
MedIm
62
2
0
05 Nov 2022
Dynamic Kernels and Channel Attention for Low Resource Speaker Verification
A. Ollerenshaw
Md. Asif Jalal
Thomas Hain
21
0
0
03 Nov 2022
Probing Statistical Representations For End-To-End ASR
A. Ollerenshaw
Md. Asif Jalal
Thomas Hain
57
2
0
03 Nov 2022
The ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge (ICSRC): Dataset, Tracks, Baseline and Results
Ao Zhang
F. Yu
Kaixun Huang
Linfu Xie
Longbiao Wang
Eng Siong Chng
Hui Bu
Binbin Zhang
Wei Chen
Xin Xu
93
5
0
03 Nov 2022
Monolingual Recognizers Fusion for Code-switching Speech Recognition
Tongtong Song
Qiang Xu
Haoyu Lu
Longbiao Wang
Hao Shi
Yuqin Lin
Yanbing Yang
Jianwu Dang
71
4
0
02 Nov 2022
Factorized Blank Thresholding for Improved Runtime Efficiency of Neural Transducers
Duc Le
Frank Seide
Yuhao Wang
Yongbin Li
Kjell Schubert
Ozlem Kalinli
M. Seltzer
75
6
0
02 Nov 2022
Build a SRE Challenge System: Lessons from VoxSRC 2022 and CNSRC 2022
Zhengyang Chen
Bing Han
Xu Xiang
Houjun Huang
Bei Liu
Y. Qian
91
14
0
02 Nov 2022
BECTRA: Transducer-based End-to-End ASR with BERT-Enhanced Encoder
Yosuke Higuchi
Tetsuji Ogawa
Tetsunori Kobayashi
Shinji Watanabe
169
13
0
02 Nov 2022