1904.08779
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
VLM
Papers citing "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"
50 / 1,049 papers shown
Title
Improving Transformer-based Conversational ASR by Inter-Sentential Attention Mechanism
Kun Wei
Pengcheng Guo
Ning Jiang
84
11
0
02 Jul 2022
Tree-constrained Pointer Generator with Graph Neural Network Encodings for Contextual Speech Recognition
Guangzhi Sun
Chuxu Zhang
P. Woodland
64
14
0
02 Jul 2022
UserLibri: A Dataset for ASR Personalization Using Only Text
Theresa Breiner
Swaroop Indra Ramaswamy
Ehsan Variani
Shefali Garg
Rajiv Mathews
K. Sim
Kilol Gupta
Mingqing Chen
Lara McConnaughey
70
16
0
02 Jul 2022
Exploring Temporally Dynamic Data Augmentation for Video Recognition
Taeoh Kim
Jinhyung Kim
Minho Shim
Sangdoo Yun
Myunggu Kang
Dongyoon Wee
Sangyoun Lee
AI4TS
118
10
0
30 Jun 2022
Improving Deliberation by Text-Only and Semi-Supervised Training
Ke Hu
Tara N. Sainath
Yanzhang He
Rohit Prabhavalkar
Trevor Strohman
S. Mavandadi
Weiran Wang
85
13
0
29 Jun 2022
Language-specific Characteristic Assistance for Code-switching Speech Recognition
Tongtong Song
Qiang Xu
Meng Ge
Longbiao Wang
Hao Shi
Yongjie Lv
Yuqin Lin
Jianwu Dang
84
27
0
29 Jun 2022
Contextual Density Ratio for Language Model Biasing of Sequence to Sequence ASR Systems
Jesús Andrés-Ferrer
Dario Albesano
P. Zhan
Paul Vozila
58
6
0
29 Jun 2022
Data augmentation for learning predictive models on EEG: a systematic comparison
Cédric Rommel
Joseph Paillard
Thomas Moreau
Alexandre Gramfort
118
65
0
29 Jun 2022
STOP: A dataset for Spoken Task Oriented Semantic Parsing
Paden Tomasello
Akshat Shrivastava
Daniel Lazar
Po-Chun Hsu
Duc Le
...
Robin Algayres
Tu Nguyen
Emmanuel Dupoux
Luke Zettlemoyer
Abdel-rahman Mohamed
69
37
0
29 Jun 2022
QTI Submission to DCASE 2021: residual normalization for device-imbalanced acoustic scene classification with efficient design
Byeonggeun Kim
Seunghan Yang
Jangho Kim
Simyung Chang
81
58
0
28 Jun 2022
Challenges and Opportunities in Multi-device Speech Processing
G. Ciccarelli
Jarred Barber
A. Nair
Israel Cohen
Tao Zhang
57
5
0
27 Jun 2022
Data Augmentation for Dementia Detection in Spoken Language
Anna Hlédiková
Dominika Woszczyk
Alican Acman
Soteris Demetriou
Björn Schuller
70
13
0
26 Jun 2022
On Comparison of Encoders for Attention based End to End Speech Recognition in Standalone and Rescoring Mode
Raviraj Joshi
Subodh Kumar
75
2
0
26 Jun 2022
Exploiting Transformation Invariance and Equivariance for Self-supervised Sound Localisation
Jinxian Liu
Chen Ju
Weidi Xie
Ya Zhang
70
39
0
26 Jun 2022
Data Augmentation techniques in time series domain: A survey and taxonomy
Guillermo Iglesias
Edgar Talavera
Ángel González-Prieto
Alberto Mozo
S. Gómez-Canaval
AI4TS
109
171
0
25 Jun 2022
Data Augmentation and Squeeze-and-Excitation Network on Multiple Dimension for Sound Event Localization and Detection in Real Scenes
Byeongil Ko
Hyeonuk Nam
Seong-Hu Kim
D. Min
Seung-Deok Choi
Yong-Hwa Park
77
7
0
24 Jun 2022
Confidence Score Based Conformer Speaker Adaptation for Speech Recognition
Jiajun Deng
Xurong Xie
Tianzi Wang
Mingyu Cui
Boyang Xue
Zengrui Jin
Mengzhe Geng
Guinan Li
Xunying Liu
Helen M. Meng
58
13
0
24 Jun 2022
Pruned RNN-T for fast, memory-efficient ASR training
Fangjun Kuang
Liyong Guo
Wei Kang
Long Lin
Mingshuang Luo
Zengwei Yao
Daniel Povey
101
69
0
23 Jun 2022
Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems
Mingyu Cui
Jiajun Deng
Shoukang Hu
Xurong Xie
Tianzi Wang
Shujie Hu
Mengzhe Geng
Boyang Xue
Xunying Liu
Helen M. Meng
80
9
0
23 Jun 2022
M&M Mix: A Multimodal Multiview Transformer Ensemble
Xuehan Xiong
Anurag Arnab
Arsha Nagrani
Cordelia Schmid
ViT
70
20
0
20 Jun 2022
Boosting Cross-Domain Speech Recognition with Self-Supervision
Hanjing Zhu
Gaofeng Cheng
Jindong Wang
Wenxin Hou
Pengyuan Zhang
Yonghong Yan
100
16
0
20 Jun 2022
Redundancy Reduction Twins Network: A Training framework for Multi-output Emotion Regression
Xin Jing
Meishu Song
Andreas Triantafyllopoulos
Zijiang Yang
Björn W. Schuller
39
8
0
18 Jun 2022
Event-related data conditioning for acoustic event classification
Yuanbo Hou
Dick Botteldooren
59
3
0
16 Jun 2022
Investigating Multi-Feature Selection and Ensembling for Audio Classification
Muhammad Turab
Teerath Kumar
Malika Bendechache
Takfarinas Saber
71
41
0
15 Jun 2022
TriHorn-Net: A Model for Accurate Depth-Based 3D Hand Pose Estimation
Mohammad Rezaei
R. Rastgoo
V. Athitsos
3DH
69
36
0
14 Jun 2022
COLD Fusion: Calibrated and Ordinal Latent Distribution Fusion for Uncertainty-Aware Multimodal Emotion Recognition
M. Tellamekala
Shahin Amiriparian
Björn W. Schuller
Elisabeth André
T. Giesbrecht
Michel Valstar
128
26
0
12 Jun 2022
BigVGAN: A Universal Neural Vocoder with Large-Scale Training
Sang-gil Lee
Ming-Yu Liu
Boris Ginsburg
Bryan Catanzaro
Sung-Hoon Yoon
165
255
0
09 Jun 2022
Revisiting End-to-End Speech-to-Text Translation From Scratch
Biao Zhang
Barry Haddow
Rico Sennrich
92
39
0
09 Jun 2022
LegoNN: Building Modular Encoder-Decoder Models
Siddharth Dalmia
Dmytro Okhonko
M. Lewis
Sergey Edunov
Shinji Watanabe
Florian Metze
Luke Zettlemoyer
Abdel-rahman Mohamed
AuLLM
MoE
73
14
0
07 Jun 2022
Lip-Listening: Mixing Senses to Understand Lips using Cross Modality Knowledge Distillation for Word-Based Models
Hadeel Mabrouk
Omar Abugabal
Nourhan Sakr
Hesham M. Eraqi
VLM
68
2
0
05 Jun 2022
LAE: Language-Aware Encoder for Monolingual and Multilingual ASR
Jinchuan Tian
Jianwei Yu
Chunlei Zhang
Chao Weng
Yuexian Zou
Dong Yu
AuLLM
85
25
0
05 Jun 2022
Automated Audio Captioning with Epochal Difficult Captions for Curriculum Learning
Andrew Koh
Soham Dinesh Tiwari
Chng Eng Siong
53
1
0
04 Jun 2022
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition
Sehoon Kim
A. Gholami
Albert Eaton Shaw
Nicholas Lee
K. Mangalam
Jitendra Malik
Michael W. Mahoney
Kurt Keutzer
130
105
0
02 Jun 2022
Do self-supervised speech models develop human-like perception biases?
Juliette Millet
Ewan Dunbar
SSL
68
23
0
31 May 2022
Speech Augmentation Based Unsupervised Learning for Keyword Spotting
Jian Luo
Jianzong Wang
Ning Cheng
Haobin Tang
Jing Xiao
SSL
72
2
0
28 May 2022
Contrastive Siamese Network for Semi-supervised Speech Recognition
S. Khorram
Jaeyoung Kim
Anshuman Tripathi
Han Lu
Qian Zhang
Hasim Sak
SSL
79
12
0
27 May 2022
Joint Training of Speech Enhancement and Self-supervised Model for Noise-robust ASR
Qiu-shi Zhu
Jie Zhang
Zitian Zhang
Lirong Dai
90
15
0
26 May 2022
DT-SV: A Transformer-based Time-domain Approach for Speaker Verification
Nan Zhang
Jianzong Wang
Zhenhou Hong
Chendong Zhao
Xiaoyang Qu
Jing Xiao
116
5
0
26 May 2022
Improving CTC-based ASR Models with Gated Interlayer Collaboration
Yuting Yang
Yuke Li
Binbin Du
100
11
0
25 May 2022
FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech
Alexis Conneau
Min Ma
Simran Khanuja
Yu Zhang
Vera Axelrod
Siddharth Dalmia
Jason Riesa
Clara E. Rivera
Ankur Bapna
VLM
162
332
0
25 May 2022
Multi-Level Modeling Units for End-to-End Mandarin Speech Recognition
Yuting Yang
Binbin Du
Yuke Li
71
1
0
24 May 2022
Non-Parametric Domain Adaptation for End-to-End Speech Translation
Yichao Du
Weizhi Wang
Zhirui Zhang
Boxing Chen
Tong Xu
Jun Xie
Enhong Chen
156
18
0
23 May 2022
Learning Rate Curriculum
Florinel-Alin Croitoru
Nicolae-Cătălin Ristea
Radu Tudor Ionescu
N. Sebe
78
9
0
18 May 2022
Minimising Biasing Word Errors for Contextual ASR with the Tree-Constrained Pointer Generator
Guangzhi Sun
Chuxu Zhang
P. Woodland
100
14
0
18 May 2022
Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation
Qianqian Dong
Fengpeng Yue
Tom Ko
Mingxuan Wang
Qibing Bai
Yu Zhang
93
16
0
18 May 2022
SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation
Sameer Khurana
Antoine Laurent
James R. Glass
65
37
0
17 May 2022
Who Are We Talking About? Handling Person Names in Speech Translation
Marco Gaido
Matteo Negri
Marco Turchi
80
8
0
13 May 2022
Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition
Zengrui Jin
Mengzhe Geng
Jiajun Deng
Tianzi Wang
Shujie Hu
Guinan Li
Xunying Liu
86
22
0
13 May 2022
Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning
Xuenan Xu
Zeyu Xie
Mengyue Wu
K. Yu
84
16
0
11 May 2022
A Conformer-based Waveform-domain Neural Acoustic Echo Canceller Optimized for ASR Accuracy
S. Panchapagesan
A. Narayanan
T. Shabestary
Shuai Shao
N. Howard
Alex Park
James Walker
A. Gruenstein
56
5
0
06 May 2022