SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

18 April 2019

Papers citing "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"

50 / 1,049 papers shown

Vocalsound: A Dataset for Improving Human Vocal Sounds Recognition Yuan Gong Jingbo Yu James R. Glass 89 42 0 06 May 2022
ON-TRAC Consortium Systems for the IWSLT 2022 Dialect and Low-resource Speech Translation Tasks Marcely Zanon Boito John E. Ortega Hugo Riguidel Antoine Laurent Loïc Barrault ... Firas Chaabani H. Nguyen Florentin Barbier Souhir Gahbiche Yannick Esteve 64 16 0 04 May 2022
On monoaural speech enhancement for automatic recognition of real noisy speech using mixture invariant training Jisi Zhang Catalin Zorila R. Doddipatla Jon Barker 61 4 0 03 May 2022
Efficient dynamic filter for robust and low computational feature extraction Donghyeon Kim Gwantae Kim Bokyeung Lee Jeong-gi Kwak D. Han Hanseok Ko 60 3 0 03 May 2022
Pseudo strong labels for large scale weakly supervised audio tagging Heinrich Dinkel Zhiyong Yan Yongqing Wang Junbo Zhang Yujun Wang 63 6 0 28 Apr 2022
Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations Dan Oneaţă H. Cucu 51 19 0 27 Apr 2022
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition? Sanyuan Chen Yu Wu Chengyi Wang Shujie Liu Zhuo Chen ... Gang Liu Jinyu Li Jian Wu Xiangzhan Yu Furu Wei SSL 102 42 0 27 Apr 2022
Improving the Naturalness of Simulated Conversations for End-to-End Neural Diarization Natsuo Yamashita Shota Horiguchi Takeshi Homma 74 18 0 24 Apr 2022
E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR Wenjie Huang Shuo-yiin Chang David Rybach Rohit Prabhavalkar Tara N. Sainath Cyril Allauzen Cal Peyser Zhiyun Lu VLM 93 24 0 22 Apr 2022
The 2021 NIST Speaker Recognition Evaluation S. O. Sadjadi Craig S. Greenberg E. Singer Lisa P. Mason D. A. Reynolds 94 74 0 21 Apr 2022
The NIST CTS Speaker Recognition Challenge S. O. Sadjadi Craig S. Greenberg E. Singer Lisa P. Mason D. Reynolds ELM 133 0 0 21 Apr 2022
Layer-wise Fast Adaptation for End-to-End Multi-Accent Speech Recognition Xun Gong Y. Qian Houjun Huang Yanmin Qian 81 46 0 21 Apr 2022
Detecting Unintended Memorization in Language-Model-Fused ASR Wenjie Huang Steve Chien Om Thakkar Rajiv Mathews 87 11 0 20 Apr 2022
Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation Keqi Deng Shinji Watanabe Jiatong Shi Siddhant Arora 75 15 0 19 Apr 2022
Audio Deep Fake Detection System with Neural Stitching for ADD 2022 Rui Yan Cheng Wen Shuran Zhou Tingwei Guo Wei Zou Xiangang Li 49 24 0 19 Apr 2022
Caption Feature Space Regularization for Audio Captioning Yiming Zhang Hong Yu Ruoyi Du Zhanyu Ma Yuan Dong 122 1 0 18 Apr 2022
Small Footprint Multi-channel ConvMixer for Keyword Spotting with Centroid Based Awareness Dianwen Ng Jing Pang Yanghua Xiao Biao Tian Qiang Fu Eng Siong Chng 64 2 0 11 Apr 2022
Tokenwise Contrastive Pretraining for Finer Speech-to-BERT Alignment in End-to-End Speech-to-Intent Systems Vishal Sunder Eric Fosler-Lussier Samuel Thomas H. Kuo Brian Kingsbury 78 7 0 11 Apr 2022
Towards End-to-End Integration of Dialog History for Improved Spoken Language Understanding Vishal Sunder Samuel Thomas H. Kuo Jatin Ganhotra Brian Kingsbury Eric Fosler-Lussier VLM 96 10 0 11 Apr 2022
Auditory-Based Data Augmentation for End-to-End Automatic Speech Recognition Zehai Tu Jack Deadman Ning Ma Jon Barker 66 4 0 08 Apr 2022
Automatic Data Augmentation Selection and Parametrization in Contrastive Self-Supervised Speech Representation Learning Salah Zaiem Titouan Parcollet S. Essid SSL 41 6 0 08 Apr 2022
Scoring of Large-Margin Embeddings for Speaker Verification: Cosine or PLDA? Qiongqiong Wang Kong Aik Lee Tianchi Liu 67 16 0 08 Apr 2022
GigaST: A 10,000-hour Pseudo Speech Translation Corpus Rong Ye Chengqi Zhao Tom Ko Chutong Meng Tao Wang Mingxuan Wang Jun Cao 89 23 0 08 Apr 2022
Transducer-based language embedding for spoken language identification Peng Shen Xugang Lu Hisashi Kawai 84 6 0 08 Apr 2022
Frequency Selective Augmentation for Video Representation Learning Jinhyung Kim Taeoh Kim Minho Shim Dongyoon Han Dongyoon Wee Junmo Kim AI4TS 101 4 0 08 Apr 2022
Personal VAD 2.0: Optimizing Personal Voice Activity Detection for On-Device Speech Recognition Shaojin Ding R. Rikhye Qiao Liang Yanzhang He Quan Wang A. Narayanan Tom O'Malley Ian McGraw 92 28 0 08 Apr 2022
Detecting Vocal Fatigue with Neural Embeddings Sebastian P. Bayerl Dominik Wagner Ilja Baumann Korbinian Riedhammer Tobias Bocklet 64 11 0 07 Apr 2022
MAESTRO: Matched Speech Text Representations through Modality Matching Zhehuai Chen Yu Zhang Andrew Rosenberg Bhuvana Ramabhadran Pedro J. Moreno Ankur Bapna Heiga Zen 98 108 0 07 Apr 2022
DDOS: A MOS Prediction Framework utilizing Domain Adaptive Pre-training and Distribution of Opinion Scores Wei-Cheng Tseng Wei-Tsung Kao Hung-yi Lee 80 21 0 07 Apr 2022
A Wav2vec2-Based Experimental Study on Self-Supervised Learning Methods to Improve Child Speech Recognition Rishabh Jain Andrei Barcovschi Mariam Yiwere Dan Bigioi Peter Corcoran H. Cucu 54 35 0 06 Apr 2022
Successes and critical failures of neural networks in capturing human-like speech recognition Federico Adolfi J. Bowers David Poeppel UQCV 86 22 0 06 Apr 2022
Towards End-to-end Unsupervised Speech Recognition Alexander H. Liu Wei-Ning Hsu Michael Auli Alexei Baevski SSL 83 74 0 05 Apr 2022
Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation Dan Berrebbi Jiatong Shi Brian Yan Osbel López-Francisco Jonathan D. Amith Shinji Watanabe 68 27 0 05 Apr 2022
A Novel Capsule Neural Network Based Model for Drowsiness Detection Using Electroencephalography Signals Luis Guarda Juan Tapia E. Droguett M. Ramos 33 27 0 04 Apr 2022
An Analysis of Semantically-Aligned Speech-Text Embeddings M. Huzaifah Ivan Kukanov 90 8 0 04 Apr 2022
Leveraging Phone Mask Training for Phonetic-Reduction-Robust E2E Uyghur Speech Recognition Guodong Ma Pengfei Hu Jian Kang Shen Huang Hao-Ming Huang 78 9 0 02 Apr 2022
End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation Xuankai Chang Takashi Maekaku Yuya Fujita Shinji Watanabe VLM 111 46 0 01 Apr 2022
Text-To-Speech Data Augmentation for Low Resource Speech Recognition Rodolfo Zevallos 50 4 0 01 Apr 2022
Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems Takuma Udagawa Masayuki Suzuki Gakuto Kurata N. Itoh G. Saon 115 24 0 01 Apr 2022
Improved Relation Networks for End-to-End Speaker Verification and Identification Ashutosh Chaubey Sparsh Sinha Susmita Ghose 58 3 0 31 Mar 2022
Memory-Efficient Training of RNN-Transducer with Sampled Softmax Jaesong Lee Lukas Lee Shinji Watanabe 105 8 0 31 Mar 2022
Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational(RAMC) Speech Dataset Zehui Yang Yifan Chen Lei Luo Runyan Yang Lingxuan Ye ... Yaohui Jin Qingqing Zhang Pengyuan Zhang Lei Xie Yonghong Yan 69 51 0 31 Mar 2022
How Does Pre-trained Wav2Vec 2.0 Perform on Domain Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications Juan Pablo Zuluaga Amrutha Prasad Iuliia Nigmatulina Seyyed Saeed Sarfjoo P. Motlícek Matthias Kleinert H. Helmke Oliver Ohneiser Qingran Zhan 78 44 0 31 Mar 2022
CUSIDE: Chunking, Simulating Future Context and Decoding for Streaming ASR Keyu An Huahuan Zheng Zhijian Ou Hongyu Xiang Ke Ding Guanglu Wan AI4TS 52 19 0 31 Mar 2022
Streaming parallel transducer beam search with fast-slow cascaded encoders Jay Mahadeokar Yangyang Shi Ke Li Duc Le Jiedan Zhu Vikas Chandra Ozlem Kalinli M. Seltzer 75 16 0 29 Mar 2022
Integrating Lattice-Free MMI into End-to-End Speech Recognition Jinchuan Tian Jianwei Yu Chao Weng Yuexian Zou Dong Yu 106 8 0 29 Mar 2022
Dynamic Latency for CTC-Based Streaming Automatic Speech Recognition With Emformer J. Sun Guiping Zhong Dinghao Zhou Baoxiang Li 108 0 0 29 Mar 2022
WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit Binbin Zhang Di Wu Zhendong Peng Xingcheng Song Zhuoyuan Yao Hang Lv Linfu Xie Chao Yang Fuping Pan Jianwei Niu VLM 104 99 0 29 Mar 2022
Noise-robust Speech Recognition with 10 Minutes Unparalleled In-domain Data Chen Chen Nana Hou Yuchen Hu Shashank Shirol Chng Eng Siong NoLa 103 43 0 29 Mar 2022
Filler Word Detection and Classification: A Dataset and Benchmark Ge Zhu Juan-Pablo Caceres Justin Salamon 39 9 0 28 Mar 2022