SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

18 April 2019

Papers citing "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"

50 / 1,048 papers shown

Title
Slow-Fast Auditory Streams For Audio Recognition Evangelos Kazakos Arsha Nagrani Andrew Zisserman Dima Damen 117 68 0 05 Mar 2021
Neural model robustness for skill routing in large-scale conversational AI systems: A design choice exploration Han Li Sunghyun Park Aswarth Abhilash Dara Jinseok Nam Sungjin Lee Young-Bum Kim Spyros Matsoukas R. Sarikaya 67 9 0 04 Mar 2021
An Empirical Study of End-to-end Simultaneous Speech Translation Decoding Strategies H. Nguyen Yannick Esteve Laurent Besacier 60 19 0 04 Mar 2021
Perceiver: General Perception with Iterative Attention Andrew Jaegle Felix Gimeno Andrew Brock Andrew Zisserman Oriol Vinyals João Carreira VLM ViT MDE 214 1,030 0 04 Mar 2021
Alignment Knowledge Distillation for Online Streaming Attention-based Speech Recognition Hirofumi Inaguma Tatsuya Kawahara 127 14 0 28 Feb 2021
The NPU System for the 2020 Personalized Voice Trigger Challenge Jingyong Hou Li Zhang Yihui Fu Qing Wang Zhanheng Yang Qijie Shao Lei Xie 61 7 0 26 Feb 2021
MixSpeech: Data Augmentation for Low-resource Automatic Speech Recognition Linghui Meng Jin Xu Xu Tan Jindong Wang Tao Qin Bo Xu VLM 115 78 0 25 Feb 2021
The Accented English Speech Recognition Challenge 2020: Open Datasets, Tracks, Baselines, Results and Methods Xian Shi Fan Yu Yizhou Lu Yuhao Liang Qiangze Feng Daliang Wang Y. Qian Lei Xie 60 68 0 20 Feb 2021
End-to-End Neural Systems for Automatic Children Speech Recognition: An Empirical Study Prashanth Gurunath Shivakumar Shrikanth Narayanan 53 54 0 19 Feb 2021
Unit selection synthesis based data augmentation for fixed phrase speaker verification Houjun Huang Xu Xiang Fei Zhao Shuai Wang Y. Qian 16 6 0 19 Feb 2021
Fundamental Frequency Feature Normalization and Data Augmentation for Child Speech Recognition Gary Yeung Ruchao Fan Abeer Alwan 74 20 0 18 Feb 2021
End-to-end lyrics Recognition with Voice to Singing Style Transfer Sakya Basak Shrutina Agarwal Sriram Ganapathy Naoya Takahashi 68 20 0 17 Feb 2021
End-to-End Automatic Speech Recognition with Deep Mutual Learning Ryo Masumura Mana Ihori Akihiko Takashima Tomohiro Tanaka Takanori Ashihara 34 5 0 16 Feb 2021
Hierarchical Transformer-based Large-Context End-to-end ASR with Large-Context Knowledge Distillation Ryo Masumura Naoki Makishima Mana Ihori Akihiko Takashima Tomohiro Tanaka Shota Orihashi 77 29 0 16 Feb 2021
Adversarial defense for automatic speaker verification by cascaded self-supervised learning models Haibin Wu Xu Li Andy T. Liu Zhiyong Wu Helen Meng Hung-yi Lee AAML 86 41 0 14 Feb 2021
End-to-end Audio-visual Speech Recognition with Conformers Pingchuan Ma Stavros Petridis Maja Pantic 157 234 0 12 Feb 2021
Enhancing Audio Augmentation Methods with Consistency Learning Turab Iqbal Karim Helwani A. Krishnaswamy Wenwu Wang 67 5 0 09 Feb 2021
Intermediate Loss Regularization for CTC-based Speech Recognition Jaesong Lee Shinji Watanabe 151 140 0 05 Feb 2021
Data Generation Using Pass-phrase-dependent Deep Auto-encoders for Text-Dependent Speaker Verification A. K. Sarkar Md. Sahidullah Zheng-Hua Tan 26 0 0 03 Feb 2021
Speech Emotion Recognition with Multiscale Area Attention and Data Augmentation Mingke Xu Fan Zhang Xiaodong Cui Wei Zhang 54 52 0 03 Feb 2021
The Multilingual TEDx Corpus for Speech Recognition and Translation Elizabeth Salesky Sanjeev Khudanpur Jacob Bremerman R. Cattoni Matteo Negri Marco Turchi Douglas W. Oard Matt Post 79 126 0 02 Feb 2021
CTC-based Compression for Direct Speech Translation Marco Gaido Mauro Cettolo Matteo Negri Marco Turchi 104 59 0 02 Feb 2021
WeNet: Production oriented Streaming and Non-streaming End-to-End Speech Recognition Toolkit Zhuoyuan Yao Di Wu Xiong Wang Binbin Zhang Fan Yu Chao Yang Zhendong Peng Xiaoyu Chen Lei Xie X. Lei 129 268 0 02 Feb 2021
The Hitachi-JHU DIHARD III System: Competitive End-to-End Neural Diarization and X-Vector Clustering Systems Combined by DOVER-Lap Shota Horiguchi Nelson Yalta Leibny Paola García-Perera Yuki Takashima Yawen Xue Desh Raj Zili Huang Yusuke Fujita Shinji Watanabe Sanjeev Khudanpur BDL 63 37 0 02 Feb 2021
PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation Yuan Gong Yu-An Chung James R. Glass VLM 199 147 0 02 Feb 2021
BCN2BRNO: ASR System Fusion for Albayzin 2020 Speech to Text Challenge M. Kocour Guillermo Cámbara Jordi Luque David Bonet Mireia Farrús Martin Karafiát Karel Veselý Jan "Honza" Cernocký 28 6 0 29 Jan 2021
LEAF: A Learnable Frontend for Audio Classification Neil Zeghidour O. Teboul Félix de Chaumont Quitry Marco Tagliasacchi VLM AAML 137 148 0 21 Jan 2021
Arabic Speech Recognition by End-to-End, Modular Systems and Human A. Hussein Shinji Watanabe Ahmed M. Ali VLM 72 50 0 21 Jan 2021
On Data-Augmentation and Consistency-Based Semi-Supervised Learning Atin Ghosh Alexandre Hoang Thiery 132 21 0 18 Jan 2021
Tiny Transducer: A Highly-efficient Speech Recognition Model on Edge Devices Yuekai Zhang Sining Sun Long Ma 95 29 0 18 Jan 2021
An evaluation of word-level confidence estimation for end-to-end automatic speech recognition Dan Oneaţă Alexandru Caranica Adriana Stan H. Cucu UQCV 88 25 0 14 Jan 2021
End-to-End Speaker Height and age estimation using Attention Mechanism with LSTM-RNN Manav Kaushik Van Tung Pham Chng Eng Siong 53 6 0 13 Jan 2021
A Four-Stage Data Augmentation Approach to ResNet-Conformer Based Acoustic Modeling for Sound Event Localization and Detection Qing Wang Jun Du Hua-Xin Wu Jia Pan Feng Ma Chin-Hui Lee 64 83 0 08 Jan 2021
Environment Transfer for Distributed Systems Chunheng Jiang Jae-wook Ahn N. Desai 60 1 0 06 Jan 2021
AutoDropout: Learning Dropout Patterns to Regularize Deep Networks Hieu H. Pham Quoc V. Le 128 57 0 05 Jan 2021
Robustness Testing of Language Understanding in Task-Oriented Dialog Jiexi Liu Ryuichi Takanobu Jiaxin Wen Dazhen Wan Hongguang Li Weiran Nie Cheng Li Wei Peng Minlie Huang ELM 122 49 0 30 Dec 2020
NeurST: Neural Speech Translation Toolkit Chengqi Zhao Mingxuan Wang Qianqian Dong Rong Ye Lei Li 89 32 0 18 Dec 2020
CIF-based Collaborative Decoding for End-to-end Contextual Speech Recognition Minglun Han Linhao Dong Shiyu Zhou Bo Xu 73 23 0 17 Dec 2020
A review of on-device fully neural end-to-end automatic speech recognition algorithms Chanwoo Kim Dhananjaya N. Gowda Dongsoo Lee Jiyeon Kim Ankur Kumar Sungsoo Kim Abhinav Garg C. Han 68 27 0 14 Dec 2020
Bayesian Learning for Deep Neural Network Adaptation Xurong Xie Xunying Liu Tan Lee Lan Wang BDL 112 22 0 14 Dec 2020
REDAT: Accent-Invariant Representation for End-to-End ASR by Domain Adversarial Training with Relabeling Hu Hu Xuesong Yang Zeynab Raeesy Jinxi Guo Gokce Keskin Harish Arsikere Ariya Rastrow A. Stolcke Roland Maas 64 30 0 14 Dec 2020
Self-supervised Text-independent Speaker Verification using Prototypical Momentum Contrastive Learning Wei Xia Chunlei Zhang Chao Weng Meng Yu Dong Yu SSL 64 80 0 13 Dec 2020
Less Is More: Improved RNN-T Decoding Using Limited Label Context and Path Merging Rohit Prabhavalkar Yanzhang He David Rybach S. Campbell A. Narayanan Trevor Strohman Tara N. Sainath 125 35 0 12 Dec 2020
Improved Robustness to Disfluencies in RNN-Transducer Based Speech Recognition Valentin Mendelev Tina Raissi Guglielmo Camporese Manuel Giollo 48 21 0 11 Dec 2020
Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition Binbin Zhang Di Wu Zhuoyuan Yao Xiong Wang F. Yu Chao Yang Liyong Guo Yaguang Hu Lei Xie X. Lei 93 81 0 10 Dec 2020
Parameter Efficient Multimodal Transformers for Video Representation Learning Sangho Lee Youngjae Yu Gunhee Kim Thomas Breuel Jan Kautz Yale Song ViT 104 78 0 08 Dec 2020
Frame-level SpecAugment for Deep Convolutional Neural Networks in Hybrid ASR Systems Xinwei Li Yuanyuan Zhang Xiaodan Zhuang Daben Liu 28 6 0 07 Dec 2020
MLS: A Large-Scale Multilingual Dataset for Speech Research Vineel Pratap Qiantong Xu Anuroop Sriram Gabriel Synnaeve R. Collobert AuLLM 189 513 0 07 Dec 2020
Triplet Entropy Loss: Improving The Generalisation of Short Speech Language Identification Systems Ruan van der Merwe 71 8 0 03 Dec 2020
Improving accuracy of rare words for RNN-Transducer through unigram shallow fusion Vijay Ravi Yile Gu Ankur Gandhe Ariya Rastrow Linda Liu Denis Filimonov Scott Novotney I. Bulyko 60 9 0 30 Nov 2020