Deep Speech: Scaling up end-to-end speech recognition

17 December 2014

Papers citing "Deep Speech: Scaling up end-to-end speech recognition"

50 / 750 papers shown

MSDT: Masked Language Model Scoring Defense in Text Domain Jaechul Roh Minhao Cheng Yajun Fang AAML 15 1 0 10 Nov 2022
Peak-First CTC: Reducing the Peak Latency of CTC Models by Applying Peak-First Regularization Zhengkun Tian Hongyu Xiang Min Li Fei Lin Ke Ding Guanglu Wan 13 6 0 07 Nov 2022
Data-free Defense of Black Box Models Against Adversarial Attacks Gaurav Kumar Nayak Inder Khatri Ruchit Rawal Anirban Chakraborty AAML 25 1 0 03 Nov 2022
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing Yonggan Fu Yang Zhang Kaizhi Qian Zhifan Ye Zhongzhi Yu Cheng-I Jeff Lai Yingyan Lin 30 8 0 02 Nov 2022
Modular Hybrid Autoregressive Transducer Zhong Meng Tongzhou Chen Rohit Prabhavalkar Yu Zhang Gary Wang ... Bhuvana Ramabhadran Yifan Jiang Ehsan Variani Yinghui Huang Pedro J. Moreno 34 20 0 31 Oct 2022
BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model Yosuke Higuchi Brian Yan Siddhant Arora Tetsuji Ogawa Tetsunori Kobayashi Shinji Watanabe 54 25 0 29 Oct 2022
ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition Sanchit Gandhi Patrick von Platen Alexander M. Rush 30 24 0 24 Oct 2022
10 hours data is all you need Zeping Min Qian Ge Zhong Li 18 2 0 24 Oct 2022
Improving Semi-supervised End-to-end Automatic Speech Recognition using CycleGAN and Inter-domain Losses C. Li Ngoc Thang Vu 21 2 0 20 Oct 2022
Accelerating Transfer Learning with Near-Data Computation on Cloud Object Stores Arsany Guirguis Diana Petrescu Florin Dinu D. Quoc Javier Picorel R. Guerraoui 40 0 0 16 Oct 2022
Deep learning model compression using network sensitivity and gradients M. Sakthi N. Yadla Raj Pawate 21 2 0 11 Oct 2022
JoeyS2T: Minimalistic Speech-to-Text Modeling with JoeyNMT Mayumi Ohta Julia Kreutzer Stefan Riezler 19 0 0 05 Oct 2022
E-Branchformer: Branchformer with Enhanced merging for speech recognition Kwangyoun Kim Felix Wu Yifan Peng Jing Pan Prashant Sridhar Kyu Jeong Han Shinji Watanabe 61 105 0 30 Sep 2022
ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech Saeed Ghorbani Ylva Ferstl Daniel Holden N. Troje M. Carbonneau 30 79 0 15 Sep 2022
Deep Speech Synthesis from Articulatory Representations Peter Wu Shinji Watanabe L. Goldstein A. Black Gopala K. Anumanchipalli 39 24 0 13 Sep 2022
Synthesizing Photorealistic Virtual Humans Through Cross-modal Disentanglement S. Ravichandran Ondrej Texler Dimitar Dinev Hyun Jae Kang 17 4 0 03 Sep 2022
Universal Fourier Attack for Time Series Elizabeth Coda B. Clymer Chance N. DeSmet Y. Watkins Michael Girard 28 1 0 02 Sep 2022
RL-DistPrivacy: Privacy-Aware Distributed Deep Inference for low latency IoT systems Emna Baccour A. Erbad Amr M. Mohamed Mounir Hamdi M. Guizani 30 12 0 27 Aug 2022
Not All GPUs Are Created Equal: Characterizing Variability in Large-Scale, Accelerator-Rich Systems Prasoon Sinha Akhil Guliani Rutwik Jain Brandon Tran Matthew D. Sinclair Shivaram Venkataraman 19 17 0 23 Aug 2022
How does the degree of novelty impacts semi-supervised representation learning for novel class retrieval? Q. Leroy Olivier Buisson Alexis Joly SSL 21 0 0 17 Aug 2022
Unifying Gradients to Improve Real-world Robustness for Deep Networks Yingwen Wu Sizhe Chen Kun Fang X. Huang AAML 32 3 0 12 Aug 2022
Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training Jie You Jaehoon Chung Mosharaf Chowdhury 26 75 0 12 Aug 2022
Pronunciation-aware unique character encoding for RNN Transducer-based Mandarin speech recognition Peng Shen Xugang Lu Hisashi Kawai 19 2 0 29 Jul 2022
Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis Shuai Shen Wanhua Li Zhengbiao Zhu Yueqi Duan Jie Zhou Jiwen Lu CVBM 25 105 0 24 Jul 2022
Improving spatial cues for hearables using a parameterized binaural CDR estimator Reza Ghanavi C. Jin 16 1 0 17 Jul 2022
End-to-End Spoken Language Understanding: Performance analyses of a voice command task in a low resource setting Thierry Desot François Portet Michel Vacher 27 12 0 17 Jul 2022
pMCT: Patched Multi-Condition Training for Robust Speech Recognition Pablo Peso Parada A. Dobrowolska Karthikeyan P. Saravanan Mete Ozay 40 6 0 11 Jul 2022
Adversarial Ensemble Training by Jointly Learning Label Dependencies and Member Models Lele Wang B. Liu UQCV 23 4 0 29 Jun 2022
The Makerere Radio Speech Corpus: A Luganda Radio Corpus for Automatic Speech Recognition Jonathan Mukiibi Andrew Katumba J. Nakatumba‐Nabende Ali Hussein Josh Meyer 22 7 0 20 Jun 2022
Residual Language Model for End-to-end Speech Recognition E. Tsunoo Yosuke Kashiwagi Chaitanya Narisetty Shinji Watanabe 19 11 0 15 Jun 2022
Local Identifiability of Deep ReLU Neural Networks: the Theory Joachim Bona-Pellissier François Malgouyres F. Bachoc FAtt 67 6 0 15 Jun 2022
AS2T: Arbitrary Source-To-Target Adversarial Attack on Speaker Recognition Systems Guangke Chen Zhe Zhao Fu Song Sen Chen Lingling Fan Yang Liu AAML 32 18 0 07 Jun 2022
LegoNN: Building Modular Encoder-Decoder Models Siddharth Dalmia Dmytro Okhonko M. Lewis Sergey Edunov Shinji Watanabe Florian Metze Luke Zettlemoyer Abdel-rahman Mohamed AuLLM MoE 29 14 0 07 Jun 2022
Speech Augmentation Based Unsupervised Learning for Keyword Spotting Jian Luo Jianzong Wang Ning Cheng Haobin Tang Jing Xiao SSL 23 2 0 28 May 2022
Improving CTC-based ASR Models with Gated Interlayer Collaboration Yuting Yang Yuke Li Binbin Du 31 11 0 25 May 2022
Deep Learning for Visual Speech Analysis: A Survey Changchong Sheng Gangyao Kuang L. Bai Chen Hou Y. Guo Xin Xu M. Pietikäinen Li Liu VLM 26 33 0 22 May 2022
Cardinality-Minimal Explanations for Monotonic Neural Networks Ouns El Harzli Bernardo Cuenca Grau Ian Horrocks FAtt 38 5 0 19 May 2022
Emotion-Controllable Generalized Talking Face Generation Sanjana Sinha S. Biswas Ravindra Yadav Brojeshwar Bhowmick CVBM 15 49 0 02 May 2022
A Novel Speech-Driven Lip-Sync Model with CNN and LSTM Xiaohong Li Xiang Wang Kai Wang Shiguo Lian 16 4 0 02 May 2022
Extricating IoT Devices from Vendor Infrastructure with Karl Gina Yuan David Mazières Matei A. Zaharia 18 5 0 28 Apr 2022
Improving Self-Supervised Learning-based MOS Prediction Networks Bálint Gyires-Tóth Csaba Zainkó SSL 14 1 0 23 Apr 2022
Adversarial Scratches: Deployable Attacks to CNN Classifiers Loris Giulivi Malhar Jere Loris Rossi F. Koushanfar Gabriela F. Cretu-Ciocarlie Briland Hitaj Giacomo Boracchi AAML 20 18 0 20 Apr 2022
STRATA: Word Boundaries & Phoneme Recognition From Continuous Urdu Speech using Transfer Learning, Attention, & Data Augmentation Saad Naeem Omer Beg 6 0 0 16 Apr 2022
A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes Shaojin Ding Weiran Wang Ding Zhao Tara N. Sainath Yanzhang He ... Qiao Liang Dongseong Hwang Ian McGraw Rohit Prabhavalkar Trevor Strohman 30 17 0 13 Apr 2022
A Wav2vec2-Based Experimental Study on Self-Supervised Learning Methods to Improve Child Speech Recognition Rishabh Jain Andrei Barcovschi Mariam Yiwere Dan Bigioi Peter Corcoran H. Cucu 22 31 0 06 Apr 2022
Successes and critical failures of neural networks in capturing human-like speech recognition Federico Adolfi J. Bowers David Poeppel UQCV 22 19 0 06 Apr 2022
Lip to Speech Synthesis with Visual Context Attentional GAN Minsu Kim Joanna Hong Y. Ro 23 50 0 04 Apr 2022
Deep Speech Based End-to-End Automated Speech Recognition (ASR) for Indian-English Accents P. Dubey B. Shah 6 13 0 03 Apr 2022
Multi-task RNN-T with Semantic Decoder for Streamable Spoken Language Understanding Xuandi Fu Feng-Ju Chang Martin H. Radfar Kailin Wei Jing Liu Grant P. Strimel Kanthashree Mysore Sathyendra 16 4 0 01 Apr 2022
End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation Xuankai Chang Takashi Maekaku Yuya Fujita Shinji Watanabe VLM 51 45 0 01 Apr 2022