Wav2Letter: an End-to-End ConvNet-based Speech Recognition System

11 September 2016

Papers citing "Wav2Letter: an End-to-End ConvNet-based Speech Recognition System"

Title
Investigating the Effect of Label Topology and Training Criterion on ASR Performance and Alignment Quality Tina Raissi Christoph Luscher Simon Berger Ralf Schluter Hermann Ney 40 2 0 16 Jul 2024
Align With Purpose: Optimize Desired Properties in CTC Models with a General Plug-and-Play Framework Eliya Segev Maya Alroy Ronen Katsir Noam Wies Ayana Shenhav ... D. Zar Oren Tadmor Jacob Bitterman Amnon Shashua Tal Rosenwein 32 2 0 04 Jul 2023
Testing predictions of representation cost theory with CNNs Charles Godfrey Elise Bishoff Myles Mckay Davis Brown Grayson Jorgenson Henry Kvinge E. Byler 24 0 0 03 Oct 2022
And what if two musical versions don't share melody, harmony, rhythm, or lyrics ? M. Abrassart Guillaume Doras 31 3 0 03 Oct 2022
Low-Level Physiological Implications of End-to-End Learning of Speech Recognition Louise Coppieters de Gibson Philip N. Garner 21 1 0 22 Aug 2022
Global Normalization for Streaming Speech Recognition in a Modular Framework Ehsan Variani Ke Wu Michael Riley David Rybach Matt Shannon Cyril Allauzen 20 9 0 26 May 2022
Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages Felix Wu Kwangyoun Kim Shinji Watanabe Kyu Jeong Han Ryan T. McDonald Kilian Q. Weinberger Yoav Artzi SyDa 48 37 0 02 May 2022
Romanian Speech Recognition Experiments from the ROBIN Project Andrei-Marius Avram Vasile Păiș Dan Tufiș 13 4 0 23 Nov 2021
TorchAudio: Building Blocks for Audio and Speech Processing Yao-Yuan Yang Moto Hira Zhaoheng Ni Anjali Chourdia Artyom Astafurov ... Sean Narenthiran Shinji Watanabe Soumith Chintala Vincent Quenneville-Bélair Yangyang Shi 31 165 0 28 Oct 2021
Unsupervised Automatic Speech Recognition: A Review Hanan Aldarmaki Asad Ullah Nazar Zaki VLM SSL 39 57 0 09 Jun 2021
An Improved Model for Voicing Silent Speech David Gaddy Dana Klein 26 30 0 03 Jun 2021
SoK: A Modularized Approach to Study the Security of Automatic Speech Recognition Systems Yuxuan Chen Jiangshan Zhang Xuejing Yuan Shengzhi Zhang Kai Chen Xiaofeng Wang Shanqing Guo AAML 37 15 0 19 Mar 2021
Inductive biases, pretraining and fine-tuning jointly account for brain responses to speech Juliette Millet J. King 53 34 0 25 Feb 2021
VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation Changhan Wang M. Rivière Ann Lee Anne Wu Chaitanya Talnikar Daniel Haziza Mary Williamson J. Pino Emmanuel Dupoux SSL 25 460 0 02 Jan 2021
Recognizing More Emotions with Less Data Using Self-supervised Transfer Learning Jonathan Boigne Biman Liyanage Ted Östrem 21 20 0 11 Nov 2020
Align-Refine: Non-Autoregressive Speech Recognition via Iterative Realignment Ethan A. Chi Julian Salazar Katrin Kirchhoff AI4TS 22 51 0 24 Oct 2020
Rethinking Evaluation in ASR: Are Our Models Robust Enough? Tatiana Likhomanenko Qiantong Xu Vineel Pratap Paden Tomasello Jacob Kahn Gilad Avidov R. Collobert Gabriel Synnaeve 39 98 0 22 Oct 2020
Differentiable Weighted Finite-State Transducers Awni Y. Hannun Vineel Pratap Jacob Kahn Wei-Ning Hsu 25 29 0 02 Oct 2020
Open-Domain Conversational Agents: Current Progress, Open Problems, and Future Directions Stephen Roller Y-Lan Boureau Jason Weston Antoine Bordes Emily Dinan ... Kurt Shuster Eric Michael Smith Arthur Szlam Jack Urbanek Mary Williamson LLMAG AI4CE 28 51 0 22 Jun 2020
MultiQT: Multimodal Learning for Real-Time Question Tracking in Speech Jakob Drachmann Havtorn Jan Latko Joakim Edin Lasse Borgholt Lars Maaløe Lorenzo Belgrano Nicolai Frost Jakobsen R. Sdun Zeljko Agic 19 3 0 02 May 2020
Hybrid Autoregressive Transducer (hat) Ehsan Variani David Rybach Cyril Allauzen Michael Riley 21 158 0 12 Mar 2020
Imputer: Sequence Modelling via Imputation and Dynamic Programming William Chan Chitwan Saharia Geoffrey E. Hinton Mohammad Norouzi Navdeep Jaitly BDL AI4TS 21 114 0 20 Feb 2020
Gradient-Adjusted Neuron Activation Profiles for Comprehensive Introspection of Convolutional Speech Recognition Models A. Krug Sebastian Stober 24 0 0 19 Feb 2020
End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures Gabriel Synnaeve Qiantong Xu Jacob Kahn Tatiana Likhomanenko Edouard Grave Vineel Pratap Anuroop Sriram Vitaliy Liptchinsky R. Collobert SSL AI4TS 36 246 0 19 Nov 2019
Effectiveness of self-supervised pre-training for speech recognition Alexei Baevski Michael Auli Abdel-rahman Mohamed SSL 27 147 0 10 Nov 2019
Meta Learning for End-to-End Low-Resource Speech Recognition Jui-Yang Hsu Yuan-Jui Chen Hung-yi Lee 27 103 0 26 Oct 2019
Transformer-based Acoustic Modeling for Hybrid Speech Recognition Yongqiang Wang Abdel-rahman Mohamed Duc Le Chunxi Liu Alex Xiao ... Xiaohui Zhang Frank Zhang Christian Fuegen Geoffrey Zweig M. Seltzer 16 248 0 22 Oct 2019
vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations Alexei Baevski Steffen Schneider Michael Auli SSL 11 660 0 12 Oct 2019
Self-Training for End-to-End Speech Recognition Jacob Kahn Ann Lee Awni Y. Hannun SSL 27 231 0 19 Sep 2019
Espresso: A Fast End-to-end Neural Speech Recognition Toolkit Yiming Wang Tongfei Chen Hainan Xu Shuoyang Ding Hang Lv Yiwen Shao Nanyun Peng Lei Xie Shinji Watanabe Sanjeev Khudanpur VLM 27 73 0 18 Sep 2019
Trajectory Space Factorization for Deep Video-Based 3D Human Pose Estimation Jiahao Lin Gim Hee Lee 33 74 0 22 Aug 2019
Word-level Speech Recognition with a Letter to Word Encoder R. Collobert Awni Y. Hannun Gabriel Synnaeve 3DV 14 4 0 10 Jun 2019
TTS Skins: Speaker Conversion via ASR Adam Polyak Lior Wolf Yaniv Taigman 18 27 0 18 Apr 2019
A Fully Differentiable Beam Search Decoder R. Collobert Awni Y. Hannun Gabriel Synnaeve 17 40 0 16 Feb 2019
Modeling Human Motion with Quaternion-based Neural Networks Dario Pavllo Christoph Feichtenhofer Michael Auli David Grangier 3DH 27 173 0 21 Jan 2019
Towards Using Context-Dependent Symbols in CTC Without State-Tying Decision Trees J. Chorowski A. Lancucki Bartosz Kostka Michal Zapotoczny 19 5 0 14 Jan 2019
wav2letter++: The Fastest Open-source Speech Recognition System Vineel Pratap Awni Y. Hannun Qiantong Xu Jeff Cai Jacob Kahn Gabriel Synnaeve Vitaliy Liptchinsky R. Collobert VLM 12 156 0 18 Dec 2018
To Reverse the Gradient or Not: An Empirical Comparison of Adversarial and Multi-task Learning in Speech Recognition Yossi Adi Neil Zeghidour R. Collobert Nicolas Usunier Vitaliy Liptchinsky Gabriel Synnaeve 29 39 0 09 Dec 2018
3D human pose estimation in video with temporal convolutions and semi-supervised training Dario Pavllo Christoph Feichtenhofer David Grangier Michael Auli 3DH 13 996 0 28 Nov 2018
Automatic Grammar Augmentation for Robust Voice Command Recognition Yang Yang Anusha Lalitha Jinwon Lee Chris Lott 21 3 0 14 Nov 2018
Deep Audio-Visual Speech Recognition Triantafyllos Afouras Joon Son Chung A. Senior Oriol Vinyals Andrew Zisserman 27 687 0 06 Sep 2018
Hybrid CTC-Attention based End-to-End Speech Recognition using Subword Units Zhangyu Xiao Zhijian Ou Wei Chu Hui-Ching Lin 38 38 0 13 Jul 2018
Exploiting Nontrivial Connectivity for Automatic Speech Recognition Marius Paraschiv Lasse Borgholt T. M. S. Tax Marco Singh Lars Maaløe 33 0 0 28 Nov 2017
Sequence Prediction with Neural Segmental Models Hao Tang 29 2 0 05 Sep 2017
Exploring Neural Transducers for End-to-End Speech Recognition Eric Battenberg Jitong Chen R. Child Adam Coates Yashesh Gaur Yi Li ... Hairong Liu S. Satheesh David Seetapun Anuroop Sriram Zhenyao Zhu AI4TS 42 229 0 24 Jul 2017
Multimodal Machine Learning: A Survey and Taxonomy T. Baltrušaitis Chaitanya Ahuja Louis-Philippe Morency 15 2,865 0 26 May 2017
Reducing Bias in Production Speech Models Eric Battenberg R. Child Adam Coates Christopher Fougner Yashesh Gaur ... Vinay Rao S. Satheesh David Seetapun Anuroop Sriram Zhenyao Zhu 38 10 0 11 May 2017
Sharp Minima Can Generalize For Deep Nets Laurent Dinh Razvan Pascanu Samy Bengio Yoshua Bengio ODL 46 757 0 15 Mar 2017