
Listen, Attend and Spell

5 August 2015

Papers citing "Listen, Attend and Spell"

50 / 1,041 papers shown

Title
Revealing and Protecting Labels in Distributed Training Trung D. Q. Dang Om Thakkar Swaroop Indra Ramaswamy Rajiv Mathews Peter Chin Françoise Beaufays 38 26 0 31 Oct 2021
Pseudo-Labeling for Massively Multilingual Speech Recognition Loren Lugosch Tatiana Likhomanenko Gabriel Synnaeve R. Collobert VLM 83 30 0 30 Oct 2021
Cross-attention conformer for context modeling in speech enhancement for ASR A. Narayanan Chung-Cheng Chiu Tom O'Malley Quan Wang Yanzhang He 73 14 0 30 Oct 2021
An Investigation of Enhancing CTC Model for Triggered Attention-based Streaming ASR Huaibo Zhao Yosuke Higuchi Tetsuji Ogawa Tetsunori Kobayashi 34 4 0 20 Oct 2021
Automatic Learning of Subword Dependent Model Scales Felix Meyer Wilfried Michel Mohammad Zeineldeen Ralf Schlüter Hermann Ney 37 0 0 18 Oct 2021
Sub-word Level Lip Reading With Visual Attention Prajwal K R Triantafyllos Afouras Andrew Zisserman 91 93 0 14 Oct 2021
On Language Model Integration for RNN Transducer based Speech Recognition Wei Zhou Zuoyun Zheng Ralf Schlüter Hermann Ney 105 23 0 13 Oct 2021
Reason induced visual attention for explainable autonomous driving Sikai Chen Jiqian Dong Runjia Du Yujie Li Samuel Labi 68 1 0 11 Oct 2021
A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation Yosuke Higuchi Nanxin Chen Yuya Fujita Hirofumi Inaguma Tatsuya Komatsu Jaesong Lee Jumon Nozaki Tianzi Wang Shinji Watanabe 49 43 0 11 Oct 2021
K-Wav2vec 2.0: Automatic Speech Recognition based on Joint Decoding of Graphemes and Syllables Jounghee Kim Pilsung Kang VLM 48 6 0 11 Oct 2021
Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy Yosuke Higuchi Niko Moritz Jonathan Le Roux Takaaki Hori 83 12 0 11 Oct 2021
Have best of both worlds: two-pass hybrid and E2E cascading framework for speech recognition Guoli Ye V. Mazalov Jinyu Li Jiawei Liu 70 9 0 10 Oct 2021
SCaLa: Supervised Contrastive Learning for End-to-End Speech Recognition Li Fu Xiaoxiao Li Runyu Wang Lu Fan Zhengchen Zhang Meng Chen Youzheng Wu Xiaodong He SSL 45 3 0 08 Oct 2021
Hierarchical Conditional End-to-End ASR with CTC and Multi-Granular Subword Units Yosuke Higuchi Keita Karube Tetsuji Ogawa Tetsunori Kobayashi 58 24 0 08 Oct 2021
Improving Pseudo-label Training For End-to-end Speech Recognition Using Gradient Mask Shaoshi Ling Chen Shen Meng Cai Zejun Ma VLM SSL 89 10 0 08 Oct 2021
Explaining the Attention Mechanism of End-to-End Speech Recognition Using Decision Trees Yuanchao Wang Wenjing Du Chenghao Cai Yanyan Xu 66 1 0 08 Oct 2021
WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition Binbin Zhang Hang Lv Pengcheng Guo Qijie Shao Chao Yang ... Hui Bu Xiaoyu Chen Chenchen Zeng Di Wu Zhendong Peng 153 231 0 07 Oct 2021
BERT Attends the Conversation: Improving Low-Resource Conversational ASR Pablo Ortiz Simen Burud 64 5 0 05 Oct 2021
ASR Rescoring and Confidence Estimation with ELECTRA Hayato Futami Hirofumi Inaguma Masato Mimura S. Sakai Tatsuya Kawahara KELM 104 21 0 05 Oct 2021
Multi-axis Attentive Prediction for Sparse Event Data: An Application to Crime Prediction Yi Sui Ga Wu Scott Sanner 50 2 0 05 Oct 2021
Fast Contextual Adaptation with Neural Associative Memory for On-Device Personalized Speech Recognition Tsendsuren Munkhdalai K. Sim Angad Chandorkar Fan Gao Mason Chua Trevor Strohman F. Beaufays 75 34 0 05 Oct 2021
Towards efficient end-to-end speech recognition with biologically-inspired neural networks Thomas Bohnstingl Ayush Garg Stanisław Woźniak G. Saon E. Eleftheriou A. Pantazi 56 5 0 04 Oct 2021
Speech Technology for Everyone: Automatic Speech Recognition for Non-Native English with Transfer Learning Toshiko Shibano Xinyi Zhang Miao Li Haejin Cho Peter Sullivan Muhammad Abdul-Mageed VLM 79 18 0 01 Oct 2021
Multimodal Emotion Recognition with High-level Speech and Text Features M. R. Makiuchi Kuniaki Uto Koichi Shinoda 85 72 0 29 Sep 2021
Word-level confidence estimation for RNN transducers Mingqiu Wang H. Soltau Laurent El Shafey Izhak Shafran UQCV 74 5 0 28 Sep 2021
Private Language Model Adaptation for Speech Recognition Zhe Liu Ke Li Shreyan Bakshi Fuchun Peng 98 6 0 28 Sep 2021
Factorized Neural Transducer for Efficient Language Model Adaptation Xie Chen Zhong Meng S. Parthasarathy Jinyu Li 147 40 0 27 Sep 2021
Training Spiking Neural Networks Using Lessons From Deep Learning Jason K. Eshraghian Max Ward Emre Neftci Xinxin Wang Gregor Lenz Girish Dwivedi Bennamoun Doo Seok Jeong Wei D. Lu 141 469 0 27 Sep 2021
ChannelAugment: Improving generalization of multi-channel ASR by training with input channel randomization M. Gaudesi F. Weninger D. Sharma P. Zhan AAML 75 1 0 23 Sep 2021
Wav-BERT: Cooperative Acoustic and Linguistic Representation Learning for Low-Resource Speech Recognition Guolin Zheng Yubei Xiao Ke Gong Pan Zhou Xiaodan Liang Liang Lin 76 26 0 19 Sep 2021
Dual-Encoder Architecture with Encoder Selection for Joint Close-Talk and Far-Talk Speech Recognition F. Weninger M. Gaudesi Ralf Leibold R. Gemello P. Zhan 50 4 0 17 Sep 2021
PDAugment: Data Augmentation by Pitch and Duration Adjustments for Automatic Lyrics Transcription Chen Zhang Jiaxing Yu Luchin Chang Xu Tan Jiawei Chen Tao Qin Kecheng Zhang 79 15 0 16 Sep 2021
Utterance-level neural confidence measure for end-to-end children speech recognition W. Liu Tan Lee 51 5 0 16 Sep 2021
Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition Felix Wu Kwangyoun Kim Jing Pan Kyu Jeong Han Kilian Q. Weinberger Yoav Artzi 62 75 0 14 Sep 2021
Self-Attention Channel Combinator Frontend for End-to-End Multichannel Far-field Speech Recognition Rong Gong Carl Quillen D. Sharma Andrew Goderre José Laínez Ljubomir Milanović 94 14 0 10 Sep 2021
Tree-constrained Pointer Generator for End-to-end Contextual Speech Recognition Guangzhi Sun Chao Zhang P. Woodland 105 33 0 01 Sep 2021
Greenformers: Improving Computation and Memory Efficiency in Transformer Models via Low-Rank Approximation Samuel Cahyawijaya 103 12 0 24 Aug 2021
Reducing Exposure Bias in Training Recurrent Neural Network Transducers Xiaodong Cui Brian Kingsbury G. Saon David Haws Zoltán Tüske 51 5 0 24 Aug 2021
Multilingual Speech Recognition for Low-Resource Indian Languages using Multi-Task conformer Krishna D N Freshworks 35 7 0 22 Aug 2021
A Dual-Decoder Conformer for Multilingual Speech Recognition Krishna D N Freshworks 21 1 0 22 Aug 2021
Using Large Pre-Trained Models with Cross-Modal Attention for Multi-Modal Emotion Recognition Krishna D N Freshworks 61 12 0 22 Aug 2021
Generalizing RNN-Transducer to Out-Domain Audio via Sparse Self-Attention Layers Juntae Kim Jee-Hye Lee 41 6 0 22 Aug 2021
A Light-weight contextual spelling correction model for customizing transducer-based speech recognition systems Xiaoqiang Wang Yanqing Liu Sheng Zhao Jinyu Li KELM 68 16 0 17 Aug 2021
SpecMix: A Mixed Sample Data Augmentation method for Training with Time-Frequency Domain Features Gwantae Kim D. Han Hanseok Ko 101 45 0 06 Aug 2021
Knowledge Distillation from BERT Transformer to Speech Transformer for Intent Classification Yiding Jiang Bidisha Sharma Maulik C. Madhavi Haizhou Li 100 26 0 05 Aug 2021
Adversarial Data Augmentation for Disordered Speech Recognition Zengrui Jin Mengzhe Geng Xurong Xie Jianwei Yu Shansong Liu Xunying Liu Helen Meng 54 37 0 02 Aug 2021
Facetron: A Multi-speaker Face-to-Speech Model based on Cross-modal Latent Representations Seyun Um Jihyun Kim Jihyun Lee Hong-Goo Kang CVBM 148 4 0 26 Jul 2021
Ensemble of Convolution Neural Networks on Heterogeneous Signals for Sleep Stage Scoring Enrique Fernández-Blanco C. Fernandez-Lozano A. Pazos Daniel Rivero 38 4 0 23 Jul 2021
VAD-free Streaming Hybrid CTC/Attention ASR for Unsegmented Recording Hirofumi Inaguma Tatsuya Kawahara 113 2 0 15 Jul 2021
A Configurable Multilingual Model is All You Need to Recognize All Languages Long Zhou Jinyu Li Eric Sun Shujie Liu 138 42 0 13 Jul 2021