ResearchTrend.AI
Listen, Attend and Spell

5 August 2015
William Chan
Navdeep Jaitly
Quoc V. Le
Oriol Vinyals
    RALM

Papers citing "Listen, Attend and Spell"

50 / 1,033 papers shown
K-Wav2vec 2.0: Automatic Speech Recognition based on Joint Decoding of Graphemes and Syllables
Jounghee Kim
Pilsung Kang
VLM
29
6
0
11 Oct 2021
Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy
Yosuke Higuchi
Niko Moritz
Jonathan Le Roux
Takaaki Hori
19
11
0
11 Oct 2021
Have best of both worlds: two-pass hybrid and E2E cascading framework for speech recognition
Guoli Ye
V. Mazalov
Jinyu Li
Jiawei Liu
25
9
0
10 Oct 2021
SCaLa: Supervised Contrastive Learning for End-to-End Speech Recognition
Li Fu
Xiaoxiao Li
Runyu Wang
Lu Fan
Zhengchen Zhang
Meng Chen
Youzheng Wu
Xiaodong He
SSL
8
3
0
08 Oct 2021
Hierarchical Conditional End-to-End ASR with CTC and Multi-Granular Subword Units
Yosuke Higuchi
Keita Karube
Tetsuji Ogawa
Tetsunori Kobayashi
18
23
0
08 Oct 2021
Improving Pseudo-label Training For End-to-end Speech Recognition Using Gradient Mask
Shaoshi Ling
Chen Shen
Meng Cai
Zejun Ma
VLM
SSL
22
8
0
08 Oct 2021
Explaining the Attention Mechanism of End-to-End Speech Recognition Using Decision Trees
Yuanchao Wang
Wenjing Du
Chenghao Cai
Yanyan Xu
34
1
0
08 Oct 2021
WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition
Binbin Zhang
Hang Lv
Pengcheng Guo
Qijie Shao
Chao Yang
...
Hui Bu
Xiaoyu Chen
Chenchen Zeng
Di Wu
Zhendong Peng
25
219
0
07 Oct 2021
BERT Attends the Conversation: Improving Low-Resource Conversational ASR
Pablo Ortiz
Simen Burud
34
4
0
05 Oct 2021
ASR Rescoring and Confidence Estimation with ELECTRA
Hayato Futami
Hirofumi Inaguma
Masato Mimura
S. Sakai
Tatsuya Kawahara
KELM
62
20
0
05 Oct 2021
Multi-axis Attentive Prediction for Sparse EventData: An Application to Crime Prediction
Yi Sui
Ga Wu
Scott Sanner
18
2
0
05 Oct 2021
Fast Contextual Adaptation with Neural Associative Memory for On-Device Personalized Speech Recognition
Tsendsuren Munkhdalai
K. Sim
Angad Chandorkar
Fan Gao
Mason Chua
Trevor Strohman
F. Beaufays
32
34
0
05 Oct 2021
Towards efficient end-to-end speech recognition with biologically-inspired neural networks
Thomas Bohnstingl
Ayush Garg
Stanislaw Woźniak
G. Saon
E. Eleftheriou
A. Pantazi
29
5
0
04 Oct 2021
Speech Technology for Everyone: Automatic Speech Recognition for Non-Native English with Transfer Learning
Toshiko Shibano
Xinyi Zhang
Miao Li
Haejin Cho
Peter Sullivan
Muhammad Abdul-Mageed
VLM
36
17
0
01 Oct 2021
Multimodal Emotion Recognition with High-level Speech and Text Features
M. R. Makiuchi
Kuniaki Uto
Koichi Shinoda
12
70
0
29 Sep 2021
Word-level confidence estimation for RNN transducers
Mingqiu Wang
H. Soltau
Laurent El Shafey
Izhak Shafran
UQCV
21
5
0
28 Sep 2021
Private Language Model Adaptation for Speech Recognition
Zhe Liu
Ke Li
Shreyan Bakshi
Fuchun Peng
34
6
0
28 Sep 2021
Factorized Neural Transducer for Efficient Language Model Adaptation
Xie Chen
Zhong Meng
S. Parthasarathy
Jinyu Li
21
39
0
27 Sep 2021
Training Spiking Neural Networks Using Lessons From Deep Learning
Jason Eshraghian
Max Ward
Emre Neftci
Xinxin Wang
Gregor Lenz
Girish Dwivedi
Bennamoun
Doo Seok Jeong
Wei D. Lu
40
433
0
27 Sep 2021
ChannelAugment: Improving generalization of multi-channel ASR by training with input channel randomization
M. Gaudesi
F. Weninger
D. Sharma
P. Zhan
AAML
33
1
0
23 Sep 2021
Wav-BERT: Cooperative Acoustic and Linguistic Representation Learning for Low-Resource Speech Recognition
Guolin Zheng
Yubei Xiao
Ke Gong
Pan Zhou
Xiaodan Liang
Liang Lin
32
26
0
19 Sep 2021
Dual-Encoder Architecture with Encoder Selection for Joint Close-Talk and Far-Talk Speech Recognition
F. Weninger
M. Gaudesi
Ralf Leibold
R. Gemello
P. Zhan
35
4
0
17 Sep 2021
PDAugment: Data Augmentation by Pitch and Duration Adjustments for Automatic Lyrics Transcription
Chen Zhang
Jiaxing Yu
Luchin Chang
Xu Tan
Jiawei Chen
Tao Qin
Kecheng Zhang
30
15
0
16 Sep 2021
Utterance-level neural confidence measure for end-to-end children speech recognition
W. Liu
Tan Lee
22
4
0
16 Sep 2021
Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition
Felix Wu
Kwangyoun Kim
Jing Pan
Kyu Jeong Han
Kilian Q. Weinberger
Yoav Artzi
27
71
0
14 Sep 2021
Self-Attention Channel Combinator Frontend for End-to-End Multichannel Far-field Speech Recognition
Rong Gong
Carl Quillen
D. Sharma
Andrew Goderre
José Laínez
Ljubomir Milanović
39
13
0
10 Sep 2021
Tree-constrained Pointer Generator for End-to-end Contextual Speech Recognition
Guangzhi Sun
Chao Zhang
P. Woodland
27
30
0
01 Sep 2021
Greenformers: Improving Computation and Memory Efficiency in Transformer Models via Low-Rank Approximation
Samuel Cahyawijaya
26
12
0
24 Aug 2021
Reducing Exposure Bias in Training Recurrent Neural Network Transducers
Xiaodong Cui
Brian Kingsbury
G. Saon
David Haws
Zoltán Tüske
19
5
0
24 Aug 2021
Multilingual Speech Recognition for Low-Resource Indian Languages using Multi-Task conformer
Krishna D N (Freshworks)
29
7
0
22 Aug 2021
A Dual-Decoder Conformer for Multilingual Speech Recognition
Krishna D N (Freshworks)
14
1
0
22 Aug 2021
Using Large Pre-Trained Models with Cross-Modal Attention for Multi-Modal Emotion Recognition
Krishna D N (Freshworks)
24
11
0
22 Aug 2021
Generalizing RNN-Transducer to Out-Domain Audio via Sparse Self-Attention Layers
Juntae Kim
Jee-Hye Lee
32
6
0
22 Aug 2021
A Light-weight contextual spelling correction model for customizing transducer-based speech recognition systems
Xiaoqiang Wang
Yanqing Liu
Sheng Zhao
Jinyu Li
KELM
21
15
0
17 Aug 2021
SpecMix: A Mixed Sample Data Augmentation method for Training with Time-Frequency Domain Features
Gwantae Kim
D. Han
Hanseok Ko
47
42
0
06 Aug 2021
Knowledge Distillation from BERT Transformer to Speech Transformer for Intent Classification
Yiding Jiang
Bidisha Sharma
Maulik C. Madhavi
Haizhou Li
36
25
0
05 Aug 2021
Adversarial Data Augmentation for Disordered Speech Recognition
Zengrui Jin
Mengzhe Geng
Xurong Xie
Jianwei Yu
Shansong Liu
Xunying Liu
Helen Meng
14
35
0
02 Aug 2021
Facetron: A Multi-speaker Face-to-Speech Model based on Cross-modal Latent Representations
Seyun Um
Jihyun Kim
Jihyun Lee
Hong-Goo Kang
CVBM
13
4
0
26 Jul 2021
Ensemble of Convolution Neural Networks on Heterogeneous Signals for Sleep Stage Scoring
Enrique Fernández-Blanco
C. Fernandez-Lozano
A. Pazos
Daniel Rivero
13
4
0
23 Jul 2021
VAD-free Streaming Hybrid CTC/Attention ASR for Unsegmented Recording
Hirofumi Inaguma
Tatsuya Kawahara
19
2
0
15 Jul 2021
A Configurable Multilingual Model is All You Need to Recognize All Languages
Long Zhou
Jinyu Li
Eric Sun
Shujie Liu
100
40
0
13 Jul 2021
ReconVAT: A Semi-Supervised Automatic Music Transcription Framework for Low-Resource Real-World Data
K. Cheuk
Dorien Herremans
Li Su
58
32
0
11 Jul 2021
On lattice-free boosted MMI training of HMM and CTC-based full-context ASR models
Xiaohui Zhang
Vimal Manohar
David C. Zhang
Frank Zhang
Yangyang Shi
Nayan Singhal
Julian Chan
Fuchun Peng
Yatharth Saraf
M. Seltzer
20
14
0
09 Jul 2021
End-to-End Rich Transcription-Style Automatic Speech Recognition with Semi-Supervised Learning
Tomohiro Tanaka
Ryo Masumura
Mana Ihori
Akihiko Takashima
Shota Orihashi
Naoki Makishima
19
4
0
07 Jul 2021
Instant One-Shot Word-Learning for Context-Specific Neural Sequence-to-Sequence Speech Recognition
Christian Huber
Juan Hussain
Sebastian Stüker
A. Waibel
26
24
0
05 Jul 2021
Relaxed Attention: A Simple Method to Boost Performance of End-to-End Automatic Speech Recognition
Timo Lohrenz
P. Schwarz
Zhengyang Li
Tim Fingscheidt
29
11
0
02 Jul 2021
What do End-to-End Speech Models Learn about Speaker, Language and Channel Information? A Layer-wise and Neuron-level Analysis
Shammur A. Chowdhury
Nadir Durrani
Ahmed M. Ali
41
12
0
01 Jul 2021
On joint training with interfaces for spoken language understanding
A. Raju
Milind Rao
Gautam Tiwari
Pranav Dheram
Bryan Anderson
Zhe Zhang
Chul Lee
Bach Bui
Ariya Rastrow
VLM
21
11
0
30 Jun 2021
A Survey on Neural Speech Synthesis
Xu Tan
Tao Qin
Frank Soong
Tie-Yan Liu
AI4TS
18
352
0
29 Jun 2021
Where are we in semantic concept extraction for Spoken Language Understanding?
Sahar Ghannay
Antoine Caubrière
Salima Mdhaffar
G. Laperriere
Bassam Jabaian
Yannick Esteve
12
18
0
24 Jun 2021