ResearchTrend.AI
Listen, Attend and Spell

5 August 2015
William Chan
Navdeep Jaitly
Quoc V. Le
Oriol Vinyals
    RALM

Papers citing "Listen, Attend and Spell"

50 / 1,034 papers shown
Multi-Channel Automatic Speech Recognition Using Deep Complex Unet
Yuxiang Kong
Jian Wu
Quandong Wang
Peng Gao
Weiji Zhuang
Yujun Wang
Lei Xie
15
8
0
18 Nov 2020
s-Transformer: Segment-Transformer for Robust Neural Speech Synthesis
Xi Wang
Huaiping Ming
Lei He
Frank Soong
19
5
0
17 Nov 2020
Cascade RNN-Transducer: Syllable Based Streaming On-device Mandarin Speech Recognition with a Syllable-to-Character Converter
Xiong Wang
Zhuoyuan Yao
Xian Shi
Lei Xie
27
30
0
17 Nov 2020
Deep Shallow Fusion for RNN-T Personalization
Duc Le
Gil Keren
Julian Chan
Jay Mahadeokar
Christian Fuegen
M. Seltzer
23
77
0
16 Nov 2020
Efficient Knowledge Distillation for RNN-Transducer Models
S. Panchapagesan
Daniel S. Park
Chung-Cheng Chiu
Yuan Shangguan
Qiao Liang
A. Gruenstein
26
53
0
11 Nov 2020
Towards Semi-Supervised Semantics Understanding from Speech
Cheng-I Jeff Lai
Jin Cao
S. Bodapati
Shang-Wen Li
SSL
22
7
0
11 Nov 2020
A low latency ASR-free end to end spoken language understanding system
Mohamed Mhiri
Samuel Myer
Vikrant Singh Tomar
30
8
0
10 Nov 2020
Benchmarking LF-MMI, CTC and RNN-T Criteria for Streaming ASR
Xiaohui Zhang
Frank Zhang
Chunxi Liu
Kjell Schubert
Julian Chan
...
Jun Liu
Ching-Feng Yeh
Fuchun Peng
Yatharth Saraf
Geoffrey Zweig
23
20
0
09 Nov 2020
Listen, Look and Deliberate: Visual context-aware speech recognition using pre-trained text-video representations
Shahram Ghorbani
Yashesh Gaur
Yu Shi
Jinyu Li
33
14
0
08 Nov 2020
Dual Application of Speech Enhancement for Automatic Speech Recognition
Ashutosh Pandey
Chunxi Liu
Yun Wang
Yatharth Saraf
46
37
0
07 Nov 2020
Improving RNN Transducer Based ASR with Auxiliary Tasks
Chunxi Liu
Frank Zhang
Duc Le
Suyoun Kim
Yatharth Saraf
Geoffrey Zweig
31
49
0
05 Nov 2020
Paralinguistic Privacy Protection at the Edge
Ranya Aloufi
Hamed Haddadi
David E. Boyle
22
14
0
04 Nov 2020
Sequence-to-Sequence Learning via Attention Transfer for Incremental Speech Recognition
Sashi Novitasari
Andros Tjandra
S. Sakti
Satoshi Nakamura
CLL
8
12
0
04 Nov 2020
Augmenting Images for ASR and TTS through Single-loop and Dual-loop Multimodal Chain Framework
Johanes Effendi
Andros Tjandra
S. Sakti
Satoshi Nakamura
19
3
0
04 Nov 2020
Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition
Zhong Meng
S. Parthasarathy
Eric Sun
Yashesh Gaur
Naoyuki Kanda
Liang Lu
Xie Chen
Rui Zhao
Jinyu Li
Jiawei Liu
AuLLM
19
107
0
03 Nov 2020
Improving RNN transducer with normalized jointer network
Mingkun Huang
Jun Zhang
Meng Cai
Yang Zhang
Jiali Yao
Yongbin You
Yi He
Zejun Ma
25
7
0
03 Nov 2020
Dynamic latency speech recognition with asynchronous revision
Mingkun Huang
Meng Cai
Jun Zhang
Yang Zhang
Yongbin You
Yi He
Zejun Ma
BDL
24
2
0
03 Nov 2020
Streaming Attention-Based Models with Augmented Memory for End-to-End Speech Recognition
Ching-Feng Yeh
Yongqiang Wang
Yangyang Shi
Chunyang Wu
Frank Zhang
Julian Chan
M. Seltzer
AI4TS
RALM
39
8
0
03 Nov 2020
Focus on the present: a regularization method for the ASR source-target attention layer
Nanxin Chen
Piotr Żelasko
Jesús Villalba
Najim Dehak
23
3
0
02 Nov 2020
Phoneme Based Neural Transducer for Large Vocabulary Speech Recognition
Wei Zhou
Simon Berger
Ralf Schluter
Hermann Ney
16
33
0
30 Oct 2020
Decoupling Pronunciation and Language for End-to-end Code-switching Automatic Speech Recognition
Shuai Zhang
Jiangyan Yi
Zhengkun Tian
Ye Bai
J. Tao
Zhengqi Wen
6
14
0
28 Oct 2020
CASS-NAT: CTC Alignment-based Single Step Non-autoregressive Transformer for Speech Recognition
Ruchao Fan
Wei Chu
Peng Chang
Jing Xiao
14
36
0
28 Oct 2020
Cascaded encoders for unifying streaming and non-streaming ASR
A. Narayanan
Tara N. Sainath
Ruoming Pang
Jiahui Yu
Chung-Cheng Chiu
Rohit Prabhavalkar
Ehsan Variani
Trevor Strohman
AuLLM
8
85
0
27 Oct 2020
Multitask Training with Text Data for End-to-End Speech Recognition
Peidong Wang
Tara N. Sainath
Ron J. Weiss
21
27
0
27 Oct 2020
Universal ASR: Unifying Streaming and Non-Streaming ASR Using a Single Encoder-Decoder Model
Zhifu Gao
Shiliang Zhang
Ming Lei
Ian Mcloughlin
CVBM
32
15
0
27 Oct 2020
HarperValleyBank: A Domain-Specific Spoken Dialog Corpus
Mike Wu
J. Nafziger
A. Scodary
Andrew L. Maas
31
17
0
26 Oct 2020
Improved Neural Language Model Fusion for Streaming Recurrent Neural Network Transducer
Suyoun Kim
Shangguan Yuan
Jay Mahadeokar
A. Bruguier
Christian Fuegen
M. Seltzer
Duc Le
23
28
0
26 Oct 2020
Semi-Supervised Spoken Language Understanding via Self-Supervised Speech and Language Model Pretraining
Cheng-I Jeff Lai
Yung-Sung Chuang
Hung-yi Lee
Shang-Wen Li
James R. Glass
VLM
SSL
27
58
0
26 Oct 2020
AutoSpeech 2020: The Second Automated Machine Learning Challenge for Speech Classification
Jingsong Wang
Tom Ko
Zhen Xu
Xiawei Guo
Souxiang Liu
Wei-Wei Tu
Lei Xie
6
2
0
25 Oct 2020
Align-Refine: Non-Autoregressive Speech Recognition via Iterative Realignment
Ethan A. Chi
Julian Salazar
Katrin Kirchhoff
AI4TS
25
51
0
24 Oct 2020
On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer
Liang Lu
Zhong Meng
Naoyuki Kanda
Jinyu Li
Jiawei Liu
24
12
0
23 Oct 2020
Transformer-based End-to-End Speech Recognition with Local Dense Synthesizer Attention
Menglong Xu
Shengqiang Li
Xiao-Lei Zhang
27
31
0
23 Oct 2020
Improving Streaming Automatic Speech Recognition With Non-Streaming Model Distillation On Unsupervised Data
Thibault Doutre
Wei Han
Min Ma
Zhiyun Lu
Chung-Cheng Chiu
Ruoming Pang
A. Narayanan
Ananya Misra
Yu Zhang
Liangliang Cao
75
22
0
22 Oct 2020
SlimIPL: Language-Model-Free Iterative Pseudo-Labeling
Tatiana Likhomanenko
Qiantong Xu
Jacob Kahn
Gabriel Synnaeve
R. Collobert
VLM
29
61
0
22 Oct 2020
Confidence Estimation for Attention-based Sequence-to-sequence Models for Speech Recognition
Qiujia Li
David Qiu
Yu Zhang
Bo Li
Yanzhang He
P. Woodland
Liangliang Cao
Trevor Strohman
12
46
0
22 Oct 2020
Developing Real-time Streaming Transformer Transducer for Speech Recognition on Large-scale Dataset
Xie Chen
Yu-Huan Wu
Zhenghao Wang
Shujie Liu
Jinyu Li
22
169
0
22 Oct 2020
A General Multi-Task Learning Framework to Leverage Text Data for Speech to Text Tasks
Yun Tang
J. Pino
Changhan Wang
Xutai Ma
Dmitriy Genzel
26
73
0
21 Oct 2020
Multimodal Speech Recognition with Unstructured Audio Masking
Tejas Srinivasan
Ramon Sanabria
Florian Metze
Desmond Elliott
CVBM
17
10
0
16 Oct 2020
Why Layer-Wise Learning is Hard to Scale-up and a Possible Solution via Accelerated Downsampling
Wenchi Ma
Miao Yu
Kaidong Li
Guanghui Wang
17
5
0
15 Oct 2020
Lightweight End-to-End Speech Recognition from Raw Audio Data Using Sinc-Convolutions
Ludwig Kurzinger
Nicolas Lindae
Palle Klewitz
Gerhard Rigoll
32
5
0
15 Oct 2020
Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling
Jiahui Yu
Wei Han
Anmol Gulati
Chung-Cheng Chiu
Bo Li
Tara N. Sainath
Yonghui Wu
Ruoming Pang
30
18
0
12 Oct 2020
fairseq S2T: Fast Speech-to-Text Modeling with fairseq
Changhan Wang
Yun Tang
Xutai Ma
Anne Wu
Sravya Popuri
Dmytro Okhonko
J. Pino
VLM
LRM
36
264
0
11 Oct 2020
Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling
Jonathan Shen
Ye Jia
Mike Chrzanowski
Yu Zhang
Isaac Elias
Heiga Zen
Yonghui Wu
27
112
0
08 Oct 2020
Super-Human Performance in Online Low-latency Recognition of Conversational Speech
T. Nguyen
S. Stueker
A. Waibel
BDL
9
36
0
07 Oct 2020
Representation Learning for Sequence Data with Deep Autoencoding Predictive Components
Junwen Bai
Weiran Wang
Yingbo Zhou
Caiming Xiong
SSL
AI4TS
27
12
0
07 Oct 2020
Fine-Grained Grounding for Multimodal Speech Recognition
Tejas Srinivasan
Ramon Sanabria
Florian Metze
Desmond Elliott
25
11
0
05 Oct 2020
Explaining Deep Neural Networks
Oana-Maria Camburu
XAI
FAtt
38
26
0
04 Oct 2020
A Unifying Review of Deep and Shallow Anomaly Detection
Lukas Ruff
Jacob R. Kauffmann
Robert A. Vandermeulen
G. Montavon
Wojciech Samek
Marius Kloft
Thomas G. Dietterich
Klaus-Robert Muller
UQCV
27
782
0
24 Sep 2020
End-to-End Bengali Speech Recognition
S. Mandal
Sarthak Yadav
A. Rai
13
5
0
21 Sep 2020
On Target Segmentation for Direct Speech Translation
Mattia Antonino Di Gangi
Marco Gaido
Matteo Negri
Marco Turchi
37
14
0
10 Sep 2020