ResearchTrend.AI
Listen, Attend and Spell

5 August 2015
William Chan
Navdeep Jaitly
Quoc V. Le
Oriol Vinyals
    RALM

Papers citing "Listen, Attend and Spell"

50 / 1,041 papers shown
DeCoAR 2.0: Deep Contextualized Acoustic Representations with Vector Quantization
Shaoshi Ling
Yuzong Liu
83
107
0
11 Dec 2020
Frame-level SpecAugment for Deep Convolutional Neural Networks in Hybrid ASR Systems
Xinwei Li
Yuanyuan Zhang
Xiaodan Zhuang
Daben Liu
35
6
0
07 Dec 2020
A Study of Few-Shot Audio Classification
Piper Wolters
Chris Careaga
Brian Hutchinson
Lauren A. Phillips
107
10
0
02 Dec 2020
Transformer-Transducers for Code-Switched Speech Recognition
Siddharth Dalmia
Yuzong Liu
S. Ronanki
Katrin Kirchhoff
88
47
0
30 Nov 2020
Streaming end-to-end multi-talker speech recognition
Liang Lu
Naoyuki Kanda
Jinyu Li
Jiawei Liu
75
44
0
26 Nov 2020
Bootstrap an end-to-end ASR system by multilingual training, transfer learning, text-to-text mapping and synthetic audio
Manuel Giollo
Deniz Gunceler
Yulan Liu
D. Willett
56
12
0
25 Nov 2020
Multi-task Language Modeling for Improving Speech Recognition of Rare Words
Chao-Han Huck Yang
Linda Liu
Ankur Gandhe
Yile Gu
A. Raju
Denis Filimonov
I. Bulyko
83
30
0
23 Nov 2020
Using Synthetic Audio to Improve The Recognition of Out-Of-Vocabulary Words in End-To-End ASR Systems
Xianrui Zheng
Yulan Liu
Deniz Gunceler
D. Willett
137
79
0
23 Nov 2020
Multi-Channel Automatic Speech Recognition Using Deep Complex Unet
Yuxiang Kong
Jian Wu
Quandong Wang
Peng Gao
Weiji Zhuang
Yujun Wang
Lei Xie
75
8
0
18 Nov 2020
s-Transformer: Segment-Transformer for Robust Neural Speech Synthesis
Xi Wang
Huaiping Ming
Lei He
Frank Soong
43
5
0
17 Nov 2020
Cascade RNN-Transducer: Syllable Based Streaming On-device Mandarin Speech Recognition with a Syllable-to-Character Converter
Xiong Wang
Zhuoyuan Yao
Xian Shi
Lei Xie
68
30
0
17 Nov 2020
Deep Shallow Fusion for RNN-T Personalization
Duc Le
Gil Keren
Julian Chan
Jay Mahadeokar
Christian Fuegen
M. Seltzer
79
80
0
16 Nov 2020
Efficient Knowledge Distillation for RNN-Transducer Models
S. Panchapagesan
Daniel S. Park
Chung-Cheng Chiu
Yuan Shangguan
Qiao Liang
A. Gruenstein
75
54
0
11 Nov 2020
Towards Semi-Supervised Semantics Understanding from Speech
Cheng-I Jeff Lai
Jin Cao
S. Bodapati
Shang-Wen Li
SSL
93
7
0
11 Nov 2020
A low latency ASR-free end to end spoken language understanding system
Mohamed Mhiri
Samuel Myer
Vikrant Singh Tomar
69
8
0
10 Nov 2020
Benchmarking LF-MMI, CTC and RNN-T Criteria for Streaming ASR
Xiaohui Zhang
Frank Zhang
Chunxi Liu
Kjell Schubert
Julian Chan
...
Jun Liu
Ching-Feng Yeh
Fuchun Peng
Yatharth Saraf
Geoffrey Zweig
78
20
0
09 Nov 2020
Listen, Look and Deliberate: Visual context-aware speech recognition using pre-trained text-video representations
Shahram Ghorbani
Yashesh Gaur
Yu Shi
Jinyu Li
75
14
0
08 Nov 2020
Dual Application of Speech Enhancement for Automatic Speech Recognition
Ashutosh Pandey
Chunxi Liu
Yun Wang
Yatharth Saraf
91
37
0
07 Nov 2020
Improving RNN Transducer Based ASR with Auxiliary Tasks
Chunxi Liu
Frank Zhang
Duc Le
Suyoun Kim
Yatharth Saraf
Geoffrey Zweig
91
49
0
05 Nov 2020
Paralinguistic Privacy Protection at the Edge
Ranya Aloufi
Hamed Haddadi
David E. Boyle
68
14
0
04 Nov 2020
Sequence-to-Sequence Learning via Attention Transfer for Incremental Speech Recognition
Sashi Novitasari
Andros Tjandra
S. Sakti
Satoshi Nakamura
CLL
34
12
0
04 Nov 2020
Augmenting Images for ASR and TTS through Single-loop and Dual-loop Multimodal Chain Framework
Johanes Effendi
Andros Tjandra
S. Sakti
Satoshi Nakamura
30
3
0
04 Nov 2020
Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition
Zhong Meng
S. Parthasarathy
Eric Sun
Yashesh Gaur
Naoyuki Kanda
Liang Lu
Xie Chen
Rui Zhao
Jinyu Li
Jiawei Liu
AuLLM
83
110
0
03 Nov 2020
Improving RNN transducer with normalized jointer network
Mingkun Huang
Jun Zhang
Meng Cai
Yang Zhang
Jiali Yao
Yongbin You
Yi He
Zejun Ma
151
7
0
03 Nov 2020
Dynamic latency speech recognition with asynchronous revision
Mingkun Huang
Meng Cai
Jun Zhang
Yang Zhang
Yongbin You
Yi He
Zejun Ma
BDL
50
2
0
03 Nov 2020
Streaming Attention-Based Models with Augmented Memory for End-to-End Speech Recognition
Ching-Feng Yeh
Yongqiang Wang
Yangyang Shi
Chunyang Wu
Frank Zhang
Julian Chan
M. Seltzer
AI4TS
RALM
76
8
0
03 Nov 2020
Focus on the present: a regularization method for the ASR source-target attention layer
Nanxin Chen
Piotr Żelasko
Jesús Villalba
Najim Dehak
49
3
0
02 Nov 2020
Phoneme Based Neural Transducer for Large Vocabulary Speech Recognition
Wei Zhou
Simon Berger
Ralf Schlüter
Hermann Ney
120
33
0
30 Oct 2020
Decoupling Pronunciation and Language for End-to-end Code-switching Automatic Speech Recognition
Shuai Zhang
Jiangyan Yi
Zhengkun Tian
Ye Bai
J. Tao
Zhengqi Wen
43
14
0
28 Oct 2020
CASS-NAT: CTC Alignment-based Single Step Non-autoregressive Transformer for Speech Recognition
Ruchao Fan
Wei Chu
Peng Chang
Jing Xiao
67
37
0
28 Oct 2020
Cascaded encoders for unifying streaming and non-streaming ASR
A. Narayanan
Tara N. Sainath
Ruoming Pang
Jiahui Yu
Chung-Cheng Chiu
Rohit Prabhavalkar
Ehsan Variani
Trevor Strohman
AuLLM
128
86
0
27 Oct 2020
Multitask Training with Text Data for End-to-End Speech Recognition
Peidong Wang
Tara N. Sainath
Ron J. Weiss
92
27
0
27 Oct 2020
Universal ASR: Unifying Streaming and Non-Streaming ASR Using a Single Encoder-Decoder Model
Zhifu Gao
Shiliang Zhang
Ming Lei
Ian Mcloughlin
CVBM
54
15
0
27 Oct 2020
HarperValleyBank: A Domain-Specific Spoken Dialog Corpus
Mike Wu
J. Nafziger
A. Scodary
Andrew L. Maas
95
17
0
26 Oct 2020
Improved Neural Language Model Fusion for Streaming Recurrent Neural Network Transducer
Suyoun Kim
Shangguan Yuan
Jay Mahadeokar
A. Bruguier
Christian Fuegen
M. Seltzer
Duc Le
71
29
0
26 Oct 2020
Semi-Supervised Spoken Language Understanding via Self-Supervised Speech and Language Model Pretraining
Cheng-I Jeff Lai
Yung-Sung Chuang
Hung-yi Lee
Shang-Wen Li
James R. Glass
VLM
SSL
103
60
0
26 Oct 2020
AutoSpeech 2020: The Second Automated Machine Learning Challenge for Speech Classification
Jingsong Wang
Tom Ko
Zhen Xu
Xiawei Guo
Souxiang Liu
Wei-Wei Tu
Lei Xie
27
2
0
25 Oct 2020
Align-Refine: Non-Autoregressive Speech Recognition via Iterative Realignment
Ethan A. Chi
Julian Salazar
Katrin Kirchhoff
AI4TS
90
52
0
24 Oct 2020
On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer
Liang Lu
Zhong Meng
Naoyuki Kanda
Jinyu Li
Jiawei Liu
79
12
0
23 Oct 2020
Transformer-based End-to-End Speech Recognition with Local Dense Synthesizer Attention
Menglong Xu
Shengqiang Li
Xiao-Lei Zhang
86
32
0
23 Oct 2020
Improving Streaming Automatic Speech Recognition With Non-Streaming Model Distillation On Unsupervised Data
Thibault Doutre
Wei Han
Min Ma
Zhiyun Lu
Chung-Cheng Chiu
Ruoming Pang
A. Narayanan
Ananya Misra
Yu Zhang
Liangliang Cao
127
23
0
22 Oct 2020
SlimIPL: Language-Model-Free Iterative Pseudo-Labeling
Tatiana Likhomanenko
Qiantong Xu
Jacob Kahn
Gabriel Synnaeve
R. Collobert
VLM
136
65
0
22 Oct 2020
Confidence Estimation for Attention-based Sequence-to-sequence Models for Speech Recognition
Qiujia Li
David Qiu
Yu Zhang
Yue Liu
Yanzhang He
P. Woodland
Liangliang Cao
Trevor Strohman
50
49
0
22 Oct 2020
Developing Real-time Streaming Transformer Transducer for Speech Recognition on Large-scale Dataset
Xie Chen
Yu-Huan Wu
Zhenghao Wang
Shujie Liu
Jinyu Li
149
177
0
22 Oct 2020
A General Multi-Task Learning Framework to Leverage Text Data for Speech to Text Tasks
Yun Tang
J. Pino
Changhan Wang
Xutai Ma
Dmitriy Genzel
81
75
0
21 Oct 2020
Multimodal Speech Recognition with Unstructured Audio Masking
Tejas Srinivasan
Ramon Sanabria
Florian Metze
Desmond Elliott
CVBM
50
10
0
16 Oct 2020
Why Layer-Wise Learning is Hard to Scale-up and a Possible Solution via Accelerated Downsampling
Wenchi Ma
Miao Yu
Kaidong Li
Guanghui Wang
77
6
0
15 Oct 2020
Lightweight End-to-End Speech Recognition from Raw Audio Data Using Sinc-Convolutions
Ludwig Kürzinger
Nicolas Lindae
Palle Klewitz
Gerhard Rigoll
67
5
0
15 Oct 2020
Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling
Jiahui Yu
Wei Han
Anmol Gulati
Chung-Cheng Chiu
Yue Liu
Tara N. Sainath
Yonghui Wu
Ruoming Pang
125
19
0
12 Oct 2020
fairseq S2T: Fast Speech-to-Text Modeling with fairseq
Changhan Wang
Yun Tang
Xutai Ma
Anne Wu
Sravya Popuri
Dmytro Okhonko
J. Pino
VLM
LRM
119
276
0
11 Oct 2020