1508.01211
Cited By
v1
v2 (latest)
Listen, Attend and Spell
5 August 2015
William Chan
Navdeep Jaitly
Quoc V. Le
Oriol Vinyals
RALM
Papers citing "Listen, Attend and Spell"
50 / 1,041 papers shown
Title
Vision-Aided Dynamic Blockage Prediction for 6G Wireless Communication Networks
Gouranga Charan
Muhammad Alrabeiah
Ahmed Alkhateeb
82
34
0
17 Jun 2020
Exploration of End-to-End ASR for OpenSTT -- Russian Open Speech-to-Text Dataset
A. Andrusenko
A. Laptev
Ivan Medennikov
VLM
122
12
0
15 Jun 2020
Improving Cross-Lingual Transfer Learning for End-to-End Speech Recognition with Speech Translation
Changhan Wang
J. Pino
Jiatao Gu
79
30
0
09 Jun 2020
Learning to Count Words in Fluent Speech enables Online Speech Recognition
George Sterpu
Christian Saam
N. Harte
63
4
0
08 Jun 2020
Contextual RNN-T For Open Domain ASR
Mahaveer Jain
Gil Keren
Jay Mahadeokar
Geoffrey Zweig
Florian Metze
Yatharth Saraf
63
104
0
04 Jun 2020
Detecting Audio Attacks on ASR Systems with Dropout Uncertainty
T. Jayashankar
Jonathan Le Roux
P. Moulin
AAML
34
17
0
02 Jun 2020
On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition
Jinyu Li
Yu-Huan Wu
Yashesh Gaur
Chengyi Wang
Rui Zhao
Shujie Liu
73
137
0
28 May 2020
Insertion-Based Modeling for End-to-End Automatic Speech Recognition
Yuya Fujita
Shinji Watanabe
Motoi Omachi
Xuankai Chang
80
31
0
27 May 2020
A Structural Model for Contextual Code Changes
Shaked Brody
Uri Alon
Eran Yahav
KELM
99
7
0
27 May 2020
Low-Latency Sequence-to-Sequence Speech Recognition and Translation by Partial Hypothesis Selection
Danni Liu
Gerasimos Spanakis
Jan Niehues
79
50
0
22 May 2020
Formant Tracking Using Dilated Convolutional Networks Through Dense Connection with Gating Mechanism
Wang Dai
Jinsong Zhang
Yingming Gao
Wei Wei
Dengfeng Ke
Binghuai Lin
Yanlu Xie
61
4
0
21 May 2020
ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition
Jing Pan
Joshua Shapiro
Jeremy Wohlwend
Kyu Jeong Han
Tao Lei
T. Ma
72
22
0
21 May 2020
Simplified Self-Attention for Transformer-based End-to-End Speech Recognition
Haoneng Luo
Shiliang Zhang
Ming Lei
Lei Xie
128
34
0
21 May 2020
Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition
Shiliang Zhang
Zhifu Gao
Haoneng Luo
Ming Lei
Jie Ying Gao
Zhijie Yan
Lei Xie
64
29
0
21 May 2020
SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition
Zhifu Gao
Shiliang Zhang
Ming Lei
Ian Mcloughlin
81
35
0
21 May 2020
Leveraging Text Data Using Hybrid Transformer-LSTM Based End-to-End ASR in Transfer Learning
Zhiping Zeng
Van Tung Pham
Haihua Xu
Yerbolat Khassanov
Chng Eng Siong
Chongjia Ni
B. Ma
17
13
0
21 May 2020
A Comparison of Label-Synchronous and Frame-Synchronous End-to-End Models for Speech Recognition
Linhao Dong
Cheng Yi
Jianzong Wang
Shiyu Zhou
Shuang Xu
X. Jia
Bo Xu
68
17
0
20 May 2020
A Further Study of Unsupervised Pre-training for Transformer Based Speech Recognition
Dongwei Jiang
Wubo Li
Ruixiong Zhang
Miao Cao
Ne Luo
Yang Han
Wei Zou
Xiangang Li
SSL
70
29
0
20 May 2020
Improved Noisy Student Training for Automatic Speech Recognition
Daniel S. Park
Yu Zhang
Ye Jia
Wei Han
Chung-Cheng Chiu
Yue Liu
Yonghui Wu
Quoc V. Le
124
243
0
19 May 2020
Enhancing Monotonic Multihead Attention for Streaming ASR
Hirofumi Inaguma
Masato Mimura
Tatsuya Kawahara
101
34
0
19 May 2020
A systematic comparison of grapheme-based vs. phoneme-based label units for encoder-decoder-attention models
Mohammad Zeineldeen
Albert Zeyer
Wei Zhou
T. Ng
Ralf Schlüter
Hermann Ney
71
2
0
19 May 2020
Generative Adversarial Training Data Adaptation for Very Low-resource Automatic Speech Recognition
Kohei Matsuura
Masato Mimura
S. Sakai
Tatsuya Kawahara
29
8
0
19 May 2020
Faster, Simpler and More Accurate Hybrid ASR Systems Using Wordpieces
Frank Zhang
Yongqiang Wang
Xiaohui Zhang
Chunxi Liu
Yatharth Saraf
Geoffrey Zweig
75
20
0
19 May 2020
Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict
Yosuke Higuchi
Shinji Watanabe
Nanxin Chen
Tetsuji Ogawa
Tetsunori Kobayashi
73
139
0
18 May 2020
Reducing Spelling Inconsistencies in Code-Switching ASR using Contextualized CTC Loss
Burin Naowarat
Thananchai Kongthaworn
Korrawe Karunratanakul
Sheng Hui Wu
Ekapol Chuangsuwanich
67
9
0
16 May 2020
Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition
Zhengkun Tian
Jiangyan Yi
J. Tao
Ye Bai
Shuai Zhang
Zhengqi Wen
99
54
0
16 May 2020
Large scale weakly and semi-supervised learning for low-resource video ASR
Kritika Singh
Vimal Manohar
Alex Xiao
Sergey Edunov
Ross B. Girshick
Vitaliy Liptchinsky
Christian Fuegen
Yatharth Saraf
Geoffrey Zweig
Abdel-rahman Mohamed
77
9
0
16 May 2020
FaceFilter: Audio-visual speech separation using still images
Soo-Whan Chung
Soyeon Choe
Joon Son Chung
Hong-Goo Kang
CVBM
122
68
0
14 May 2020
Discriminative Multi-modality Speech Recognition
Bo Xu
Cheng Lu
Yandong Guo
Jacob Wang
91
99
0
12 May 2020
Incremental Learning for End-to-End Automatic Speech Recognition
Li Fu
Xiaoxiao Li
Libo Zi
Zhengchen Zhang
Youzheng Wu
Xiaodong He
Bowen Zhou
CLL
94
23
0
11 May 2020
Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition
Ye Bai
Jiangyan Yi
J. Tao
Zhengkun Tian
Zhengqi Wen
Shuai Zhang
RALM
80
41
0
11 May 2020
RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions
Chung-Cheng Chiu
A. Narayanan
Wei Han
Rohit Prabhavalkar
Yu Zhang
...
Ruoming Pang
Tara N. Sainath
Patrick Nguyen
Liangliang Cao
Yonghui Wu
101
42
0
07 May 2020
AutoSpeech: Neural Architecture Search for Speaker Recognition
Shaojin Ding
Tianlong Chen
Xinyu Gong
Weiwei Zha
Zhangyang Wang
74
57
0
07 May 2020
End-to-end Whispered Speech Recognition with Frequency-weighted Approaches and Pseudo Whisper Pre-training
Heng-Jui Chang
Alexander H. Liu
Hung-yi Lee
Lin-Shan Lee
30
2
0
05 May 2020
Exploring Pre-training with Alignments for RNN Transducer based End-to-End Speech Recognition
Hu Hu
Rui Zhao
Jinyu Li
Liang Lu
Jiawei Liu
65
27
0
01 May 2020
Seeing voices and hearing voices: learning discriminative embeddings using cross-modal self-supervision
Soo-Whan Chung
Hong-Goo Kang
Joon Son Chung
SSL
55
39
0
29 Apr 2020
Multiresolution and Multimodal Speech Recognition with Transformers
Georgios Paraskevopoulos
Srinivas Parthasarathy
Aparna Khare
Shiva Sundaram
111
29
0
29 Apr 2020
Transliteration of Judeo-Arabic Texts into Arabic Script Using Recurrent Neural Networks
Ori Terner
Kfir Bar
Nachum Dershowitz
23
3
0
23 Apr 2020
Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription
A. Andrusenko
A. Laptev
Ivan Medennikov
69
16
0
22 Apr 2020
ESPnet-ST: All-in-One Speech Translation Toolkit
Hirofumi Inaguma
Shun Kiyono
Kevin Duh
Shigeki Karita
Nelson Yalta
Tomoki Hayashi
Shinji Watanabe
120
166
0
21 Apr 2020
Curriculum Pre-training for End-to-End Speech Translation
Chengyi Wang
Yu Wu
Shujie Liu
Ming Zhou
Zhenglu Yang
88
109
0
21 Apr 2020
ClovaCall: Korean Goal-Oriented Dialog Speech Corpus for Automatic Speech Recognition of Contact Centers
Jung-Woo Ha
KiHyun Nam
Jin Gu Kang
Sang-Woo Lee
Sohee Yang
...
Hyun Ah Kim
Kyoungtae Doh
C. Lee
Nako Sung
Sunghun Kim
45
29
0
20 Apr 2020
How to Teach DNNs to Pay Attention to the Visual Modality in Speech Recognition
George Sterpu
Christian Saam
N. Harte
74
29
0
17 Apr 2020
Fast and Accurate Deep Bidirectional Language Representations for Unsupervised Learning
Joongbo Shin
Yoonhyung Lee
Seunghyun Yoon
Kyomin Jung
OOD
76
12
0
17 Apr 2020
Minimum Latency Training Strategies for Streaming Sequence-to-Sequence ASR
Hirofumi Inaguma
Yashesh Gaur
Liang Lu
Jinyu Li
Jiawei Liu
AI4TS
90
46
0
10 Apr 2020
Neuronal Sequence Models for Bayesian Online Inference
Sascha Frölich
D. Marković
S. Kiebel
48
9
0
02 Apr 2020
Improved RawNet with Feature Map Scaling for Text-independent Speaker Verification using Raw Waveforms
Jee-weon Jung
Seung-bin Kim
Hye-jin Shim
Ju-ho Kim
Ha-Jin Yu
77
60
0
01 Apr 2020
Serialized Output Training for End-to-End Overlapped Speech Recognition
Naoyuki Kanda
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Takuya Yoshioka
87
122
0
28 Mar 2020
Can you hear me now? Sensitive comparisons of human and machine perception
Michael A. Lepori
C. Firestone
AAML
79
9
0
27 Mar 2020
High Performance Sequence-to-Sequence Model for Streaming Speech Recognition
T. Nguyen
Ngoc-Quan Pham
S. Stueker
A. Waibel
42
7
0
22 Mar 2020