ResearchTrend.AI
Listen, Attend and Spell

5 August 2015
William Chan
Navdeep Jaitly
Quoc V. Le
Oriol Vinyals
    RALM

Papers citing "Listen, Attend and Spell"

50 / 1,041 papers shown
Vision-Aided Dynamic Blockage Prediction for 6G Wireless Communication Networks
Gouranga Charan
Muhammad Alrabeiah
Ahmed Alkhateeb
82
34
0
17 Jun 2020
Exploration of End-to-End ASR for OpenSTT -- Russian Open Speech-to-Text Dataset
A. Andrusenko
A. Laptev
Ivan Medennikov
VLM
122
12
0
15 Jun 2020
Improving Cross-Lingual Transfer Learning for End-to-End Speech Recognition with Speech Translation
Changhan Wang
J. Pino
Jiatao Gu
79
30
0
09 Jun 2020
Learning to Count Words in Fluent Speech enables Online Speech Recognition
George Sterpu
Christian Saam
N. Harte
63
4
0
08 Jun 2020
Contextual RNN-T For Open Domain ASR
Mahaveer Jain
Gil Keren
Jay Mahadeokar
Geoffrey Zweig
Florian Metze
Yatharth Saraf
63
104
0
04 Jun 2020
Detecting Audio Attacks on ASR Systems with Dropout Uncertainty
T. Jayashankar
Jonathan Le Roux
P. Moulin
AAML
34
17
0
02 Jun 2020
On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition
Jinyu Li
Yu-Huan Wu
Yashesh Gaur
Chengyi Wang
Rui Zhao
Shujie Liu
73
137
0
28 May 2020
Insertion-Based Modeling for End-to-End Automatic Speech Recognition
Yuya Fujita
Shinji Watanabe
Motoi Omachi
Xuankai Chang
80
31
0
27 May 2020
A Structural Model for Contextual Code Changes
Shaked Brody
Uri Alon
Eran Yahav
KELM
99
7
0
27 May 2020
Low-Latency Sequence-to-Sequence Speech Recognition and Translation by Partial Hypothesis Selection
Danni Liu
Gerasimos Spanakis
Jan Niehues
79
50
0
22 May 2020
Formant Tracking Using Dilated Convolutional Networks Through Dense Connection with Gating Mechanism
Wang Dai
Jinsong Zhang
Yingming Gao
Wei Wei
Dengfeng Ke
Binghuai Lin
Yanlu Xie
61
4
0
21 May 2020
ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition
Jing Pan
Joshua Shapiro
Jeremy Wohlwend
Kyu Jeong Han
Tao Lei
T. Ma
72
22
0
21 May 2020
Simplified Self-Attention for Transformer-based End-to-End Speech Recognition
Haoneng Luo
Shiliang Zhang
Ming Lei
Lei Xie
128
34
0
21 May 2020
Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition
Shiliang Zhang
Zhifu Gao
Haoneng Luo
Ming Lei
Jie Ying Gao
Zhijie Yan
Lei Xie
64
29
0
21 May 2020
SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition
Zhifu Gao
Shiliang Zhang
Ming Lei
Ian Mcloughlin
81
35
0
21 May 2020
Leveraging Text Data Using Hybrid Transformer-LSTM Based End-to-End ASR in Transfer Learning
Zhiping Zeng
Van Tung Pham
Haihua Xu
Yerbolat Khassanov
Chng Eng Siong
Chongjia Ni
B. Ma
17
13
0
21 May 2020
A Comparison of Label-Synchronous and Frame-Synchronous End-to-End Models for Speech Recognition
Linhao Dong
Cheng Yi
Jianzong Wang
Shiyu Zhou
Shuang Xu
X. Jia
Bo Xu
68
17
0
20 May 2020
A Further Study of Unsupervised Pre-training for Transformer Based Speech Recognition
Dongwei Jiang
Wubo Li
Ruixiong Zhang
Miao Cao
Ne Luo
Yang Han
Wei Zou
Xiangang Li
SSL
70
29
0
20 May 2020
Improved Noisy Student Training for Automatic Speech Recognition
Daniel S. Park
Yu Zhang
Ye Jia
Wei Han
Chung-Cheng Chiu
Yue Liu
Yonghui Wu
Quoc V. Le
124
243
0
19 May 2020
Enhancing Monotonic Multihead Attention for Streaming ASR
Hirofumi Inaguma
Masato Mimura
Tatsuya Kawahara
101
34
0
19 May 2020
A systematic comparison of grapheme-based vs. phoneme-based label units for encoder-decoder-attention models
Mohammad Zeineldeen
Albert Zeyer
Wei Zhou
T. Ng
Ralf Schluter
Hermann Ney
71
2
0
19 May 2020
Generative Adversarial Training Data Adaptation for Very Low-resource Automatic Speech Recognition
Kohei Matsuura
Masato Mimura
S. Sakai
Tatsuya Kawahara
29
8
0
19 May 2020
Faster, Simpler and More Accurate Hybrid ASR Systems Using Wordpieces
Frank Zhang
Yongqiang Wang
Xiaohui Zhang
Chunxi Liu
Yatharth Saraf
Geoffrey Zweig
75
20
0
19 May 2020
Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict
Yosuke Higuchi
Shinji Watanabe
Nanxin Chen
Tetsuji Ogawa
Tetsunori Kobayashi
73
139
0
18 May 2020
Reducing Spelling Inconsistencies in Code-Switching ASR using Contextualized CTC Loss
Burin Naowarat
Thananchai Kongthaworn
Korrawe Karunratanakul
Sheng Hui Wu
Ekapol Chuangsuwanich
67
9
0
16 May 2020
Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition
Zhengkun Tian
Jiangyan Yi
J. Tao
Ye Bai
Shuai Zhang
Zhengqi Wen
99
54
0
16 May 2020
Large scale weakly and semi-supervised learning for low-resource video ASR
Kritika Singh
Vimal Manohar
Alex Xiao
Sergey Edunov
Ross B. Girshick
Vitaliy Liptchinsky
Christian Fuegen
Yatharth Saraf
Geoffrey Zweig
Abdel-rahman Mohamed
77
9
0
16 May 2020
FaceFilter: Audio-visual speech separation using still images
Soo-Whan Chung
Soyeon Choe
Joon Son Chung
Hong-Goo Kang
CVBM
122
68
0
14 May 2020
Discriminative Multi-modality Speech Recognition
Bo Xu
Cheng Lu
Yandong Guo
Jacob Wang
91
99
0
12 May 2020
Incremental Learning for End-to-End Automatic Speech Recognition
Li Fu
Xiaoxiao Li
Libo Zi
Zhengchen Zhang
Youzheng Wu
Xiaodong He
Bowen Zhou
CLL
94
23
0
11 May 2020
Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition
Ye Bai
Jiangyan Yi
J. Tao
Zhengkun Tian
Zhengqi Wen
Shuai Zhang
RALM
80
41
0
11 May 2020
RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions
Chung-Cheng Chiu
A. Narayanan
Wei Han
Rohit Prabhavalkar
Yu Zhang
...
Ruoming Pang
Tara N. Sainath
Patrick Nguyen
Liangliang Cao
Yonghui Wu
101
42
0
07 May 2020
AutoSpeech: Neural Architecture Search for Speaker Recognition
Shaojin Ding
Tianlong Chen
Xinyu Gong
Weiwei Zha
Zhangyang Wang
74
57
0
07 May 2020
End-to-end Whispered Speech Recognition with Frequency-weighted Approaches and Pseudo Whisper Pre-training
Heng-Jui Chang
Alexander H. Liu
Hung-yi Lee
Lin-Shan Lee
30
2
0
05 May 2020
Exploring Pre-training with Alignments for RNN Transducer based End-to-End Speech Recognition
Hu Hu
Rui Zhao
Jinyu Li
Liang Lu
Jiawei Liu
65
27
0
01 May 2020
Seeing voices and hearing voices: learning discriminative embeddings using cross-modal self-supervision
Soo-Whan Chung
Hong-Goo Kang
Joon Son Chung
SSL
55
39
0
29 Apr 2020
Multiresolution and Multimodal Speech Recognition with Transformers
Georgios Paraskevopoulos
Srinivas Parthasarathy
Aparna Khare
Shiva Sundaram
111
29
0
29 Apr 2020
Transliteration of Judeo-Arabic Texts into Arabic Script Using Recurrent Neural Networks
Ori Terner
Kfir Bar
Nachum Dershowitz
23
3
0
23 Apr 2020
Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription
A. Andrusenko
A. Laptev
Ivan Medennikov
69
16
0
22 Apr 2020
ESPnet-ST: All-in-One Speech Translation Toolkit
Hirofumi Inaguma
Shun Kiyono
Kevin Duh
Shigeki Karita
Nelson Yalta
Tomoki Hayashi
Shinji Watanabe
120
166
0
21 Apr 2020
Curriculum Pre-training for End-to-End Speech Translation
Chengyi Wang
Yu Wu
Shujie Liu
Ming Zhou
Zhenglu Yang
88
109
0
21 Apr 2020
ClovaCall: Korean Goal-Oriented Dialog Speech Corpus for Automatic Speech Recognition of Contact Centers
Jung-Woo Ha
KiHyun Nam
Jin Gu Kang
Sang-Woo Lee
Sohee Yang
...
Hyun Ah Kim
Kyoungtae Doh
C. Lee
Nako Sung
Sunghun Kim
45
29
0
20 Apr 2020
How to Teach DNNs to Pay Attention to the Visual Modality in Speech Recognition
George Sterpu
Christian Saam
N. Harte
74
29
0
17 Apr 2020
Fast and Accurate Deep Bidirectional Language Representations for Unsupervised Learning
Joongbo Shin
Yoonhyung Lee
Seunghyun Yoon
Kyomin Jung
OOD
76
12
0
17 Apr 2020
Minimum Latency Training Strategies for Streaming Sequence-to-Sequence ASR
Hirofumi Inaguma
Yashesh Gaur
Liang Lu
Jinyu Li
Jiawei Liu
AI4TS
90
46
0
10 Apr 2020
Neuronal Sequence Models for Bayesian Online Inference
Sascha Frölich
D. Marković
S. Kiebel
48
9
0
02 Apr 2020
Improved RawNet with Feature Map Scaling for Text-independent Speaker Verification using Raw Waveforms
Jee-weon Jung
Seung-bin Kim
Hye-jin Shim
Ju-ho Kim
Ha-Jin Yu
77
60
0
01 Apr 2020
Serialized Output Training for End-to-End Overlapped Speech Recognition
Naoyuki Kanda
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Takuya Yoshioka
87
122
0
28 Mar 2020
Can you hear me now? Sensitive comparisons of human and machine perception
Michael A. Lepori
C. Firestone
AAML
79
9
0
27 Mar 2020
High Performance Sequence-to-Sequence Model for Streaming Speech Recognition
T. Nguyen
Ngoc-Quan Pham
S. Stueker
A. Waibel
42
7
0
22 Mar 2020