Listen, Attend and Spell

5 August 2015
William Chan
Navdeep Jaitly
Quoc V. Le
Oriol Vinyals
    RALM

Papers citing "Listen, Attend and Spell"

50 / 1,033 papers shown
Research on an improved Conformer end-to-end Speech Recognition Model with R-Drop Structure
Weidong Ji
Shijie Zan
Guohui Zhou
Xu Wang
SyDa
19
1
0
14 Jun 2023
DCTX-Conformer: Dynamic context carry-over for low latency unified streaming and non-streaming Conformer ASR
Goeric Huybrechts
S. Ronanki
Xilai Li
H. Nosrati
S. Bodapati
Katrin Kirchhoff
18
1
0
13 Jun 2023
Record Deduplication for Entity Distribution Modeling in ASR Transcripts
Tianyu Huang
Chung Hoon Hong
Carl N. Wivagg
Kanna Shimizu
18
0
0
09 Jun 2023
Improving Frame-level Classifier for Word Timings with Non-peaky CTC in End-to-End Automatic Speech Recognition
Xianzhao Chen
Yist Y. Lin
Kang Wang
Yi He
Zejun Ma
29
2
0
09 Jun 2023
Streaming Speech-to-Confusion Network Speech Recognition
Denis Filimonov
Prabhat Pandey
Ariya Rastrow
Ankur Gandhe
A. Stolcke
HAI
29
0
0
02 Jun 2023
Adapting an Unadaptable ASR System
Rao Ma
Mengjie Qian
Mark J. F. Gales
Kate Knill
30
3
0
01 Jun 2023
Adaptive Contextual Biasing for Transducer Based Streaming Speech Recognition
Tianyi Xu
Zhanheng Yang
Kaixun Huang
Pengcheng Guo
Aoting Zhang
Biao Li
Changru Chen
Chong Li
Linfu Xie
22
10
0
01 Jun 2023
Enhancing the Unified Streaming and Non-streaming Model with Contrastive Learning
Yuting Yang
Yuke Li
Binbin Du
AI4TS
33
0
0
01 Jun 2023
Encoder-decoder multimodal speaker change detection
Jee-weon Jung
Soonshin Seo
Hee-Soo Heo
Geon-min Kim
You Jin Kim
Youngki Kwon
Min-Ji Lee
Bong-Jin Lee
37
2
0
01 Jun 2023
Edit Distance based RL for RNNT decoding
DongSeon Hwang
Changwan Ryu
K. Sim
19
0
0
31 May 2023
Graph Neural Networks for Contextual ASR with the Tree-Constrained Pointer Generator
Guangzhi Sun
C. Zhang
P. Woodland
18
4
0
30 May 2023
Adapting Multi-Lingual ASR Models for Handling Multiple Talkers
Chenda Li
Yao Qian
Zhuo Chen
Naoyuki Kanda
Dongmei Wang
Takuya Yoshioka
Y. Qian
Michael Zeng
37
11
0
30 May 2023
Building Accurate Low Latency ASR for Streaming Voice Search
Abhinav Goyal
Nikesh Garera
11
1
0
29 May 2023
Retraining-free Customized ASR for Enharmonic Words Based on a Named-Entity-Aware Model and Phoneme Similarity Estimation
Yui Sudo
K. Hata
K. Nakadai
31
2
0
29 May 2023
RASR2: The RWTH ASR Toolkit for Generic Sequence-to-sequence Speech Recognition
Wei Zhou
Eugen Beck
Simon Berger
Ralf Schluter
Hermann Ney
VLM
30
4
0
28 May 2023
CIF-PT: Bridging Speech and Text Representations for Spoken Language Understanding via Continuous Integrate-and-Fire Pre-Training
Linhao Dong
Zhecheng An
Peihao Wu
Jun Zhang
Lu Lu
Zejun Ma
24
6
0
27 May 2023
DistriBlock: Identifying adversarial audio samples by leveraging characteristics of the output distribution
Matías P. Pizarro
D. Kolossa
Asja Fischer
AAML
35
1
0
26 May 2023
Rethinking Speech Recognition with A Multimodal Perspective via Acoustic and Semantic Cooperative Decoding
Tianren Zhang
Haibo Qin
Zhibing Lai
Songlu Chen
Qi Liu
Feng Chen
Xinyuan Qian
Xu-Cheng Yin
38
0
0
23 May 2023
CopyNE: Better Contextual ASR by Copying Named Entities
Shilin Zhou
Zhenghua Li
Yu Hong
Hao Fei
Zhefeng Wang
Baoxing Huai
15
6
0
22 May 2023
Hystoc: Obtaining word confidences for fusion of end-to-end ASR systems
Karel Beneš
M. Kocour
L. Burget
37
2
0
21 May 2023
VAKTA-SETU: A Speech-to-Speech Machine Translation Service in Select Indic Languages
Shivam Mhaskar
Vineet Bhat
Akshay Batheja
S. Deoghare
Paramveer Choudhary
P. Bhattacharyya
48
4
0
21 May 2023
Multi-Head State Space Model for Speech Recognition
Yassir Fathullah
Chunyang Wu
Yuan Shangguan
J. Jia
Wenhan Xiong
...
Chunxi Liu
Yangyang Shi
Ozlem Kalinli
M. Seltzer
Mark J. F. Gales
34
13
0
21 May 2023
Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network
Kaixun Huang
Aoting Zhang
Zhanheng Yang
Pengcheng Guo
Bingshen Mu
Tianyi Xu
Linfu Xie
32
16
0
21 May 2023
Language-universal phonetic encoder for low-resource speech recognition
Siyuan Feng
Ming Tu
Rui Xia
Chuanzeng Huang
Yuxuan Wang
39
2
0
19 May 2023
Blank-regularized CTC for Frame Skipping in Neural Transducer
Yifan Yang
Xiaoyu Yang
Liyong Guo
Zengwei Yao
Wei Kang
Fangjun Kuang
Long Lin
Xie Chen
Daniel Povey
18
8
0
19 May 2023
AlignAtt: Using Attention-based Audio-Translation Alignments as a Guide for Simultaneous Speech Translation
Sara Papi
Marco Turchi
Matteo Negri
32
20
0
19 May 2023
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks
Yifan Peng
Kwangyoun Kim
Felix Wu
Brian Yan
Siddhant Arora
William Chen
Jiyang Tang
Suwon Shon
Prashant Sridhar
Shinji Watanabe
29
17
0
18 May 2023
FunASR: A Fundamental End-to-End Speech Recognition Toolkit
Zhifu Gao
Zerui Li
Jiaming Wang
Haoneng Luo
Xian Shi
...
Yabin Li
Lingyun Zuo
Zhihao Du
Zhangyu Xiao
Shiliang Zhang
37
54
0
18 May 2023
A Lexical-aware Non-autoregressive Transformer-based ASR Model
Chong Lin
Kuan-Yu Chen
AI4TS
27
1
0
18 May 2023
Accurate and Reliable Confidence Estimation Based on Non-Autoregressive End-to-End Speech Recognition System
Xian Shi
Haoneng Luo
Zhifu Gao
Shiliang Zhang
Zhijie Yan
25
1
0
18 May 2023
Masked Audio Text Encoders are Effective Multi-Modal Rescorers
Jason (Jinglun) Cai
Monica Sunkara
Xilai Li
Anshu Bhatia
Xiao Pan
S. Bodapati
28
3
0
11 May 2023
Robust Acoustic and Semantic Contextual Biasing in Neural Transducers for Speech Recognition
Xuandi Fu
Kanthashree Mysore Sathyendra
Ankur Gandhe
Jing Liu
Grant P. Strimel
Ross McGowan
Athanasios Mouchtaris
25
14
0
09 May 2023
End-to-end spoken language understanding using joint CTC loss and self-supervised, pretrained acoustic encoders
Jixuan Wang
Martin H. Radfar
Kailin Wei
Clement Chung
16
3
0
04 May 2023
Deep Transfer Learning for Automatic Speech Recognition: Towards Better Generalization
Hamza Kheddar
Yassine Himeur
S. Al-Maadeed
Abbes Amira
F. Bensaali
47
76
0
27 Apr 2023
Self-regularised Minimum Latency Training for Streaming Transformer-based Speech Recognition
Mohan Li
R. Doddipatla
Catalin Zorila
30
0
0
24 Apr 2023
Non-autoregressive End-to-end Approaches for Joint Automatic Speech Recognition and Spoken Language Understanding
Mohan Li
R. Doddipatla
30
6
0
21 Apr 2023
DropDim: A Regularization Method for Transformer Networks
Hao Zhang
Dan Qu
Kejia Shao
Xu Yang
28
12
0
20 Apr 2023
A Comparison of Semi-Supervised Learning Techniques for Streaming ASR at Scale
Cal Peyser
M. Picheny
Kyunghyun Cho
Rohit Prabhavalkar
Ronny Huang
Tara N. Sainath
AI4TS
35
1
0
19 Apr 2023
Dynamic Chunk Convolution for Unified Streaming and Non-Streaming Conformer ASR
Xilai Li
Goeric Huybrechts
S. Ronanki
Jeffrey J. Farris
S. Bodapati
38
6
0
18 Apr 2023
A CTC Alignment-based Non-autoregressive Transformer for End-to-end Automatic Speech Recognition
Ruchao Fan
Wei Chu
Peng Chang
Abeer Alwan
18
10
0
15 Apr 2023
Robust and Context-Aware Real-Time Collaborative Robot Handling via Dynamic Gesture Commands
Rui Chen
Alvin C M Shek
Changliu Liu
16
4
0
12 Apr 2023
Online Spatio-Temporal Learning with Target Projection
Thomas Ortner
Lorenzo Pes
Joris Gentinetta
Charlotte Frenkel
A. Pantazi
25
7
0
11 Apr 2023
Sim-T: Simplify the Transformer Network by Multiplexing Technique for Speech Recognition
Guangyong Wei
Zhikui Duan
Shiren Li
Guangguang Yang
Xinmei Yu
Junhua Li
30
4
0
11 Apr 2023
Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR
Yuchen Hu
Cheng Chen
Qiu-shi Zhu
E. Chng
22
15
0
11 Apr 2023
Dual-Attention Neural Transducers for Efficient Wake Word Spotting in Speech Recognition
Saumya Yashmohini Sahai
Jing Liu
Thejaswi Muniyappa
Kanthashree Mysore Sathyendra
Anastasios Alexandridis
...
Ross McGowan
Ariya Rastrow
Feng-Ju Chang
Athanasios Mouchtaris
Siegfried Kunzmann
39
5
0
03 Apr 2023
Dialog act guided contextual adapter for personalized speech recognition
Feng-Ju Chang
Thejaswi Muniyappa
Kanthashree Mysore Sathyendra
Kailin Wei
Grant P. Strimel
Ross McGowan
24
4
0
31 Mar 2023
PROCTER: PROnunciation-aware ConTextual adaptER for personalized speech recognition in neural transducers
R. Pandey
Roger Ren
Qi Luo
Jing Liu
Ariya Rastrow
Ankur Gandhe
Denis Filimonov
Grant P. Strimel
A. Stolcke
I. Bulyko
35
13
0
30 Mar 2023
AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
29
15
0
29 Mar 2023
Cross-utterance ASR Rescoring with Graph-based Label Propagation
Srinath Tankasala
Long Chen
A. Stolcke
A. Raju
Qianli Deng
Chander Chandak
Aparna Khare
Roland Maas
Venkatesh Ravichandran
18
0
0
27 Mar 2023
Pyramid Multi-branch Fusion DCNN with Multi-Head Self-Attention for Mandarin Speech Recognition
Kai Liu
Hailiang Xiong
Gangqiang Yang
Zhengfeng Du
Yewen Cao
D. Shah
18
0
0
23 Mar 2023