Listen, Attend and Spell
arXiv:1508.01211 (versions: v1, v2 latest)

5 August 2015
William Chan
Navdeep Jaitly
Quoc V. Le
Oriol Vinyals
    RALM

Papers citing "Listen, Attend and Spell"

50 / 1,041 papers shown
Serialized Speech Information Guidance with Overlapped Encoding Separation for Multi-Speaker Automatic Speech Recognition
Hao Shi
Yuan Gao
Zhaoheng Ni
Tatsuya Kawahara
128
4
0
01 Sep 2024
The State of Commercial Automatic French Legal Speech Recognition Systems and their Impact on Court Reporters et al
Nicolas Garneau
Olivier Bolduc
ELM, AILaw
57
1
0
21 Aug 2024
Survey: Transformer-based Models in Data Modality Conversion
Elyas Rashno
Amir Eskandari
Aman Anand
F. Zulkernine
MedIm
97
0
0
08 Aug 2024
On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition
Nick Rossenbach
Ralf Schluter
S. Sakti
77
2
0
31 Jul 2024
On the Effect of Purely Synthetic Training Data for Different Automatic Speech Recognition Architectures
Nick Rossenbach
Benedikt Hilmes
Ralf Schluter
63
2
0
25 Jul 2024
CUSIDE-T: Chunking, Simulating Future and Decoding for Transducer based Streaming ASR
Wenbo Zhao
Ziwei Li
Chuan Yu
Zhijian Ou
AI4TS
94
2
0
14 Jul 2024
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
Ye Bai
Jingping Chen
Jitong Chen
Wei Chen
Zhuo Chen
...
Wanyi Zhang
Yang Zhang
Yawei Zhang
Yijie Zheng
Ming Zou
AuLLM
123
28
0
05 Jul 2024
Serialized Output Training by Learned Dominance
Ying Shi
Lantian Li
Shi Yin
D. Wang
Jiqing Han
47
4
0
04 Jul 2024
BESTOW: Efficient and Streamable Speech Language Model with the Best of Two Worlds in GPT and T5
Zhehuai Chen
He Huang
Oleksii Hrinchuk
Krishna Puvvada
Nithin Rao Koluguri
Piotr Żelasko
Jagadeesh Balam
Boris Ginsburg
AuLLM, RALM
92
11
0
28 Jun 2024
MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research
Song Li
Yongbin You
Xuezhi Wang
Zhengkun Tian
Ke Ding
Guanglu Wan
53
3
0
26 Jun 2024
Token-Weighted RNN-T for Learning from Flawed Data
Gil Keren
Wei Zhou
Ozlem Kalinli
93
0
0
26 Jun 2024
Automatic speech recognition for the Nepali language using CNN, bidirectional LSTM and ResNet
Manish Dhakal
Arman Chhetri
Aman Kumar Gupta
Prabin B. Lamichhane
S. Pandey
S. Shakya
AI4TS
63
10
0
25 Jun 2024
InterBiasing: Boost Unseen Word Recognition through Biasing Intermediate Predictions
Yu Nakagome
Michael Hentschel
84
4
0
21 Jun 2024
Instruction Data Generation and Unsupervised Adaptation for Speech Language Models
Vahid Noroozi
Zhehuai Chen
Somshubra Majumdar
Steve Huang
Jagadeesh Balam
Boris Ginsburg
SyDa
141
5
0
18 Jun 2024
Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation
Eungbeom Kim
Hantae Kim
Kyogu Lee
79
2
0
12 Jun 2024
Dual-Pipeline with Low-Rank Adaptation for New Language Integration in Multilingual ASR
Yerbolat Khassanov
Zhipeng Chen
Tianfeng Chen
Tze Yuang Chong
Wei Li
Jun Zhang
Lu Lu
Yuxuan Wang
AI4CE
71
0
0
12 Jun 2024
StreamAtt: Direct Streaming Speech-to-Text Translation with Attention-based Audio History Selection
Sara Papi
Marco Gaido
Matteo Negri
L. Bentivogli
151
8
0
10 Jun 2024
LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR
Zheshu Song
Jianheng Zhuo
Yifan Yang
Ziyang Ma
Shixiong Zhang
Xie Chen
79
11
0
07 Jun 2024
Unveiling the Dynamics of Information Interplay in Supervised Learning
Kun Song
Zhiquan Tan
Bochao Zou
Huimin Ma
Weiran Huang
74
2
0
06 Jun 2024
Joint Beam Search Integrating CTC, Attention, and Transducer Decoders
Yui Sudo
Muhammad Shakeel
Yosuke Fukumoto
Brian Yan
Jiatong Shi
Yifan Peng
Shinji Watanabe
93
3
0
05 Jun 2024
Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation
Muhammad Shakeel
Yui Sudo
Yifan Peng
Shinji Watanabe
99
0
0
22 May 2024
Contextualized Automatic Speech Recognition with Dynamic Vocabulary
Yui Sudo
Yosuke Fukumoto
Muhammad Shakeel
Yifan Peng
Shinji Watanabe
86
3
0
22 May 2024
Gated Low-rank Adaptation for personalized Code-Switching Automatic Speech Recognition on the low-spec devices
Gwantae Kim
Bokyeung Lee
Donghyeon Kim
Hanseok Ko
OffRL
94
0
0
24 Apr 2024
Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition
Hainan Xu
Zhehuai Chen
Fei Jia
Boris Ginsburg
72
0
0
04 Apr 2024
Effective internal language model training and fusion for factorized transducer model
Jinxi Guo
Niko Moritz
Yingyi Ma
Frank Seide
Chunyang Wu
Jay Mahadeokar
Ozlem Kalinli
Christian Fuegen
Michael Seltzer
78
1
0
02 Apr 2024
Enhancing Efficiency in Vision Transformer Networks: Design Techniques and Insights
Moein Heidari
Reza Azad
Sina Ghorbani Kolahi
René Arimond
Leon Niggemeier
...
Afshin Bozorgpour
Ehsan Khodapanah Aghdam
Amirhossein Kazerouni
Ilker Hacihaliloglu
Dorit Merhof
99
7
0
28 Mar 2024
M$^3$AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset
Zhe Chen
Heyang Liu
Wenyi Yu
Guangzhi Sun
Hongcheng Liu
Ji Wu
Chao Zhang
Yu Wang
Yanfeng Wang
VGen
98
1
0
21 Mar 2024
Advanced Long-Content Speech Recognition With Factorized Neural Transducer
Xun Gong
Yu Wu
Jinyu Li
Shujie Liu
Rui Zhao
Xie Chen
Yanmin Qian
111
9
0
20 Mar 2024
Skipformer: A Skip-and-Recover Strategy for Efficient Speech Recognition
Wenjing Zhu
Sining Sun
Changhao Shan
Peng Fan
Qing Yang
53
2
0
13 Mar 2024
The evaluation of a code-switched Sepedi-English automatic speech recognition system
Amanda Phaladi
T. Modipa
64
0
0
11 Mar 2024
A&B BNN: Add&Bit-Operation-Only Hardware-Friendly Binary Neural Network
Ruichen Ma
G. Qiao
Yián Liu
L. Meng
N. Ning
Yang Liu
Shaogang Hu
AAML, MQ
98
3
0
06 Mar 2024
Towards Accurate Lip-to-Speech Synthesis in-the-Wild
Sindhu B. Hegde
Rudrabha Mukhopadhyay
C. V. Jawahar
Vinay P. Namboodiri
51
6
0
02 Mar 2024
Post-decoder Biasing for End-to-End Speech Recognition of Multi-turn Medical Interview
Heyang Liu
Yu Wang
Yanfeng Wang
117
0
0
01 Mar 2024
Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models
Rohit Prabhavalkar
Zhong Meng
Weiran Wang
Adam Stooke
Xingyu Cai
Yanzhang He
Arun Narayanan
Dongseong Hwang
Tara N. Sainath
Pedro J. Moreno
94
8
0
27 Feb 2024
Alternating Weak Triphone/BPE Alignment Supervision from Hybrid Model Improves End-to-End ASR
Jintao Jiang
Yingbo Gao
Mohammad Zeineldeen
Zoltán Tüske
99
0
0
23 Feb 2024
How do Hyenas deal with Human Speech? Speech Recognition and Translation with ConfHyena
Marco Gaido
Sara Papi
Matteo Negri
L. Bentivogli
100
1
0
20 Feb 2024
Comparison of Conventional Hybrid and CTC/Attention Decoders for Continuous Visual Speech Recognition
David Gimeno-Gómez
Carlos David Martínez Hinarejos
77
1
0
20 Feb 2024
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity
Ziyang Ma
Guanrou Yang
Yifan Yang
Zhifu Gao
Jiaming Wang
...
Fan Yu
Qian Chen
Siqi Zheng
Shiliang Zhang
Xie Chen
AuLLM
98
60
0
13 Feb 2024
Self-consistent context aware conformer transducer for speech recognition
Konstantin Kolokolov
Pavel Pekichev
Karthik Raghunathan
60
0
0
09 Feb 2024
Multi-Trigger Backdoor Attacks: More Triggers, More Threats
Yige Li
Xingjun Ma
Jiabo He
Hanxun Huang
Yu-Gang Jiang
AAML
90
5
0
27 Jan 2024
Contextualized Automatic Speech Recognition with Attention-Based Bias Phrase Boosted Beam Search
Yui Sudo
Muhammad Shakeel
Yosuke Fukumoto
Yifan Peng
Shinji Watanabe
84
9
0
19 Jan 2024
Improving ASR Contextual Biasing with Guided Attention
Jiyang Tang
Kwangyoun Kim
Suwon Shon
Felix Wu
Prashant Sridhar
Shinji Watanabe
66
10
0
16 Jan 2024
LCB-net: Long-Context Biasing for Audio-Visual Speech Recognition
Fan Yu
Haoxu Wang
Xian Shi
Shiliang Zhang
92
4
0
12 Jan 2024
Cross-Speaker Encoding Network for Multi-Talker Speech Recognition
Jiawen Kang
Lingwei Meng
Mingyu Cui
Haohan Guo
Xixin Wu
Xunying Liu
Helen M. Meng
102
7
0
08 Jan 2024
A unified multichannel far-field speech recognition system: combining neural beamforming with attention based end-to-end model
Dongdi Zhao
Jianbo Ma
Lu Lu
Jinke Li
Xuan Ji
Lei Zhu
Fuming Fang
Ming-Yuan Liu
Feijun Jiang
44
1
0
05 Jan 2024
CTC Blank Triggered Dynamic Layer-Skipping for Efficient CTC-based Speech Recognition
Junfeng Hou
Peiyao Wang
Jincheng Zhang
Meng Yang
Minwei Feng
Jingcheng Yin
67
1
0
04 Jan 2024
BLSTM-Based Confidence Estimation for End-to-End Speech Recognition
A. Ogawa
Naohiro Tawara
Takatomo Kano
Marc Delcroix
122
4
0
22 Dec 2023
Speaker Mask Transformer for Multi-talker Overlapped Speech Recognition
Peng Shen
Xugang Lu
Hisashi Kawai
76
2
0
18 Dec 2023
Conformer-Based Speech Recognition On Extreme Edge-Computing Devices
Mingbin Xu
Alex Jin
Sicheng Wang
Mu Su
Tim Ng
...
Shiyi Han
Zhihong Lei
Yaqiao Deng
Zhen Huang
Mahesh Krishnamoorthy
72
5
0
16 Dec 2023
USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models
Shaojin Ding
David Qiu
David Rim
Yanzhang He
Oleg Rybakov
...
Tara N. Sainath
Zhonglin Han
Jian Li
Amir Yazdanbakhsh
Shivani Agrawal
MQ
111
12
0
13 Dec 2023