ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1508.01211
  4. Cited By
Listen, Attend and Spell

Listen, Attend and Spell

5 August 2015
William Chan
Navdeep Jaitly
Quoc V. Le
Oriol Vinyals
    RALM
ArXivPDFHTML

Papers citing "Listen, Attend and Spell"

50 / 1,033 papers shown
Title
Serialized Output Training by Learned Dominance
Serialized Output Training by Learned Dominance
Ying Shi
Lantian Li
Shi Yin
D. Wang
Jiqing Han
19
4
0
04 Jul 2024
BESTOW: Efficient and Streamable Speech Language Model with the Best of
  Two Worlds in GPT and T5
BESTOW: Efficient and Streamable Speech Language Model with the Best of Two Worlds in GPT and T5
Zhehuai Chen
He Huang
Oleksii Hrinchuk
Krishna C. Puvvada
Nithin Rao Koluguri
Piotr Żelasko
Jagadeesh Balam
Boris Ginsburg
AuLLM
RALM
38
10
0
28 Jun 2024
MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of
  Transcribed Audio for Speech Recognition Research
MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research
Song Li
Yongbin You
Xuezhi Wang
Zhengkun Tian
Ke Ding
Guanglu Wan
29
1
0
26 Jun 2024
Token-Weighted RNN-T for Learning from Flawed Data
Token-Weighted RNN-T for Learning from Flawed Data
Gil Keren
Wei Zhou
Ozlem Kalinli
43
0
0
26 Jun 2024
Automatic speech recognition for the Nepali language using CNN,
  bidirectional LSTM and ResNet
Automatic speech recognition for the Nepali language using CNN, bidirectional LSTM and ResNet
Manish Dhakal
Arman Chhetri
Aman Kumar Gupta
Prabin B. Lamichhane
S. Pandey
S. Shakya
AI4TS
35
10
0
25 Jun 2024
InterBiasing: Boost Unseen Word Recognition through Biasing Intermediate
  Predictions
InterBiasing: Boost Unseen Word Recognition through Biasing Intermediate Predictions
Yu Nakagome
Michael Hentschel
47
0
0
21 Jun 2024
Instruction Data Generation and Unsupervised Adaptation for Speech
  Language Models
Instruction Data Generation and Unsupervised Adaptation for Speech Language Models
Vahid Noroozi
Zhehuai Chen
Somshubra Majumdar
Steve Huang
Jagadeesh Balam
Boris Ginsburg
SyDa
41
3
0
18 Jun 2024
Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation
Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation
Eungbeom Kim
Hantae Kim
Kyogu Lee
32
1
0
12 Jun 2024
Dual-Pipeline with Low-Rank Adaptation for New Language Integration in
  Multilingual ASR
Dual-Pipeline with Low-Rank Adaptation for New Language Integration in Multilingual ASR
Yerbolat Khassanov
Zhipeng Chen
Tianfeng Chen
Tze Yuang Chong
Wei Li
Jun Zhang
Lu Lu
Yuxuan Wang
AI4CE
21
0
0
12 Jun 2024
StreamAtt: Direct Streaming Speech-to-Text Translation with
  Attention-based Audio History Selection
StreamAtt: Direct Streaming Speech-to-Text Translation with Attention-based Audio History Selection
Sara Papi
Marco Gaido
Matteo Negri
L. Bentivogli
64
4
0
10 Jun 2024
LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR
LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR
Zheshu Song
Jianheng Zhuo
Yifan Yang
Ziyang Ma
Shixiong Zhang
Xie Chen
36
9
0
07 Jun 2024
Unveiling the Dynamics of Information Interplay in Supervised Learning
Unveiling the Dynamics of Information Interplay in Supervised Learning
Kun Song
Zhiquan Tan
Bochao Zou
Huimin Ma
Weiran Huang
38
1
0
06 Jun 2024
Joint Beam Search Integrating CTC, Attention, and Transducer Decoders
Joint Beam Search Integrating CTC, Attention, and Transducer Decoders
Yui Sudo
Muhammad Shakeel
Yosuke Fukumoto
Brian Yan
Jiatong Shi
Yifan Peng
Shinji Watanabe
27
0
0
05 Jun 2024
Joint Optimization of Streaming and Non-Streaming Automatic Speech
  Recognition with Multi-Decoder and Knowledge Distillation
Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation
Muhammad Shakeel
Yui Sudo
Yifan Peng
Shinji Watanabe
35
0
0
22 May 2024
Contextualized Automatic Speech Recognition with Dynamic Vocabulary
Contextualized Automatic Speech Recognition with Dynamic Vocabulary
Yui Sudo
Yosuke Fukumoto
Muhammad Shakeel
Yifan Peng
Shinji Watanabe
29
0
0
22 May 2024
Gated Low-rank Adaptation for personalized Code-Switching Automatic
  Speech Recognition on the low-spec devices
Gated Low-rank Adaptation for personalized Code-Switching Automatic Speech Recognition on the low-spec devices
Gwantae Kim
Bokyeung Lee
Donghyeon Kim
Hanseok Ko
OffRL
25
0
0
24 Apr 2024
Transducers with Pronunciation-aware Embeddings for Automatic Speech
  Recognition
Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition
Hainan Xu
Zhehuai Chen
Fei Jia
Boris Ginsburg
41
0
0
04 Apr 2024
Effective internal language model training and fusion for factorized
  transducer model
Effective internal language model training and fusion for factorized transducer model
Jinxi Guo
Niko Moritz
Yingyi Ma
Frank Seide
Chunyang Wu
Jay Mahadeokar
Ozlem Kalinli
Christian Fuegen
Michael Seltzer
43
1
0
02 Apr 2024
Enhancing Efficiency in Vision Transformer Networks: Design Techniques
  and Insights
Enhancing Efficiency in Vision Transformer Networks: Design Techniques and Insights
Moein Heidari
Reza Azad
Sina Ghorbani Kolahi
René Arimond
Leon Niggemeier
...
Afshin Bozorgpour
Ehsan Khodapanah Aghdam
A. Kazerouni
I. Hacihaliloglu
Dorit Merhof
51
7
0
28 Mar 2024
M$^3$AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual
  Academic Lecture Dataset
M3^33AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset
Zhe Chen
Heyang Liu
Wenyi Yu
Guangzhi Sun
Hongcheng Liu
Ji Wu
Chao Zhang
Yu Wang
Yanfeng Wang
VGen
57
1
0
21 Mar 2024
Advanced Long-Content Speech Recognition With Factorized Neural
  Transducer
Advanced Long-Content Speech Recognition With Factorized Neural Transducer
Xun Gong
Yu Wu
Jinyu Li
Shujie Liu
Rui Zhao
Xie Chen
Yanmin Qian
37
6
0
20 Mar 2024
Skipformer: A Skip-and-Recover Strategy for Efficient Speech Recognition
Skipformer: A Skip-and-Recover Strategy for Efficient Speech Recognition
Wenjing Zhu
Sining Sun
Changhao Shan
Peng Fan
Qing Yang
37
1
0
13 Mar 2024
The evaluation of a code-switched Sepedi-English automatic speech
  recognition system
The evaluation of a code-switched Sepedi-English automatic speech recognition system
Amanda Phaladi
T. Modipa
38
0
0
11 Mar 2024
A&B BNN: Add&Bit-Operation-Only Hardware-Friendly Binary Neural Network
A&B BNN: Add&Bit-Operation-Only Hardware-Friendly Binary Neural Network
Ruichen Ma
G. Qiao
Yián Liu
L. Meng
N. Ning
Yang Liu
Shaogang Hu
AAML
MQ
42
3
0
06 Mar 2024
Towards Accurate Lip-to-Speech Synthesis in-the-Wild
Towards Accurate Lip-to-Speech Synthesis in-the-Wild
Sindhu B. Hegde
Rudrabha Mukhopadhyay
C. V. Jawahar
Vinay P. Namboodiri
27
4
0
02 Mar 2024
Post-decoder Biasing for End-to-End Speech Recognition of Multi-turn
  Medical Interview
Post-decoder Biasing for End-to-End Speech Recognition of Multi-turn Medical Interview
Heyang Liu
Yu Wang
Yanfeng Wang
46
0
0
01 Mar 2024
Extreme Encoder Output Frame Rate Reduction: Improving Computational
  Latencies of Large End-to-End Models
Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models
Rohit Prabhavalkar
Zhong Meng
Weiran Wang
Adam Stooke
Xingyu Cai
Yanzhang He
Arun Narayanan
Dongseong Hwang
Tara N. Sainath
Pedro J. Moreno
30
8
0
27 Feb 2024
Alternating Weak Triphone/BPE Alignment Supervision from Hybrid Model
  Improves End-to-End ASR
Alternating Weak Triphone/BPE Alignment Supervision from Hybrid Model Improves End-to-End ASR
Jintao Jiang
Yingbo Gao
Mohammad Zeineldeen
Zoltán Tüske
34
0
0
23 Feb 2024
How do Hyenas deal with Human Speech? Speech Recognition and Translation
  with ConfHyena
How do Hyenas deal with Human Speech? Speech Recognition and Translation with ConfHyena
Marco Gaido
Sara Papi
Matteo Negri
L. Bentivogli
43
1
0
20 Feb 2024
Comparison of Conventional Hybrid and CTC/Attention Decoders for
  Continuous Visual Speech Recognition
Comparison of Conventional Hybrid and CTC/Attention Decoders for Continuous Visual Speech Recognition
David Gimeno-Gómez
Carlos David Martínez Hinarejos
32
1
0
20 Feb 2024
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity
Ziyang Ma
Guanrou Yang
Yifan Yang
Zhifu Gao
Jiaming Wang
...
Fan Yu
Qian Chen
Siqi Zheng
Shiliang Zhang
Xie Chen
AuLLM
47
39
0
13 Feb 2024
Self-consistent context aware conformer transducer for speech
  recognition
Self-consistent context aware conformer transducer for speech recognition
Konstantin Kolokolov
Pavel Pekichev
Karthik Raghunathan
22
0
0
09 Feb 2024
Multi-Trigger Backdoor Attacks: More Triggers, More Threats
Multi-Trigger Backdoor Attacks: More Triggers, More Threats
Yige Li
Xingjun Ma
Jiabo He
Hanxun Huang
Yu-Gang Jiang
AAML
32
3
0
27 Jan 2024
Contextualized Automatic Speech Recognition with Attention-Based Bias
  Phrase Boosted Beam Search
Contextualized Automatic Speech Recognition with Attention-Based Bias Phrase Boosted Beam Search
Yui Sudo
Muhammad Shakeel
Yosuke Fukumoto
Yifan Peng
Shinji Watanabe
28
5
0
19 Jan 2024
Improving ASR Contextual Biasing with Guided Attention
Improving ASR Contextual Biasing with Guided Attention
Jiyang Tang
Kwangyoun Kim
Suwon Shon
Felix Wu
Prashant Sridhar
Shinji Watanabe
29
8
0
16 Jan 2024
LCB-net: Long-Context Biasing for Audio-Visual Speech Recognition
LCB-net: Long-Context Biasing for Audio-Visual Speech Recognition
Fan Yu
Haoxu Wang
Xian Shi
Shiliang Zhang
27
3
0
12 Jan 2024
Cross-Speaker Encoding Network for Multi-Talker Speech Recognition
Cross-Speaker Encoding Network for Multi-Talker Speech Recognition
Jiawen Kang
Lingwei Meng
Mingyu Cui
Haohan Guo
Xixin Wu
Xunying Liu
Helen M. Meng
59
5
0
08 Jan 2024
A unified multichannel far-field speech recognition system: combining
  neural beamforming with attention based end-to-end model
A unified multichannel far-field speech recognition system: combining neural beamforming with attention based end-to-end model
Dongdi Zhao
Jianbo Ma
Lu Lu
Jinke Li
Xuan Ji
Lei Zhu
Fuming Fang
Ming-Yu Liu
Feijun Jiang
15
1
0
05 Jan 2024
CTC Blank Triggered Dynamic Layer-Skipping for Efficient CTC-based
  Speech Recognition
CTC Blank Triggered Dynamic Layer-Skipping for Efficient CTC-based Speech Recognition
Junfeng Hou
Peiyao Wang
Jincheng Zhang
Meng-Da Yang
Minwei Feng
Jingcheng Yin
29
1
0
04 Jan 2024
BLSTM-Based Confidence Estimation for End-to-End Speech Recognition
BLSTM-Based Confidence Estimation for End-to-End Speech Recognition
A. Ogawa
Naohiro Tawara
Takatomo Kano
Marc Delcroix
46
4
0
22 Dec 2023
Speaker Mask Transformer for Multi-talker Overlapped Speech Recognition
Speaker Mask Transformer for Multi-talker Overlapped Speech Recognition
Peng Shen
Xugang Lu
Hisashi Kawai
29
1
0
18 Dec 2023
Conformer-Based Speech Recognition On Extreme Edge-Computing Devices
Conformer-Based Speech Recognition On Extreme Edge-Computing Devices
Mingbin Xu
Alex Jin
Sicheng Wang
Mu Su
Tim Ng
...
Shiyi Han
Zhihong Lei
Yaqiao Deng
Zhen Huang
Mahesh Krishnamoorthy
24
4
0
16 Dec 2023
USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech
  Recognition with Universal Speech Models
USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models
Shaojin Ding
David Qiu
David Rim
Yanzhang He
Oleg Rybakov
...
Tara N. Sainath
Zhonglin Han
Jian Li
Amir Yazdanbakhsh
Shivani Agrawal
MQ
31
9
0
13 Dec 2023
D4AM: A General Denoising Framework for Downstream Acoustic Models
D4AM: A General Denoising Framework for Downstream Acoustic Models
H. Wang
Yu Tsao
Hsin-Min Wang
Chu-Song Chen
18
4
0
28 Nov 2023
Weak Alignment Supervision from Hybrid Model Improves End-to-end ASR
Weak Alignment Supervision from Hybrid Model Improves End-to-end ASR
Jintao Jiang
Yingbo Gao
Zoltán Tüske
21
1
0
24 Nov 2023
Analysis of Visual Features for Continuous Lipreading in Spanish
Analysis of Visual Features for Continuous Lipreading in Spanish
David Gimeno-Gómez
Carlos David Martínez Hinarejos
45
2
0
21 Nov 2023
LIP-RTVE: An Audiovisual Database for Continuous Spanish in the Wild
LIP-RTVE: An Audiovisual Database for Continuous Spanish in the Wild
David Gimeno-Gómez
Carlos David Martínez Hinarejos
16
8
0
21 Nov 2023
Phonological Level wav2vec2-based Mispronunciation Detection and
  Diagnosis Method
Phonological Level wav2vec2-based Mispronunciation Detection and Diagnosis Method
M. Shahin
Julien Epps
Beena Ahmed
16
1
0
13 Nov 2023
End-to-End Single-Channel Speaker-Turn Aware Conversational Speech
  Translation
End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation
Juan Pablo Zuluaga
Zhaocheng Huang
Xing Niu
Rohit Paturi
S. Srinivasan
Prashant Mathur
Brian Thompson
Marcello Federico
BDL
35
2
0
01 Nov 2023
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo
  Labelling
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
Sanchit Gandhi
Patrick von Platen
Alexander M. Rush
VLM
27
52
0
01 Nov 2023
Previous
12345...192021
Next