1508.01211
v1
v2 (latest)
Listen, Attend and Spell
5 August 2015
William Chan
Navdeep Jaitly
Quoc V. Le
Oriol Vinyals
RALM
Papers citing "Listen, Attend and Spell"
50 / 1,041 papers shown
Title
Conversation-oriented ASR with multi-look-ahead CBS architecture
Huaibo Zhao
S. Fujie
Tetsuji Ogawa
Jin Sakuma
Yusuke Kida
Tetsunori Kobayashi
97
3
0
02 Nov 2022
InterMPL: Momentum Pseudo-Labeling with Intermediate CTC Loss
Yosuke Higuchi
Tetsuji Ogawa
Tetsunori Kobayashi
Shinji Watanabe
78
1
0
02 Nov 2022
TrimTail: Low-Latency Streaming ASR with Simple but Effective Spectrogram-Level Length Penalty
Xingcheng Song
Di Wu
Zhiyong Wu
Binbin Zhang
Yuekai Zhang
Zhendong Peng
Wenpeng Li
Fuping Pan
Changbao Zhu
102
8
0
01 Nov 2022
Speech-text based multi-modal training with bidirectional attention for improved speech recognition
Yuhang Yang
Haihua Xu
Hao-Ming Huang
Eng Siong Chng
Sheng Li
95
7
0
01 Nov 2022
Joint Audio/Text Training for Transformer Rescorer of Streaming Speech Recognition
Suyoun Kim
Ke Li
Lucas Kabela
Rongqing Huang
Jiedan Zhu
Ozlem Kalinli
Duc Le
94
8
0
31 Oct 2022
Structured State Space Decoder for Speech Recognition and Synthesis
Koichi Miyazaki
Masato Murata
Tomoki Koriyama
104
13
0
31 Oct 2022
FusionFormer: Fusing Operations in Transformer for Efficient Streaming Speech Recognition
Xingcheng Song
Di Wu
Binbin Zhang
Zhiyong Wu
Wenpeng Li
...
Peng Zhang
Zhendong Peng
Fuping Pan
Changbao Zhu
Zhongqin Wu
62
2
0
31 Oct 2022
Modular Hybrid Autoregressive Transducer
Zhong Meng
Tongzhou Chen
Rohit Prabhavalkar
Yu Zhang
Gary Wang
...
Bhuvana Ramabhadran
Wenjie Huang
Ehsan Variani
Yinghui Huang
Pedro J. Moreno
98
23
0
31 Oct 2022
Blank Collapse: Compressing CTC emission for the faster decoding
Minkyu Jung
Ohhyeok Kwon
S. Seo
Soonshin Seo
76
3
0
31 Oct 2022
Partitioned Gradient Matching-based Data Subset Selection for Compute-Efficient Robust ASR Training
Ashish R. Mittal
D. Sivasubramanian
Rishabh K. Iyer
Preethi Jyothi
Ganesh Ramakrishnan
64
4
0
30 Oct 2022
BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model
Yosuke Higuchi
Brian Yan
Siddhant Arora
Tetsuji Ogawa
Tetsunori Kobayashi
Shinji Watanabe
122
26
0
29 Oct 2022
Accelerating RNN-T Training and Inference Using CTC guidance
Yongqiang Wang
Zhehuai Chen
Cheng-yong Zheng
Yu Zhang
Wei Han
Parisa Haghani
93
24
0
29 Oct 2022
Filter and evolve: progressive pseudo label refining for semi-supervised automatic speech recognition
Zezhong Jin
Dading Zhong
Xiao Song
Zhaoyi Liu
Naipeng Ye
Qingcheng Zeng
65
2
0
28 Oct 2022
Random Utterance Concatenation Based Data Augmentation for Improving Short-video Speech Recognition
Yist Y. Lin
Tao Han
Haihua Xu
Van Tung Pham
Yerbolat Khassanov
Tze Yuang Chong
Yi He
Lu Lu
Zejun Ma
74
2
0
28 Oct 2022
Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models
Siddhant Arora
Siddharth Dalmia
Brian Yan
Florian Metze
A. Black
Shinji Watanabe
39
12
0
27 Oct 2022
Monotonic segmental attention for automatic speech recognition
Albert Zeyer
Robin Schmitt
Wei Zhou
Ralf Schluter
Hermann Ney
63
9
0
26 Oct 2022
Linguistic-Enhanced Transformer with CTC Embedding for Speech Recognition
Xulong Zhang
Jianzong Wang
Ning Cheng
Mengyuan Zhao
Zhiyong Zhang
Jing Xiao
47
1
0
25 Oct 2022
ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition
Sanchit Gandhi
Patrick von Platen
Alexander M. Rush
70
25
0
24 Oct 2022
Optimizing Bilingual Neural Transducer with Synthetic Code-switching Text Generation
Thien Nguyen
Nathalie Tran
Liuhui Deng
Thiago Fraga da Silva
Matthew Radzihovsky
...
Honza Silovsky
Arnab Ghoshal
M. Martel
Bharat Ram Ambati
Mohamed Ali
101
5
0
21 Oct 2022
Improving Semi-supervised End-to-end Automatic Speech Recognition using CycleGAN and Inter-domain Losses
C. Li
Ngoc Thang Vu
54
2
0
20 Oct 2022
Anchored Speech Recognition with Neural Transducers
Desh Raj
Junteng Jia
Jay Mahadeokar
Chunyang Wu
Niko Moritz
Xiaohui Zhang
Ozlem Kalinli
62
2
0
20 Oct 2022
End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation
Yoshiki Masuyama
Xuankai Chang
Samuele Cornell
Shinji Watanabe
Nobutaka Ono
79
19
0
19 Oct 2022
Helpful Neighbors: Leveraging Neighbors in Geographic Feature Pronunciation
Llion Jones
R. Sproat
Haruko Ishikawa
Alexander Gutkin
70
1
0
18 Oct 2022
Acoustic-aware Non-autoregressive Spell Correction with Mask Sample Decoding
Ruchao Fan
Guoli Ye
Yashesh Gaur
Jinyu Li
43
4
0
16 Oct 2022
A Policy-based Approach to the SpecAugment Method for Low Resource E2E ASR
Rui Li
Guodong Ma
Dexin Zhao
Ranran Zeng
Xiaoyu Li
Haolin Huang
71
2
0
16 Oct 2022
On Compressing Sequences for Self-Supervised Speech Models
Yen Meng
Hsuan-Jui Chen
Jiatong Shi
Shinji Watanabe
Paola García
Hung-yi Lee
Hao Tang
SSL
58
15
0
13 Oct 2022
An Experimental Study on Private Aggregation of Teacher Ensemble Learning for End-to-End Speech Recognition
Chao-Han Huck Yang
I-Fan Chen
A. Stolcke
Sabato Marco Siniscalchi
Chin-Hui Lee
73
3
0
11 Oct 2022
CTC Alignments Improve Autoregressive Translation
Brian Yan
Siddharth Dalmia
Yosuke Higuchi
Graham Neubig
Florian Metze
A. Black
Shinji Watanabe
95
33
0
11 Oct 2022
DeepPerform: An Efficient Approach for Performance Testing of Resource-Constrained Neural Networks
Simin Chen
Mirazul Haque
Cong Liu
Wei Yang
110
22
0
10 Oct 2022
JoeyS2T: Minimalistic Speech-to-Text Modeling with JoeyNMT
Mayumi Ohta
Julia Kreutzer
Stefan Riezler
65
0
0
05 Oct 2022
Relaxed Attention for Transformer Models
Timo Lohrenz
Björn Möller
Zhengyang Li
Tim Fingscheidt
KELM
59
12
0
20 Sep 2022
Watch What You Pretrain For: Targeted, Transferable Adversarial Examples on Self-Supervised Speech Recognition models
R. Olivier
H. Abdullah
Bhiksha Raj
AAML
82
1
0
17 Sep 2022
Parameter-Efficient Conformers via Sharing Sparsely-Gated Experts for End-to-End Speech Recognition
Ye Bai
Jie Li
W. Han
Hao Ni
Kaituo Xu
Zhuo Zhang
Cheng Yi
Xiaorui Wang
MoE
66
2
0
17 Sep 2022
Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition
Kartik Audhkhasi
Yinghui Huang
Bhuvana Ramabhadran
Pedro J. Moreno
64
3
0
13 Sep 2022
Non-autoregressive Error Correction for CTC-based ASR with Phone-conditioned Masked LM
Hayato Futami
Hirofumi Inaguma
Sei Ueno
Masato Mimura
S. Sakai
Tatsuya Kawahara
KELM
133
13
0
08 Sep 2022
Distilling the Knowledge of BERT for CTC-based ASR
Hayato Futami
Hirofumi Inaguma
Masato Mimura
S. Sakai
Tatsuya Kawahara
58
9
0
05 Sep 2022
Vision-Language Adaptive Mutual Decoder for OOV-STR
Jinshui Hu
Chenyu Liu
Qiandong Yan
Xuyang Zhu
Jiajia Wu
Feng Yu
Bing Yin
VLM
108
1
0
02 Sep 2022
Bayesian Neural Network Language Modeling for Speech Recognition
Boyang Xue
Shoukang Hu
Junhao Xu
Mengzhe Geng
Xunying Liu
Helen M. Meng
UQCV
BDL
127
18
0
28 Aug 2022
Interpretable Multimodal Emotion Recognition using Hybrid Fusion of Speech and Image Data
Puneet Kumar
Sarthak Malik
Balasubramanian Raman
CVBM
68
24
0
25 Aug 2022
Comparison and Analysis of New Curriculum Criteria for End-to-End ASR
Georgios Karakasidis
Tamás Grósz
M. Kurimo
43
2
0
10 Aug 2022
ASR Error Correction with Constrained Decoding on Operation Prediction
J. Yang
Rong-Zhi Li
Wei Peng
73
10
0
09 Aug 2022
Adversarial Attacks on ASR Systems: An Overview
Xiao Zhang
Hao Tan
Xuan Huang
Denghui Zhang
Keke Tang
Zhaoquan Gu
AAML
36
3
0
03 Aug 2022
VQ-T: RNN Transducers using Vector-Quantized Prediction Network States
Jiatong Shi
G. Saon
David Haws
Shinji Watanabe
Brian Kingsbury
63
3
0
03 Aug 2022
Pronunciation-aware unique character encoding for RNN Transducer-based Mandarin speech recognition
Peng Shen
Xugang Lu
Hisashi Kawai
44
2
0
29 Jul 2022
Improving Mandarin Speech Recogntion with Block-augmented Transformer
Xiaoming Ren
Huifeng Zhu
Liuwei Wei
Minghui Wu
Jie Hao
113
10
0
24 Jul 2022
Reducing Geographic Disparities in Automatic Speech Recognition via Elastic Weight Consolidation
V. Trinh
Pegah Ghahremani
Brian King
J. Droppo
A. Stolcke
Roland Maas
MoMe
45
7
0
16 Jul 2022
PoLyScriber: Integrated Fine-tuning of Extractor and Lyrics Transcriber for Polyphonic Music
Xiaoxue Gao
Chitralekha Gupta
Haizhou Li
94
8
0
15 Jul 2022
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding
Yifan Peng
Siddharth Dalmia
Ian Lane
Shinji Watanabe
98
151
0
06 Jul 2022
DEFORMER: Coupling Deformed Localized Patterns with Global Context for Robust End-to-end Speech Recognition
Jiamin Xie
John H. L. Hansen
38
1
0
04 Jul 2022
Tree-constrained Pointer Generator with Graph Neural Network Encodings for Contextual Speech Recognition
Guangzhi Sun
Chuxu Zhang
P. Woodland
64
14
0
02 Jul 2022