ResearchTrend.AI
State-of-the-art Speech Recognition With Sequence-to-Sequence Models

5 December 2017
Chung-Cheng Chiu
Tara N. Sainath
Yonghui Wu
Rohit Prabhavalkar
Patrick Nguyen
Zhiwen Chen
Anjuli Kannan
Ron J. Weiss
Kanishka Rao
Katya Gonina
Navdeep Jaitly
Yue Liu
J. Chorowski
M. Bacchiani
    AI4TS

Papers citing "State-of-the-art Speech Recognition With Sequence-to-Sequence Models"

50 / 501 papers shown
High Performance Sequence-to-Sequence Model for Streaming Speech Recognition
T. Nguyen
Ngoc-Quan Pham
S. Stueker
A. Waibel
42
7
0
22 Mar 2020
Training for Speech Recognition on Coprocessors
Sebastian Baunsgaard
S. Wrede
Pınar Tözün
35
6
0
22 Mar 2020
Deliberation Model Based Two-Pass End-to-End Speech Recognition
Ke Hu
Tara N. Sainath
Ruoming Pang
Rohit Prabhavalkar
92
87
0
17 Mar 2020
High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model
Jinyu Li
Rui Zhao
Eric Sun
J. H. M. Wong
Amit Das
Zhong Meng
Jiawei Liu
VLM
70
25
0
17 Mar 2020
A Density Ratio Approach to Language Model Fusion in End-To-End Automatic Speech Recognition
Erik McDermott
Hasim Sak
Ehsan Variani
72
113
0
26 Feb 2020
Semi-Supervised Speech Recognition via Local Prior Matching
Wei-Ning Hsu
Ann Lee
Gabriel Synnaeve
Awni Y. Hannun
SSL
138
31
0
24 Feb 2020
Imputer: Sequence Modelling via Imputation and Dynamic Programming
William Chan
Chitwan Saharia
Geoffrey E. Hinton
Mohammad Norouzi
Navdeep Jaitly
BDL
AI4TS
95
116
0
20 Feb 2020
Disentangled Speech Embeddings using Cross-modal Self-supervision
Arsha Nagrani
Joon Son Chung
Samuel Albanie
Andrew Zisserman
SSL
90
88
0
20 Feb 2020
Rnn-transducer with language bias for end-to-end Mandarin-English code-switching speech recognition
Shuai Zhang
Jiangyan Yi
Zhengkun Tian
J. Tao
Ye Bai
56
27
0
19 Feb 2020
Low-Rank Bottleneck in Multi-head Attention Models
Srinadh Bhojanapalli
Chulhee Yun
A. S. Rawat
Sashank J. Reddi
Sanjiv Kumar
71
96
0
17 Feb 2020
Speech Corpus of Ainu Folklore and End-to-end Speech Recognition for Ainu Language
Kohei Matsuura
Sei Ueno
Masato Mimura
S. Sakai
Tatsuya Kawahara
CVBM
36
13
0
16 Feb 2020
Small energy masking for improved neural network training for end-to-end speech recognition
Chanwoo Kim
Kwangyoun Kim
S. Indurthi
50
8
0
15 Feb 2020
Attentional Speech Recognition Models Misbehave on Out-of-domain Utterances
Phillip Keung
Wei Niu
Y. Lu
Julian Salazar
Vikas Bhardwaj
72
9
0
12 Feb 2020
Accelerating RNN Transducer Inference via One-Step Constrained Beam Search
Juntae Kim
Yoonhan Lee
67
24
0
10 Feb 2020
Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior
Guangzhi Sun
Yu Zhang
Ron J. Weiss
Yuan Cao
Heiga Zen
Andrew Rosenberg
Bhuvana Ramabhadran
Yonghui Wu
DiffM
98
93
0
06 Feb 2020
Audio-Visual Decision Fusion for WFST-based and seq2seq Models
R. Aralikatti
Sharad Roy
Abhinav Thanda
D. Margam
Pujitha Appan Kandala
Tanay Sharma
S. Venkatesan
29
1
0
29 Jan 2020
Scaling Up Online Speech Recognition Using ConvNets
Vineel Pratap
Qiantong Xu
Jacob Kahn
Gilad Avidov
Tatiana Likhomanenko
Awni Y. Hannun
Vitaliy Liptchinsky
Gabriel Synnaeve
R. Collobert
242
39
0
27 Jan 2020
Transformer-based Online CTC/attention End-to-End Speech Recognition Architecture
Haoran Miao
Gaofeng Cheng
Changfeng Gao
Pengyuan Zhang
Yonghong Yan
62
104
0
15 Jan 2020
STAViS: Spatio-Temporal AudioVisual Saliency Network
A. Tsiami
Petros Koutras
Petros Maragos
99
73
0
09 Jan 2020
Domain Adaptation via Teacher-Student Learning for End-to-End Speech Recognition
Zhong Meng
Jinyu Li
Yashesh Gaur
Jiawei Liu
85
50
0
06 Jan 2020
Character-Aware Attention-Based End-to-End Speech Recognition
Zhong Meng
Yashesh Gaur
Jinyu Li
Jiawei Liu
62
10
0
06 Jan 2020
Exploiting Event Cameras for Spatio-Temporal Prediction of Fast-Changing Trajectories
Marco Monforte
A. Arriandiaga
Arren J. Glover
Chiara Bartolozzi
48
11
0
05 Jan 2020
Attention based on-device streaming speech recognition with large speech corpus
Kwangyoun Kim
Kyungmin Lee
Dhananjaya N. Gowda
Junmo Park
Sungsoo Kim
...
Daehyun Kim
Seokyeong Jung
Jungin Lee
Myoungji Han
Chanwoo Kim
55
58
0
02 Jan 2020
Improved Multi-Stage Training of Online Attention-based Encoder-Decoder Models
Abhinav Garg
Dhananjaya N. Gowda
Ankur Kumar
Kwangyoun Kim
Mehul Kumar
Chanwoo Kim
3DV
38
15
0
28 Dec 2019
power-law nonlinearity with maximally uniform distribution criterion for improved neural network training in automatic speech recognition
Chanwoo Kim
Mehul Kumar
Kwangyoun Kim
Dhananjaya N. Gowda
58
9
0
22 Dec 2019
end-to-end training of a large vocabulary end-to-end speech recognition system
Chanwoo Kim
Sungsoo Kim
Kwangyoun Kim
Mehul Kumar
Jiyeon Kim
...
Eunhyang Kim
Minkyoo Shin
Shatrughan Singh
Larry Heck
Dhananjaya N. Gowda
61
27
0
22 Dec 2019
Application of Word2vec in Phoneme Recognition
Xin Feng
Lei Wang
28
3
0
17 Dec 2019
Synchronous Speech Recognition and Speech-to-Text Translation with Interactive Decoding
Yuchen Liu
Jiajun Zhang
Hao Xiong
Long Zhou
Zhongjun He
Hua Wu
Haifeng Wang
Chengqing Zong
90
71
0
16 Dec 2019
SpecAugment on Large Scale Datasets
Daniel S. Park
Yu Zhang
Chung-Cheng Chiu
Youzheng Chen
Yue Liu
William Chan
Quoc V. Le
Yonghui Wu
86
138
0
11 Dec 2019
Audio-attention discriminative language model for ASR rescoring
Ankur Gandhe
Ariya Rastrow
63
24
0
06 Dec 2019
Integrating Knowledge into End-to-End Speech Recognition from External Text-Only Data
Ye Bai
Jiangyan Yi
J. Tao
Zhengqi Wen
Zhengkun Tian
Shuai Zhang
45
2
0
04 Dec 2019
Multimodal Machine Translation through Visuals and Speech
U. Sulubacak
Ozan Caglayan
Stig-Arne Gronroos
Aku Rouhe
Desmond Elliott
Lucia Specia
Jörg Tiedemann
101
77
0
28 Nov 2019
Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition
Chao Weng
Chengzhu Yu
Jia Cui
Chunlei Zhang
Dong Yu
146
39
0
28 Nov 2019
Independent language modeling architecture for end-to-end ASR
Van Tung Pham
Haihua Xu
Yerbolat Khassanov
Zhiping Zeng
Chng Eng Siong
Chongjia Ni
B. Ma
Haizhou Li
AuLLM
43
15
0
25 Nov 2019
Speech Sentiment Analysis via Pre-trained Features from End-to-end ASR Models
Zhiyun Lu
Liangliang Cao
Yu Zhang
Chung-Cheng Chiu
James Fan
58
72
0
21 Nov 2019
Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech
David Harwath
Wei-Ning Hsu
James R. Glass
101
85
0
21 Nov 2019
Seq2Seq RNN based Gait Anomaly Detection from Smartphone Acquired Multimodal Motion Data
Riccardo Bonetto
Mattia Soldan
Alberto Lanaro
Simone Milani
M. Rossi
16
11
0
19 Nov 2019
End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures
Gabriel Synnaeve
Qiantong Xu
Jacob Kahn
Tatiana Likhomanenko
Edouard Grave
Vineel Pratap
Anuroop Sriram
Vitaliy Liptchinsky
R. Collobert
SSL
AI4TS
134
248
0
19 Nov 2019
Speaker Adaptation for Attention-Based End-to-End Speech Recognition
Zhong Meng
Yashesh Gaur
Jinyu Li
Jiawei Liu
53
38
0
09 Nov 2019
A comparison of end-to-end models for long-form speech recognition
Chung-Cheng Chiu
Wei Han
Yu Zhang
Ruoming Pang
S. Kishchenko
...
Anjuli Kannan
Rohit Prabhavalkar
Zhiwen Chen
Tara N. Sainath
Yonghui Wu
AuLLM
88
83
0
06 Nov 2019
Predicting word error rate for reverberant speech
H. Gamper
Dimitra Emmanouilidou
Sebastian Braun
I. Tashev
35
9
0
01 Nov 2019
Memory Requirement Reduction of Deep Neural Networks Using Low-bit Quantization of Parameters
Niccoló Nicodemo
Gaurav Naithani
Konstantinos Drossos
Tuomas Virtanen
R. Saletti
MQ
18
1
0
01 Nov 2019
Improving Generalization of Transformer for Speech Recognition with Parallel Schedule Sampling and Relative Positional Embedding
Pan Zhou
Ruchao Fan
Wei Chen
Jia Jia
93
26
0
01 Nov 2019
A Dynamically Controlled Recurrent Neural Network for Modeling Dynamical Systems
Yiwei Fu
S. Saab
A. Ray
Michael Hauser
AI4CE
60
8
0
31 Oct 2019
Image-Conditioned Graph Generation for Road Network Extraction
Davide Belli
Thomas Kipf
GNN
55
40
0
31 Oct 2019
Improving sequence-to-sequence speech recognition training with on-the-fly data augmentation
T. Nguyen
S. Stueker
Jan Niehues
A. Waibel
94
98
0
29 Oct 2019
Sequence-to-sequence Automatic Speech Recognition with Word Embedding Regularization and Fused Decoding
Alexander H. Liu
Tzu-Wei Sung
Shun-Po Chuang
Hung-yi Lee
Lin-Shan Lee
54
13
0
28 Oct 2019
DFSMN-SAN with Persistent Memory Model for Automatic Speech Recognition
Zhao You
Dan Su
Jie Chen
Chao Weng
Dong Yu
89
13
0
28 Oct 2019
Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens
Rafael Valle
Jason Chun Lok Li
R. Prenger
Bryan Catanzaro
79
149
0
26 Oct 2019
Exploring Lexicon-Free Modeling Units for End-to-End Korean and Korean-English Code-Switching Speech Recognition
Jisung Wang
Jihwan Kim
Sangki Kim
Yeha Lee
48
5
0
25 Oct 2019