ResearchTrend.AI
State-of-the-art Speech Recognition With Sequence-to-Sequence Models

5 December 2017
Chung-Cheng Chiu
Tara N. Sainath
Yonghui Wu
Rohit Prabhavalkar
Patrick Nguyen
Zhehuai Chen
Anjuli Kannan
Ron J. Weiss
Kanishka Rao
Katya Gonina
Navdeep Jaitly
Yue Liu
J. Chorowski
M. Bacchiani
    AI4TS

Papers citing "State-of-the-art Speech Recognition With Sequence-to-Sequence Models"

50 / 501 papers shown
Training for Speech Recognition on Coprocessors
Sebastian Baunsgaard
S. Wrede
Pınar Tözün
20
6
0
22 Mar 2020
Deliberation Model Based Two-Pass End-to-End Speech Recognition
Ke Hu
Tara N. Sainath
Ruoming Pang
Rohit Prabhavalkar
24
85
0
17 Mar 2020
High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model
Jinyu Li
Rui Zhao
Eric Sun
J. H. M. Wong
Amit Das
Zhong Meng
Jiawei Liu
VLM
24
24
0
17 Mar 2020
A Density Ratio Approach to Language Model Fusion in End-To-End Automatic Speech Recognition
Erik McDermott
Hasim Sak
Ehsan Variani
25
112
0
26 Feb 2020
Semi-Supervised Speech Recognition via Local Prior Matching
Wei-Ning Hsu
Ann Lee
Gabriel Synnaeve
Awni Y. Hannun
SSL
27
31
0
24 Feb 2020
Imputer: Sequence Modelling via Imputation and Dynamic Programming
William Chan
Chitwan Saharia
Geoffrey E. Hinton
Mohammad Norouzi
Navdeep Jaitly
BDL
AI4TS
21
114
0
20 Feb 2020
Disentangled Speech Embeddings using Cross-modal Self-supervision
Arsha Nagrani
Joon Son Chung
Samuel Albanie
Andrew Zisserman
SSL
21
88
0
20 Feb 2020
Rnn-transducer with language bias for end-to-end Mandarin-English code-switching speech recognition
Shuai Zhang
Jiangyan Yi
Zhengkun Tian
J. Tao
Ye Bai
25
25
0
19 Feb 2020
Low-Rank Bottleneck in Multi-head Attention Models
Srinadh Bhojanapalli
Chulhee Yun
A. S. Rawat
Sashank J. Reddi
Sanjiv Kumar
24
94
0
17 Feb 2020
Speech Corpus of Ainu Folklore and End-to-end Speech Recognition for Ainu Language
Kohei Matsuura
Sei Ueno
Masato Mimura
S. Sakai
Tatsuya Kawahara
CVBM
18
13
0
16 Feb 2020
Small energy masking for improved neural network training for end-to-end speech recognition
Chanwoo Kim
Kwangyoun Kim
S. Indurthi
24
8
0
15 Feb 2020
Attentional Speech Recognition Models Misbehave on Out-of-domain Utterances
Phillip Keung
Wei Niu
Y. Lu
Julian Salazar
Vikas Bhardwaj
30
9
0
12 Feb 2020
Accelerating RNN Transducer Inference via One-Step Constrained Beam Search
Juntae Kim
Yoonhan Lee
20
22
0
10 Feb 2020
Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior
Guangzhi Sun
Yu Zhang
Ron J. Weiss
Yuan Cao
Heiga Zen
Andrew Rosenberg
Bhuvana Ramabhadran
Yonghui Wu
DiffM
36
92
0
06 Feb 2020
Audio-Visual Decision Fusion for WFST-based and seq2seq Models
R. Aralikatti
Sharad Roy
Abhinav Thanda
D. Margam
Pujitha Appan Kandala
Tanay Sharma
S. Venkatesan
19
1
0
29 Jan 2020
Scaling Up Online Speech Recognition Using ConvNets
Vineel Pratap
Qiantong Xu
Jacob Kahn
Gilad Avidov
Tatiana Likhomanenko
Awni Y. Hannun
Vitaliy Liptchinsky
Gabriel Synnaeve
R. Collobert
154
38
0
27 Jan 2020
Transformer-based Online CTC/attention End-to-End Speech Recognition Architecture
Haoran Miao
Gaofeng Cheng
Changfeng Gao
Pengyuan Zhang
Yonghong Yan
8
102
0
15 Jan 2020
STAViS: Spatio-Temporal AudioVisual Saliency Network
A. Tsiami
Petros Koutras
Petros Maragos
27
73
0
09 Jan 2020
Domain Adaptation via Teacher-Student Learning for End-to-End Speech Recognition
Zhong Meng
Jinyu Li
Yashesh Gaur
Jiawei Liu
17
50
0
06 Jan 2020
Character-Aware Attention-Based End-to-End Speech Recognition
Zhong Meng
Yashesh Gaur
Jinyu Li
Jiawei Liu
23
10
0
06 Jan 2020
Exploiting Event Cameras for Spatio-Temporal Prediction of Fast-Changing Trajectories
Marco Monforte
A. Arriandiaga
Arren J. Glover
Chiara Bartolozzi
26
10
0
05 Jan 2020
Attention based on-device streaming speech recognition with large speech corpus
Kwangyoun Kim
Kyungmin Lee
Dhananjaya N. Gowda
Junmo Park
Sungsoo Kim
...
Daehyun Kim
Seokyeong Jung
Jungin Lee
Myoungji Han
Chanwoo Kim
16
58
0
02 Jan 2020
Improved Multi-Stage Training of Online Attention-based Encoder-Decoder Models
Abhinav Garg
Dhananjaya N. Gowda
Ankur Kumar
Kwangyoun Kim
Mehul Kumar
Chanwoo Kim
3DV
14
15
0
28 Dec 2019
power-law nonlinearity with maximally uniform distribution criterion for improved neural network training in automatic speech recognition
Chanwoo Kim
Mehul Kumar
Kwangyoun Kim
Dhananjaya N. Gowda
14
9
0
22 Dec 2019
end-to-end training of a large vocabulary end-to-end speech recognition system
Chanwoo Kim
Sungsoo Kim
Kwangyoun Kim
Mehul Kumar
Jiyeon Kim
...
Eunhyang Kim
Minkyoo Shin
Shatrughan Singh
Larry Heck
Dhananjaya N. Gowda
11
27
0
22 Dec 2019
Application of Word2vec in Phoneme Recognition
Xin Feng
Lei Wang
12
3
0
17 Dec 2019
Synchronous Speech Recognition and Speech-to-Text Translation with Interactive Decoding
Yuchen Liu
Jiajun Zhang
Hao Xiong
Long Zhou
Zhongjun He
Hua Wu
Haifeng Wang
Chengqing Zong
34
70
0
16 Dec 2019
SpecAugment on Large Scale Datasets
Daniel S. Park
Yu Zhang
Chung-Cheng Chiu
Youzheng Chen
Yue Liu
William Chan
Quoc V. Le
Yonghui Wu
25
136
0
11 Dec 2019
Audio-attention discriminative language model for ASR rescoring
Ankur Gandhe
Ariya Rastrow
30
24
0
06 Dec 2019
Integrating Knowledge into End-to-End Speech Recognition from External Text-Only Data
Ye Bai
Jiangyan Yi
J. Tao
Zhengqi Wen
Zhengkun Tian
Shuai Zhang
14
2
0
04 Dec 2019
Multimodal Machine Translation through Visuals and Speech
U. Sulubacak
Ozan Caglayan
Stig-Arne Gronroos
Aku Rouhe
Desmond Elliott
Lucia Specia
Jörg Tiedemann
56
73
0
28 Nov 2019
Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition
Chao Weng
Chengzhu Yu
Jia Cui
Chunlei Zhang
Dong Yu
91
39
0
28 Nov 2019
Independent language modeling architecture for end-to-end ASR
Van Tung Pham
Haihua Xu
Yerbolat Khassanov
Zhiping Zeng
Chng Eng Siong
Chongjia Ni
B. Ma
Haizhou Li
AuLLM
19
15
0
25 Nov 2019
Speech Sentiment Analysis via Pre-trained Features from End-to-end ASR Models
Zhiyun Lu
Liangliang Cao
Yu Zhang
Chung-Cheng Chiu
James Fan
11
70
0
21 Nov 2019
Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech
David Harwath
Wei-Ning Hsu
James R. Glass
28
84
0
21 Nov 2019
Seq2Seq RNN based Gait Anomaly Detection from Smartphone Acquired Multimodal Motion Data
Riccardo Bonetto
Mattia Soldan
Alberto Lanaro
Simone Milani
M. Rossi
4
11
0
19 Nov 2019
End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures
Gabriel Synnaeve
Qiantong Xu
Jacob Kahn
Tatiana Likhomanenko
Edouard Grave
Vineel Pratap
Anuroop Sriram
Vitaliy Liptchinsky
R. Collobert
SSL
AI4TS
36
246
0
19 Nov 2019
Speaker Adaptation for Attention-Based End-to-End Speech Recognition
Zhong Meng
Yashesh Gaur
Jinyu Li
Jiawei Liu
26
38
0
09 Nov 2019
A comparison of end-to-end models for long-form speech recognition
Chung-Cheng Chiu
Wei Han
Yu Zhang
Ruoming Pang
S. Kishchenko
...
Anjuli Kannan
Rohit Prabhavalkar
Zhehuai Chen
Tara N. Sainath
Yonghui Wu
AuLLM
36
82
0
06 Nov 2019
Predicting word error rate for reverberant speech
H. Gamper
Dimitra Emmanouilidou
Sebastian Braun
I. Tashev
13
9
0
01 Nov 2019
Memory Requirement Reduction of Deep Neural Networks Using Low-bit Quantization of Parameters
Niccoló Nicodemo
Gaurav Naithani
Konstantinos Drossos
Tuomas Virtanen
R. Saletti
MQ
11
1
0
01 Nov 2019
Improving Generalization of Transformer for Speech Recognition with Parallel Schedule Sampling and Relative Positional Embedding
Pan Zhou
Ruchao Fan
Wei Chen
Jia Jia
11
26
0
01 Nov 2019
A Dynamically Controlled Recurrent Neural Network for Modeling Dynamical Systems
Yiwei Fu
S. Saab
A. Ray
Michael Hauser
AI4CE
27
8
0
31 Oct 2019
Image-Conditioned Graph Generation for Road Network Extraction
Davide Belli
Thomas Kipf
GNN
24
40
0
31 Oct 2019
Improving sequence-to-sequence speech recognition training with on-the-fly data augmentation
T. Nguyen
S. Stueker
Jan Niehues
A. Waibel
24
98
0
29 Oct 2019
Sequence-to-sequence Automatic Speech Recognition with Word Embedding Regularization and Fused Decoding
Alexander H. Liu
Tzu-Wei Sung
Shun-Po Chuang
Hung-yi Lee
Lin-Shan Lee
12
13
0
28 Oct 2019
DFSMN-SAN with Persistent Memory Model for Automatic Speech Recognition
Zhao You
Dan Su
Jie Chen
Chao Weng
Dong Yu
28
13
0
28 Oct 2019
Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens
Rafael Valle
Jason Chun Lok Li
R. Prenger
Bryan Catanzaro
25
148
0
26 Oct 2019
Exploring Lexicon-Free Modeling Units for End-to-End Korean and Korean-English Code-Switching Speech Recognition
Jisung Wang
Jihwan Kim
Sangki Kim
Yeha Lee
14
5
0
25 Oct 2019
Towards Online End-to-end Transformer Automatic Speech Recognition
E. Tsunoo
Yosuke Kashiwagi
Toshiyuki Kumakura
Shinji Watanabe
22
32
0
25 Oct 2019