1712.01769
State-of-the-art Speech Recognition With Sequence-to-Sequence Models
5 December 2017
Chung-Cheng Chiu
Tara N. Sainath
Yonghui Wu
Rohit Prabhavalkar
Patrick Nguyen
Zhiwen Chen
Anjuli Kannan
Ron J. Weiss
Kanishka Rao
Katya Gonina
Navdeep Jaitly
Yue Liu
J. Chorowski
M. Bacchiani
AI4TS
Papers citing
"State-of-the-art Speech Recognition With Sequence-to-Sequence Models"
50 / 501 papers shown
VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation
Tianrui Wang
Long Zhou
Zi-Hua Zhang
Yu-Huan Wu
Shujie Liu
Yashesh Gaur
Zhuo Chen
Jinyu Li
Furu Wei
92
106
0
25 May 2023
RAND: Robustness Aware Norm Decay For Quantized Seq2seq Models
David Qiu
David Rim
Shaojin Ding
Oleg Rybakov
Yanzhang He
MQ
77
4
0
24 May 2023
Rethinking Speech Recognition with A Multimodal Perspective via Acoustic and Semantic Cooperative Decoding
Tianren Zhang
Haibo Qin
Zhibing Lai
Songlu Chen
Qi Liu
Feng Chen
Xinyuan Qian
Xu-Cheng Yin
56
0
0
23 May 2023
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks
Yifan Peng
Kwangyoun Kim
Felix Wu
Brian Yan
Siddhant Arora
William Chen
Jiyang Tang
Suwon Shon
Prashant Sridhar
Shinji Watanabe
97
18
0
18 May 2023
Deep Transfer Learning for Automatic Speech Recognition: Towards Better Generalization
Hamza Kheddar
Yassine Himeur
S. Al-Maadeed
Abbes Amira
F. Bensaali
148
84
0
27 Apr 2023
Self-regularised Minimum Latency Training for Streaming Transformer-based Speech Recognition
Mohan Li
R. Doddipatla
Catalin Zorila
147
0
0
24 Apr 2023
Machine Learning Research Trends in Africa: A 30 Years Overview with Bibliometric Analysis Review
A. Ezugwu
O. N. Oyelade
A. M. Ikotun
Jeffery O. Agushaka
Y. Ho
66
17
0
15 Apr 2023
Lego-Features: Exporting modular encoder features for streaming and deliberation ASR
Rami Botros
Rohit Prabhavalkar
J. Schalkwyk
Ciprian Chelba
Tara N. Sainath
Françoise Beaufays
AuLLM
47
3
0
31 Mar 2023
Practical Conformer: Optimizing size, speed and flops of Conformer for on-Device and cloud ASR
Rami Botros
Anmol Gulati
Tara N. Sainath
K. Choromanski
Ruoming Pang
Trevor Strohman
Weiran Wang
Jiahui Yu
MQ
80
3
0
31 Mar 2023
Dialog act guided contextual adapter for personalized speech recognition
Feng-Ju Chang
Thejaswi Muniyappa
Kanthashree Mysore Sathyendra
Kailin Wei
Grant P. Strimel
Ross McGowan
53
5
0
31 Mar 2023
A Deliberation-based Joint Acoustic and Text Decoder
S. Mavandadi
Tara N. Sainath
Ke Hu
Zelin Wu
77
7
0
23 Mar 2023
Pyramid Multi-branch Fusion DCNN with Multi-Head Self-Attention for Mandarin Speech Recognition
Kai Liu
Hailiang Xiong
Gangqiang Yang
Zhengfeng Du
Yewen Cao
D. Shah
139
0
0
23 Mar 2023
A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need?
Chaoning Zhang
Chenshuang Zhang
Sheng Zheng
Yu Qiao
Chenghao Li
...
Lik-Hang Lee
Yang Yang
Heng Tao Shen
In So Kweon
Choong Seon Hong
186
170
0
21 Mar 2023
Visual Information Matters for ASR Error Correction
Bannihati Kumar Vanya
Shanbo Cheng
Ningxin Peng
Yuchen Zhang
62
3
0
16 Mar 2023
End-to-End Speech Recognition: A Survey
Rohit Prabhavalkar
Takaaki Hori
Tara N. Sainath
Ralf Schluter
Shinji Watanabe
VLM
82
172
0
03 Mar 2023
Federated Learning for ASR based on Wav2vec 2.0
Tuan Nguyen
Salima Mdhaffar
N. Tomashenko
J. Bonastre
Yannick Esteve
FedML
92
10
0
20 Feb 2023
JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition
Zhong Meng
Weiran Wang
Rohit Prabhavalkar
Tara N. Sainath
Tongzhou Chen
Ehsan Variani
Yu Zhang
Yue Liu
Andrew Rosenberg
Bhuvana Ramabhadran
AuLLM
VLM
96
11
0
16 Feb 2023
Characterizing Financial Market Coverage using Artificial Intelligence
Jean Marie Tshimula
D'Jeff K. Nkashama
Patrick Owusu
Marc Frappier
Pierre Martin Tardif
F. Kabanza
Armelle Brun
Jean-Marc Patenaude
Shengrui Wang
Belkacem Chikhaoui
AIFin
35
2
0
07 Feb 2023
Towards Rigorous Understanding of Neural Networks via Semantics-preserving Transformations
Maximilian Schlüter
Gerrit Nolte
Alnis Murtovi
Bernhard Steffen
73
6
0
19 Jan 2023
From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition
Chao-Han Huck Yang
Yue Liu
Yu Zhang
Nanxin Chen
Rohit Prabhavalkar
Tara N. Sainath
Trevor Strohman
65
30
0
19 Jan 2023
Learning Feature Recovery Transformer for Occluded Person Re-identification
Boqiang Xu
Lingxiao He
Jian Liang
Zhenan Sun
ViT
54
54
0
05 Jan 2023
Macro-block dropout for improved regularization in training end-to-end speech recognition models
Chanwoo Kim
Sathish Indurti
Jinhwan Park
Wonyong Sung
56
0
0
29 Dec 2022
Fast and accurate factorized neural transducer for text adaption of end-to-end speech recognition models
Rui Zhao
Jian Xue
P. Parthasarathy
Veljko Miljanic
Jinyu Li
68
13
0
05 Dec 2022
Probabilistic Verification of ReLU Neural Networks via Characteristic Functions
Joshua Pilipovsky
Vignesh Sivaramakrishnan
Meeko Oishi
Panagiotis Tsiotras
81
5
0
03 Dec 2022
CorrectNet: Robustness Enhancement of Analog In-Memory Computing for Neural Networks by Error Suppression and Compensation
Amro Eldebiky
Grace Li Zhang
G. Böcherer
Bing Li
Ulf Schlichtmann
75
17
0
27 Nov 2022
Why the pseudo label based semi-supervised learning algorithm is effective?
Zeping Min
Qian Ge
Cheng Tai
MLT
71
4
0
18 Nov 2022
Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities
Andros Tjandra
Nayan Singhal
David C. Zhang
Ozlem Kalinli
Abdel-rahman Mohamed
Duc Le
M. Seltzer
96
13
0
10 Nov 2022
Understanding the Role of Mixup in Knowledge Distillation: An Empirical Study
Hongjun Choi
Eunyeong Jeon
Ankita Shukla
Pavan Turaga
56
8
0
08 Nov 2022
LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers
Peidong Wang
Eric Sun
Jian Xue
Yu-Huan Wu
Long Zhou
Yashesh Gaur
Shujie Liu
Jinyu Li
87
10
0
05 Nov 2022
A Weakly-Supervised Streaming Multilingual Speech Model with Truly Zero-Shot Capability
Jian Xue
Peidong Wang
Jinyu Li
Eric Sun
50
11
0
04 Nov 2022
Phonetic-assisted Multi-Target Units Modeling for Improving Conformer-Transducer ASR system
Li Li
Dongxing Xu
Haoran Wei
Yanhua Long
100
2
0
03 Nov 2022
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing
Yonggan Fu
Yang Zhang
Kaizhi Qian
Zhifan Ye
Zhongzhi Yu
Cheng-I Jeff Lai
Yingyan Lin
168
9
0
02 Nov 2022
Internal Language Model Estimation based Adaptive Language Model Fusion for Domain Adaptation
Rao Ma
Xiaobo Wu
Jin Qiu
Yanan Qin
Haihua Xu
Peihao Wu
Zejun Ma
52
2
0
02 Nov 2022
InterMPL: Momentum Pseudo-Labeling with Intermediate CTC Loss
Yosuke Higuchi
Tetsuji Ogawa
Tetsunori Kobayashi
Shinji Watanabe
74
1
0
02 Nov 2022
Speech-text based multi-modal training with bidirectional attention for improved speech recognition
Yuhang Yang
Haihua Xu
Hao-Ming Huang
Eng Siong Chng
Sheng Li
93
7
0
01 Nov 2022
Joint Audio/Text Training for Transformer Rescorer of Streaming Speech Recognition
Suyoun Kim
Ke Li
Lucas Kabela
Rongqing Huang
Jiedan Zhu
Ozlem Kalinli
Duc Le
78
8
0
31 Oct 2022
Modular Hybrid Autoregressive Transducer
Zhong Meng
Tongzhou Chen
Rohit Prabhavalkar
Yu Zhang
Gary Wang
...
Bhuvana Ramabhadran
Wenjie Huang
Ehsan Variani
Yinghui Huang
Pedro J. Moreno
89
23
0
31 Oct 2022
BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model
Yosuke Higuchi
Brian Yan
Siddhant Arora
Tetsuji Ogawa
Tetsunori Kobayashi
Shinji Watanabe
118
26
0
29 Oct 2022
Accelerating RNN-T Training and Inference Using CTC guidance
Yongqiang Wang
Zhehuai Chen
Cheng-yong Zheng
Yu Zhang
Wei Han
Parisa Haghani
86
24
0
29 Oct 2022
Random Utterance Concatenation Based Data Augmentation for Improving Short-video Speech Recognition
Yist Y. Lin
Tao Han
Haihua Xu
Van Tung Pham
Yerbolat Khassanov
Tze Yuang Chong
Yi He
Lu Lu
Zejun Ma
65
2
0
28 Oct 2022
Towards automatic generation of Piping and Instrumentation Diagrams (P&IDs) with Artificial Intelligence
Edwin Hirtreiter
Lukas Schulze Balhorn
Artur M. Schweidtmann
AI4CE
48
20
0
26 Oct 2022
End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation
Yoshiki Masuyama
Xuankai Chang
Samuele Cornell
Shinji Watanabe
Nobutaka Ono
79
19
0
19 Oct 2022
Acoustic-aware Non-autoregressive Spell Correction with Mask Sample Decoding
Ruchao Fan
Guoli Ye
Yashesh Gaur
Jinyu Li
38
4
0
16 Oct 2022
JOIST: A Joint Speech and Text Streaming Model For ASR
Tara N. Sainath
Rohit Prabhavalkar
Ankur Bapna
Yu Zhang
Zhouyuan Huo
Zhehuai Chen
Yue Liu
Weiran Wang
Trevor Strohman
RALM
AuLLM
87
35
0
13 Oct 2022
Streaming Intended Query Detection using E2E Modeling for Continued Conversation
Shuo-yiin Chang
Guru Prakash
Zelin Wu
Qiao Liang
Tara N. Sainath
Yue Liu
Adam Stambler
Shyam Upadhyay
Manaal Faruqui
Trevor Strohman
69
5
0
29 Aug 2022
Turn-Taking Prediction for Natural Conversational Speech
Shuo-yiin Chang
Yue Liu
Tara N. Sainath
Chaoyang Zhang
Trevor Strohman
Qiao Liang
Yanzhang He
79
21
0
29 Aug 2022
Multimodal Lecture Presentations Dataset: Understanding Multimodality in Educational Slides
Dong Won Lee
Chaitanya Ahuja
Paul Pu Liang
Sanika Natu
Louis-Philippe Morency
139
8
0
17 Aug 2022
VQ-T: RNN Transducers using Vector-Quantized Prediction Network States
Jiatong Shi
G. Saon
David Haws
Shinji Watanabe
Brian Kingsbury
56
3
0
03 Aug 2022
Learning Phone Recognition from Unpaired Audio and Phone Sequences Based on Generative Adversarial Network
Da-Rong Liu
Po-Chun Hsu
Yi-Chen Chen
Sung-Feng Huang
Shun-Po Chuang
Da-Yi Wu
Hung-yi Lee
GAN
67
7
0
29 Jul 2022
Improving Mandarin Speech Recognition with Block-augmented Transformer
Xiaoming Ren
Huifeng Zhu
Liuwei Wei
Minghui Wu
Jie Hao
104
10
0
24 Jul 2022