ResearchTrend.AI
Recognizing long-form speech using streaming end-to-end models

24 October 2019
A. Narayanan
Rohit Prabhavalkar
Chung-Cheng Chiu
David Rybach
Tara N. Sainath
Trevor Strohman

Papers citing "Recognizing long-form speech using streaming end-to-end models"

50 / 97 papers shown
Title
Efficient and Robust Long-Form Speech Recognition with Hybrid H3-Conformer
Tomoki Honda
S. Sakai
Tatsuya Kawahara
21
0
0
05 Oct 2024
Federated Learning of Large ASR Models in the Real World
Yonghui Xiao
Yuxin Ding
Changwan Ryu
P. Zadrazil
Francoise Beaufays
AI4CE
38
0
0
19 Aug 2024
MathWriting: A Dataset For Handwritten Mathematical Expression Recognition
Philippe Gervais
Asya Fadeeva
Andrii Maksai
30
4
0
16 Apr 2024
TransformerFAM: Feedback attention is working memory
Dongseong Hwang
Weiran Wang
Zhuoyuan Huo
K. Sim
P. M. Mengibar
32
12
0
14 Apr 2024
Advanced Long-Content Speech Recognition With Factorized Neural Transducer
Xun Gong
Yu Wu
Jinyu Li
Shujie Liu
Rui Zhao
Xie Chen
Yanmin Qian
31
6
0
20 Mar 2024
Improved Long-Form Speech Recognition by Jointly Modeling the Primary and Non-primary Speakers
Guru Prakash Arumugam
Shuo-yiin Chang
Tara N. Sainath
Rohit Prabhavalkar
Quan Wang
Shaan Bijwadia
21
3
0
18 Dec 2023
Partial Rewriting for Multi-Stage ASR
A. Bruguier
David Qiu
Yanzhang He
29
0
0
08 Dec 2023
DSS: Synthesizing long Digital Ink using Data augmentation, Style encoding and Split generation
A. Timofeev
Anastasiia Fadeeva
A. Afonin
C. Musat
Andrii Maksai
52
2
0
29 Nov 2023
Long-form Simultaneous Speech Translation: Thesis Proposal
Peter Polák
3DV
35
3
0
17 Oct 2023
The Gift of Feedback: Improving ASR Model Quality by Learning from User Corrections through Federated Learning
Lillian Zhou
Yuxin Ding
Mingqing Chen
Harry Zhang
Rohit Prabhavalkar
Dhruv Guliani
Giovanni Motta
Rajiv Mathews
6
1
0
29 Sep 2023
Updated Corpora and Benchmarks for Long-Form Speech Recognition
Jennifer Drexler Fox
Desh Raj
Natalie Delworth
Quinn Mcnamara
Corey Miller
Miguel Jetté
AuLLM
31
7
0
26 Sep 2023
Memory-augmented conformer for improved end-to-end long-form ASR
Carlos Carvalho
A. Abad
RALM
30
1
0
22 Sep 2023
Massive End-to-end Models for Short Search Queries
Weiran Wang
Rohit Prabhavalkar
Dongseong Hwang
Qiujia Li
K. Sim
...
Zhong Meng
CJ Zheng
Yanzhang He
Tara N. Sainath
P. M. Mengibar
32
2
0
22 Sep 2023
Investigating End-to-End ASR Architectures for Long Form Audio Transcription
Nithin Rao Koluguri
Samuel Kriman
Georgy Zelenfroind
Somshubra Majumdar
Dima Rekesh
Vahid Noroozi
Jagadeesh Balam
Boris Ginsburg
AuLLM
29
9
0
18 Sep 2023
Improving Speech Recognition for African American English With Audio Classification
Shefali Garg
Zhouyuan Huo
K. Sim
Suzan Schwartz
Mason Chua
...
Zion Mengesha
Dongseong Hwang
Tara N. Sainath
Francoise Beaufays
P. M. Mengibar
34
4
0
16 Sep 2023
Chunked Attention-based Encoder-Decoder Model for Streaming Speech Recognition
Mohammad Zeineldeen
Albert Zeyer
Ralf Schlüter
Hermann Ney
AuLLM
29
4
0
15 Sep 2023
PromptASR for contextualized ASR with controllable style
Xiaoyu Yang
Wei Kang
Zengwei Yao
Yifan Yang
Liyong Guo
Fangjun Kuang
Long Lin
Daniel Povey
31
9
0
14 Sep 2023
O-1: Self-training with Oracle and 1-best Hypothesis
M. Baskar
Andrew Rosenberg
Bhuvana Ramabhadran
Kartik Audhkhasi
VLM
22
0
0
14 Aug 2023
Text Injection for Capitalization and Turn-Taking Prediction in Speech Models
Shaan Bijwadia
Shuo-yiin Chang
Weiran Wang
Zhong Meng
Hao Zhang
Tara N. Sainath
24
1
0
14 Aug 2023
Using Text Injection to Improve Recognition of Personal Identifiers in Speech
Yochai Blau
Rohan Agrawal
Lior Madmony
Gary Wang
Andrew Rosenberg
Zhehuai Chen
Zorik Gekhman
Genady Beryozkin
Parisa Haghani
Bhuvana Ramabhadran
43
3
0
14 Aug 2023
BASS: Block-wise Adaptation for Speech Summarization
Roshan S. Sharma
Kenneth Zheng
Siddhant Arora
Shinji Watanabe
Rita Singh
Bhiksha Raj
34
7
0
17 Jul 2023
Accelerating Transducers through Adjacent Token Merging
Yuang Li
Yu-Huan Wu
Jinyu Li
Shujie Liu
22
4
0
28 Jun 2023
Large-scale Language Model Rescoring on Long-form Data
Tongzhou Chen
Cyril Allauzen
Yinghui Huang
Daniel S. Park
David Rybach
...
Rodrigo Cabrera
Kartik Audhkhasi
Bhuvana Ramabhadran
Pedro J. Moreno
Michael Riley
33
14
0
13 Jun 2023
Lenient Evaluation of Japanese Speech Recognition: Modeling Naturally Occurring Spelling Inconsistency
Shigeki Karita
R. Sproat
Haruko Ishikawa
27
4
0
07 Jun 2023
Edit Distance based RL for RNNT decoding
DongSeon Hwang
Changwan Ryu
K. Sim
19
0
0
31 May 2023
Lego-Features: Exporting modular encoder features for streaming and deliberation ASR
Rami Botros
Rohit Prabhavalkar
J. Schalkwyk
Ciprian Chelba
Tara N. Sainath
Françoise Beaufays
AuLLM
20
3
0
31 Mar 2023
Practical Conformer: Optimizing size, speed and flops of Conformer for on-Device and cloud ASR
Rami Botros
Anmol Gulati
Tara N. Sainath
K. Choromanski
Ruoming Pang
Trevor Strohman
Weiran Wang
Jiahui Yu
MQ
20
3
0
31 Mar 2023
A Deliberation-based Joint Acoustic and Text Decoder
S. Mavandadi
Tara N. Sainath
Ke Hu
Zelin Wu
21
7
0
23 Mar 2023
Speech Intelligibility Classifiers from 550k Disordered Speech Samples
Subhashini Venugopalan
Jimmy Tobin
Samuel J. Yang
Katie Seaver
Richard Cave
P. Jiang
Neil Zeghidour
Rus Heywood
Jordan R. Green
Michael P. Brenner
29
9
0
13 Mar 2023
JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition
Zhong Meng
Weiran Wang
Rohit Prabhavalkar
Tara N. Sainath
Tongzhou Chen
Ehsan Variani
Yu Zhang
Bo-wen Li
Andrew Rosenberg
Bhuvana Ramabhadran
AuLLM
VLM
36
11
0
16 Feb 2023
Efficient Domain Adaptation for Speech Foundation Models
Bo-wen Li
DongSeon Hwang
Zhouyuan Huo
Junwen Bai
Guru Prakash
...
K. Sim
Yu Zhang
Wei Han
Trevor Strohman
F. Beaufays
AI4CE
41
23
0
03 Feb 2023
E2E Segmentation in a Two-Pass Cascaded Encoder ASR Model
Yifan Jiang
Shuo-yiin Chang
Tara N. Sainath
Yanzhang He
David Rybach
R. David
Rohit Prabhavalkar
Cyril Allauzen
Cal Peyser
Trevor Strohman
35
7
0
28 Nov 2022
LongFNT: Long-form Speech Recognition with Factorized Neural Transducer
Xun Gong
Yu-Huan Wu
Jinyu Li
Shujie Liu
Rui Zhao
Xie Chen
Y. Qian
RALM
26
10
0
17 Nov 2022
Modular Hybrid Autoregressive Transducer
Zhong Meng
Tongzhou Chen
Rohit Prabhavalkar
Yu Zhang
Gary Wang
...
Bhuvana Ramabhadran
Yifan Jiang
Ehsan Variani
Yinghui Huang
Pedro J. Moreno
34
20
0
31 Oct 2022
Random Utterance Concatenation Based Data Augmentation for Improving Short-video Speech Recognition
Yist Y. Lin
Tao Han
Haihua Xu
Van Tung Pham
Yerbolat Khassanov
Tze Yuang Chong
Yi He
Lu Lu
Zejun Ma
13
2
0
28 Oct 2022
Monotonic segmental attention for automatic speech recognition
Albert Zeyer
Robin Schmitt
Wei Zhou
Ralf Schlüter
Hermann Ney
16
8
0
26 Oct 2022
Smart Speech Segmentation using Acousto-Linguistic Features with look-ahead
Piyush Behre
N. Parihar
S.S. Tan
A. Shah
Eva Sharma
Geoffrey Liu
Shuangyu Chang
H. Khalil
C. Basoglu
S. Pathak
VLM
24
4
0
26 Oct 2022
JOIST: A Joint Speech and Text Streaming Model For ASR
Tara N. Sainath
Rohit Prabhavalkar
Ankur Bapna
Yu Zhang
Zhouyuan Huo
Zhehuai Chen
Bo-wen Li
Weiran Wang
Trevor Strohman
RALM
AuLLM
51
35
0
13 Oct 2022
Comparison of Soft and Hard Target RNN-T Distillation for Large-scale ASR
DongSeon Hwang
K. Sim
Yu Zhang
Trevor Strohman
14
10
0
11 Oct 2022
Federated Pruning: Improving Neural Network Efficiency with Federated Learning
Rongmei Lin
Yonghui Xiao
Tien-Ju Yang
Ding Zhao
Li Xiong
Giovanni Motta
Françoise Beaufays
FedML
33
12
0
14 Sep 2022
Improving Deliberation by Text-Only and Semi-Supervised Training
Ke Hu
Tara N. Sainath
Yanzhang He
Rohit Prabhavalkar
Trevor Strohman
S. Mavandadi
Weiran Wang
26
12
0
29 Jun 2022
Online Model Compression for Federated Learning with Large Models
Tien-Ju Yang
Yonghui Xiao
Giovanni Motta
F. Beaufays
Rajiv Mathews
Mingqing Chen
FedML
MQ
43
8
0
06 May 2022
E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR
Yifan Jiang
Shuo-yiin Chang
David Rybach
Rohit Prabhavalkar
Tara N. Sainath
Cyril Allauzen
Cal Peyser
Zhiyun Lu
VLM
31
24
0
22 Apr 2022
Streaming Align-Refine for Non-autoregressive Deliberation
Weiran Wang
Ke Hu
Tara N. Sainath
AI4TS
16
1
0
15 Apr 2022
Improving Rare Word Recognition with LM-aware MWER Training
Weiran Wang
Tongzhou Chen
Tara N. Sainath
Ehsan Variani
Rohit Prabhavalkar
...
S. Mavandadi
Cal Peyser
Trevor Strohman
Yanzhang He
David Rybach
KELM
34
13
0
15 Apr 2022
A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes
Shaojin Ding
Weiran Wang
Ding Zhao
Tara N. Sainath
Yanzhang He
...
Qiao Liang
Dongseong Hwang
Ian McGraw
Rohit Prabhavalkar
Trevor Strohman
30
17
0
13 Apr 2022
4-bit Conformer with Native Quantization Aware Training for Speech Recognition
Shaojin Ding
Phoenix Meadowlark
Yanzhang He
Lukasz Lew
Shivani Agrawal
Oleg Rybakov
MQ
31
32
0
29 Mar 2022
Lahjoita puhetta -- a large-scale corpus of spoken Finnish with some benchmarks
Anssi Moisio
Dejan Porjazovski
Aku Rouhe
Yaroslav Getman
A. Virkkunen
Tamás Grósz
Krister Lindén
M. Kurimo
11
21
0
24 Mar 2022
Pseudo Label Is Better Than Human Label
DongSeon Hwang
K. Sim
Zhouyuan Huo
Trevor Strohman
18
32
0
22 Mar 2022
Sentence-Select: Large-Scale Language Model Data Selection for Rare-Word Speech Recognition
Yifan Jiang
Cal Peyser
Tara N. Sainath
Ruoming Pang
Trevor Strohman
Shankar Kumar
18
16
0
09 Mar 2022