Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1911.02242
Cited By
A comparison of end-to-end models for long-form speech recognition
6 November 2019
Chung-Cheng Chiu
Wei Han
Yu Zhang
Ruoming Pang
S. Kishchenko
Patrick Nguyen
A. Narayanan
H. Liao
Shuyuan Zhang
Anjuli Kannan
Rohit Prabhavalkar
Z. Chen
Tara N. Sainath
Yonghui Wu
AuLLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"A comparison of end-to-end models for long-form speech recognition"
50 / 59 papers shown
Title
Survey of End-to-End Multi-Speaker Automatic Speech Recognition for Monaural Audio
Xinlu He
Jacob Whitehill
19
0
0
16 May 2025
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities
Sreyan Ghosh
Zhifeng Kong
Sonal Kumar
S. Sakshi
Jaehyeon Kim
Ming-Yu Liu
Rafael Valle
Dinesh Manocha
Bryan Catanzaro
MLLM
AuLLM
LRM
59
9
0
06 Mar 2025
Transducer Consistency Regularization for Speech to Text Applications
Cindy Tseng
Yun Tang
Vijendra Raj Apsingekar
40
0
0
09 Oct 2024
Efficient and Robust Long-Form Speech Recognition with Hybrid H3-Conformer
Tomoki Honda
S. Sakai
Tatsuya Kawahara
28
0
0
05 Oct 2024
Lightweight Transducer Based on Frame-Level Criterion
Genshun Wan
Mengzhi Wang
Tingzhi Mao
Hang Chen
Z. Ye
44
1
0
05 Sep 2024
Advanced Long-Content Speech Recognition With Factorized Neural Transducer
Xun Gong
Yu Wu
Jinyu Li
Shujie Liu
Rui Zhao
Xie Chen
Yanmin Qian
37
6
0
20 Mar 2024
Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study
Yifan Jiang
Cyril Allauzen
Tongzhou Chen
Kilol Gupta
Ke Hu
James Qin
Yu Zhang
Yongqiang Wang
Shuo-yiin Chang
Tara N. Sainath
MoMe
40
10
0
23 Jan 2024
Improved Long-Form Speech Recognition by Jointly Modeling the Primary and Non-primary Speakers
Guru Prakash Arumugam
Shuo-yiin Chang
Tara N. Sainath
Rohit Prabhavalkar
Quan Wang
Shaan Bijwadia
29
3
0
18 Dec 2023
Bigger is not Always Better: The Effect of Context Size on Speech Pre-Training
Sean Robertson
Ewan Dunbar
SSL
30
1
0
03 Dec 2023
How Much Context Does My Attention-Based ASR System Need?
Robert Flynn
Anton Ragni
32
1
0
24 Oct 2023
Long-form Simultaneous Speech Translation: Thesis Proposal
Peter Polák
3DV
45
3
0
17 Oct 2023
Updated Corpora and Benchmarks for Long-Form Speech Recognition
Jennifer Drexler Fox
Desh Raj
Natalie Delworth
Quinn Mcnamara
Corey Miller
Miguel Jetté
AuLLM
36
7
0
26 Sep 2023
Memory-augmented conformer for improved end-to-end long-form ASR
Carlos Carvalho
A. Abad
RALM
32
1
0
22 Sep 2023
Investigating End-to-End ASR Architectures for Long Form Audio Transcription
Nithin Rao Koluguri
Samuel Kriman
Georgy Zelenfroind
Somshubra Majumdar
Dima Rekesh
Vahid Noroozi
Jagadeesh Balam
Boris Ginsburg
AuLLM
39
9
0
18 Sep 2023
Chunked Attention-based Encoder-Decoder Model for Streaming Speech Recognition
Mohammad Zeineldeen
Albert Zeyer
Ralf Schluter
Hermann Ney
AuLLM
29
4
0
15 Sep 2023
Hybrid Transducer and Attention based Encoder-Decoder Modeling for Speech-to-Text Tasks
Yun Tang
Anna Y. Sun
Hirofumi Inaguma
Xinyue Chen
Ning Dong
Xutai Ma
Paden Tomasello
J. Pino
48
19
0
04 May 2023
End-to-End Speech Recognition: A Survey
Rohit Prabhavalkar
Takaaki Hori
Tara N. Sainath
Ralf Schluter
Shinji Watanabe
VLM
26
153
0
03 Mar 2023
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
Yu Zhang
Wei Han
James Qin
Yongqiang Wang
Ankur Bapna
...
Pedro J. Moreno
Chung-Cheng Chiu
J. Schalkwyk
Franccoise Beaufays
Yonghui Wu
VLM
79
254
0
02 Mar 2023
WhisperX: Time-Accurate Speech Transcription of Long-Form Audio
Max Bain
Jaesung Huh
Tengda Han
Andrew Zisserman
45
209
0
01 Mar 2023
A Token-Wise Beam Search Algorithm for RNN-T
Gil Keren
31
1
0
28 Feb 2023
Efficient Domain Adaptation for Speech Foundation Models
Bo-wen Li
DongSeon Hwang
Zhouyuan Huo
Junwen Bai
Guru Prakash
...
K. Sim
Yu Zhang
Wei Han
Trevor Strohman
F. Beaufays
AI4CE
44
23
0
03 Feb 2023
E2E Segmentation in a Two-Pass Cascaded Encoder ASR Model
Yifan Jiang
Shuo-yiin Chang
Tara N. Sainath
Yanzhang He
David Rybach
R. David
Rohit Prabhavalkar
Cyril Allauzen
Cal Peyser
Trevor Strohman
43
7
0
28 Nov 2022
LongFNT: Long-form Speech Recognition with Factorized Neural Transducer
Xun Gong
Yu-Huan Wu
Jinyu Li
Shujie Liu
Rui Zhao
Xie Chen
Y. Qian
RALM
32
10
0
17 Nov 2022
Random Utterance Concatenation Based Data Augmentation for Improving Short-video Speech Recognition
Yist Y. Lin
Tao Han
Haihua Xu
Van Tung Pham
Yerbolat Khassanov
Tze Yuang Chong
Yi He
Lu Lu
Zejun Ma
13
2
0
28 Oct 2022
Monotonic segmental attention for automatic speech recognition
Albert Zeyer
Robin Schmitt
Wei Zhou
Ralf Schluter
Hermann Ney
16
8
0
26 Oct 2022
Learning a Dual-Mode Speech Recognition Model via Self-Pruning
Chunxi Liu
Yuan Shangguan
Haichuan Yang
Yangyang Shi
Raghuraman Krishnamoorthi
Ozlem Kalinli
SSL
29
7
0
25 Jul 2022
E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR
Yifan Jiang
Shuo-yiin Chang
David Rybach
Rohit Prabhavalkar
Tara N. Sainath
Cyril Allauzen
Cal Peyser
Zhiyun Lu
VLM
39
24
0
22 Apr 2022
Memory-Efficient Training of RNN-Transducer with Sampled Softmax
Jaesong Lee
Lukas Lee
Shinji Watanabe
30
8
0
31 Mar 2022
Streaming parallel transducer beam search with fast-slow cascaded encoders
Jay Mahadeokar
Yangyang Shi
Ke Li
Duc Le
Jiedan Zhu
Vikas Chandra
Ozlem Kalinli
M. Seltzer
35
15
0
29 Mar 2022
Finnish Parliament ASR corpus - Analysis, benchmarks and statistics
A. Virkkunen
Aku Rouhe
Nhan Phan
M. Kurimo
24
4
0
28 Mar 2022
Lahjoita puhetta -- a large-scale corpus of spoken Finnish with some benchmarks
Anssi Moisio
Dejan Porjazovski
Aku Rouhe
Yaroslav Getman
A. Virkkunen
Tamás Grósz
Krister Lindén
M. Kurimo
19
21
0
24 Mar 2022
VADOI:Voice-Activity-Detection Overlapping Inference For End-to-end Long-form Speech Recognition
Jinhan Wang
Xiaosu Tong
Jinxi Guo
Di He
Roland Maas
29
5
0
22 Feb 2022
Sequence-level self-learning with multiple hypotheses
K. Kumatani
Dimitrios Dimitriadis
Yashesh Gaur
R. Gmyr
Sefik Emre Eskimez
Jinyu Li
Michael Zeng
SSL
25
1
0
10 Dec 2021
Recent Advances in End-to-End Automatic Speech Recognition
Jinyu Li
VLM
35
363
0
02 Nov 2021
Input Length Matters: Improving RNN-T and MWER Training for Long-form Telephony Speech Recognition
Zhiyun Lu
Yanwei Pan
Thibault Doutre
Parisa Haghani
Liangliang Cao
Rohit Prabhavalkar
C. Zhang
Trevor Strohman
AuLLM
83
14
0
08 Oct 2021
Streaming End-to-End ASR based on Blockwise Non-Autoregressive Models
Tianzi Wang
Yuya Fujita
Xuankai Chang
Shinji Watanabe
21
15
0
20 Jul 2021
VAD-free Streaming Hybrid CTC/Attention ASR for Unsegmented Recording
Hirofumi Inaguma
Tatsuya Kawahara
19
2
0
15 Jul 2021
Multi-mode Transformer Transducer with Stochastic Future Context
Kwangyoun Kim
Felix Wu
Prashant Sridhar
Kyu Jeong Han
Shinji Watanabe
30
9
0
17 Jun 2021
Advanced Long-context End-to-end Speech Recognition Using Context-expanded Transformers
Takaaki Hori
Niko Moritz
Chiori Hori
Jonathan Le Roux
30
34
0
19 Apr 2021
Alignment Knowledge Distillation for Online Streaming Attention-based Speech Recognition
Hirofumi Inaguma
Tatsuya Kawahara
24
13
0
28 Feb 2021
Hypothesis Stitcher for End-to-End Speaker-attributed ASR on Long-form Multi-talker Recordings
Xuankai Chang
Naoyuki Kanda
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Takuya Yoshioka
RALM
13
15
0
06 Jan 2021
AV Taris: Online Audio-Visual Speech Recognition
George Sterpu
N. Harte
27
1
0
14 Dec 2020
A Better and Faster End-to-End Model for Streaming ASR
Bo-wen Li
Anmol Gulati
Jiahui Yu
Tara N. Sainath
Chung-Cheng Chiu
...
Wei Han
Qiao Liang
Yu Zhang
Trevor Strohman
Yonghui Wu
AuLLM
25
123
0
21 Nov 2020
Deep Shallow Fusion for RNN-T Personalization
Duc Le
Gil Keren
Julian Chan
Jay Mahadeokar
Christian Fuegen
M. Seltzer
21
77
0
16 Nov 2020
Benchmarking LF-MMI, CTC and RNN-T Criteria for Streaming ASR
Xiaohui Zhang
Frank Zhang
Chunxi Liu
Kjell Schubert
Julian Chan
...
Jun Liu
Ching-Feng Yeh
Fuchun Peng
Yatharth Saraf
Geoffrey Zweig
19
20
0
09 Nov 2020
Dual Application of Speech Enhancement for Automatic Speech Recognition
Ashutosh Pandey
Chunxi Liu
Yun Wang
Yatharth Saraf
41
37
0
07 Nov 2020
Improving RNN Transducer Based ASR with Auxiliary Tasks
Chunxi Liu
Frank Zhang
Duc Le
Suyoun Kim
Yatharth Saraf
Geoffrey Zweig
26
49
0
05 Nov 2020
Alignment Restricted Streaming Recurrent Neural Network Transducer
Jay Mahadeokar
Yuan Shangguan
Duc Le
Gil Keren
Hang Su
Thong Le
Ching-Feng Yeh
Christian Fuegen
M. Seltzer
AI4TS
25
63
0
05 Nov 2020
Improving RNN transducer with normalized jointer network
Mingkun Huang
Jun Zhang
Meng Cai
Yang Zhang
Jiali Yao
Yongbin You
Yi He
Zejun Ma
25
7
0
03 Nov 2020
Cascaded encoders for unifying streaming and non-streaming ASR
A. Narayanan
Tara N. Sainath
Ruoming Pang
Jiahui Yu
Chung-Cheng Chiu
Rohit Prabhavalkar
Ehsan Variani
Trevor Strohman
AuLLM
8
85
0
27 Oct 2020
1
2
Next