Investigating End-to-End ASR Architectures for Long Form Audio Transcription

18 September 2023

Nithin Rao Koluguri

Georgy Zelenfroind

Somshubra Majumdar

Jagadeesh Balam

Boris Ginsburg

Papers citing "Investigating End-to-End ASR Architectures for Long Form Audio Transcription"

DOTA-ME-CS: Daily Oriented Text Audio-Mandarin English-Code Switching Dataset. Yupei Li, Zifan Wei, Heng Yu, Huichi Zhou, Björn Schuller. 21 Jan 2025.
Reducing the Gap Between Pretrained Speech Enhancement and Recognition Models Using a Real Speech-Trained Bridging Module. Zhongjian Cui, Chenrui Cui, Tianrui Wang, Mengnan He, Hao Shi, Meng Ge, Caixia Gong, Longbiao Wang, J. Dang. 05 Jan 2025.
Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity. Mutian He, Philip N. Garner. 09 Oct 2024.
Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation. Nithin Rao Koluguri, Travis M. Bartley, Hainan Xu, Oleksii Hrinchuk, Jagadeesh Balam, Boris Ginsburg, Georg Kucsko. 09 Sep 2024.
SONICS: Synthetic Or Not -- Identifying Counterfeit Songs. Md Awsafur Rahman, Zaber Ibn Abdul Hakim, Najibul Haque Sarker, Bishmoy Paul, S. Fattah. 26 Aug 2024.
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification. Yifan Peng, Yui Sudo, Muhammad Shakeel, Shinji Watanabe. 20 Feb 2024.
How Much Context Does My Attention-Based ASR System Need? Robert Flynn, Anton Ragni. 24 Oct 2023.
Earnings-21: A Practical Benchmark for ASR in the Wild. Miguel Rio, Natalie Delworth, Ryan Westerman, Michelle Huang, Nishchal Bhandari, Joseph Palakapilly, Quinten McNamara, Joshua Dong, Piotr Żelasko, Miguel Jetté. 22 Apr 2021.