arXiv:2101.05600

Fast offline Transformer-based end-to-end automatic speech recognition for real-world applications

14 January 2021
Y. Oh
Kiyoung Park
Jeongue Park
Abstract

Many real-world applications require converting speech files into text with high accuracy under limited resources. This paper proposes a method for fast recognition of large speech databases using a Transformer-based end-to-end model. Transformers have advanced the state of the art in many fields, including speech recognition, but they are not easy to apply to long sequences. In this paper, various techniques to speed up the recognition of real-world speech are proposed and tested, including parallelizing recognition with batched beam search, detecting end-of-speech based on connectionist temporal classification (CTC), restricting the CTC prefix score, and splitting long speech into short segments. Experiments are conducted on a real-world Korean speech recognition task. Results on an 8-hour test corpus show that the proposed system can transcribe the speech in less than 3 minutes with a 10.73% character error rate, a 27.1% relative reduction compared to a conventional DNN-HMM-based recognition system.
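One of the techniques the abstract lists is splitting long speech into short segments, which bounds the sequence length seen by the Transformer encoder (whose self-attention cost grows quadratically with input length). A minimal sketch of such a segmenter is shown below; the segment length, overlap, and function name are illustrative assumptions, not details from the paper.

```python
def split_into_segments(samples, sample_rate=16000,
                        seg_seconds=20.0, overlap_seconds=2.0):
    """Return (start, end) sample-index pairs covering `samples`.

    Hypothetical sketch: fixed-length windows with a small overlap so
    that words cut at a boundary appear fully in at least one segment.
    All parameter values here are assumptions for illustration.
    """
    seg_len = int(seg_seconds * sample_rate)
    hop = seg_len - int(overlap_seconds * sample_rate)
    segments = []
    start, n = 0, len(samples)
    while start < n:
        end = min(start + seg_len, n)
        segments.append((start, end))
        if end == n:
            break
        start += hop  # advance by segment length minus overlap
    return segments
```

Each segment can then be decoded independently, which also enables batching across segments; a real system would typically cut at CTC-detected silences rather than at fixed offsets, as fixed-offset cuts can split words.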
