ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2310.17864
38
22

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch

27 October 2023
Jeff Hwang
Moto Hira
Caroline Chen
Xiaohui Zhang
Zhaoheng Ni
Guangzhi Sun
Pingchuan Ma
Ruizhe Huang
Vineel Pratap
Yuekai Zhang
Anurag Kumar
Chin-Yun Yu
Chuang Zhu
Chunxi Liu
Jacob Kahn
Mirco Ravanelli
Peng Sun
Shinji Watanabe
Yangyang Shi
Yumeng Tao
Robin Scheibler
Samuele Cornell
Sean Kim
Stavros Petridis
ArXivPDFHTML
Abstract

TorchAudio is an open-source audio and speech processing library built for PyTorch. It aims to accelerate the research and development of audio and speech technologies by providing well-designed, easy-to-use, and performant PyTorch components. Its contributors routinely engage with users to understand their needs and fulfill them by developing impactful features. Here, we survey TorchAudio's development principles and contents and highlight key features we include in its latest version (2.1): self-supervised learning pre-trained pipelines and training recipes, high-performance CTC decoders, speech recognition models and training recipes, advanced media I/O capabilities, and tools for performing forced alignment, multi-channel speech enhancement, and reference-less speech assessment. For a selection of these features, through empirical studies, we demonstrate their efficacy and show that they achieve competitive or state-of-the-art performance.

View on arXiv
Comments on this paper