ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2310.17864
  4. Cited By
TorchAudio 2.1: Advancing speech recognition, self-supervised learning,
  and audio processing components for PyTorch

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch

27 October 2023
Jeff Hwang
Moto Hira
Caroline Chen
Xiaohui Zhang
Zhaoheng Ni
Guangzhi Sun
Pingchuan Ma
Ruizhe Huang
Vineel Pratap
Yuekai Zhang
Anurag Kumar
Chin-Yun Yu
Chuang Zhu
Chunxi Liu
Jacob Kahn
Mirco Ravanelli
Peng Sun
Shinji Watanabe
Yangyang Shi
Yumeng Tao
Robin Scheibler
Samuele Cornell
Sean Kim
Stavros Petridis
ArXivPDFHTML

Papers citing "TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch"

7 / 7 papers shown
Title
Vibravox: A Dataset of French Speech Captured with Body-conduction Audio Sensors
Vibravox: A Dataset of French Speech Captured with Body-conduction Audio Sensors
J. Hauret
Malo Olivier
Thomas Joubaud
C. Langrenne
Sarah Poirée
V. Zimpfer
Éric Bavu
75
1
0
16 Jul 2024
FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter
FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter
Yuanjun Lv
Hai Li
Ying Yan
Junhui Liu
Danming Xie
Lei Xie
48
1
0
12 Jun 2024
Scaling Speech Technology to 1,000+ Languages
Scaling Speech Technology to 1,000+ Languages
Vineel Pratap
Andros Tjandra
Bowen Shi
Paden Tomasello
Arun Babu
...
Yossi Adi
Xiaohui Zhang
Wei-Ning Hsu
Alexis Conneau
Michael Auli
VLM
77
300
0
22 May 2023
Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition
  for Single and Multi-Person Video
Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-Person Video
Dmitriy Serdyuk
Otavio Braga
Olivier Siohan
ViT
89
40
0
25 Jan 2022
Emformer: Efficient Memory Transformer Based Acoustic Model For Low
  Latency Streaming Speech Recognition
Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition
Yangyang Shi
Yongqiang Wang
Chunyang Wu
Ching-Feng Yeh
Julian Chan
Frank Zhang
Duc Le
M. Seltzer
56
168
0
21 Oct 2020
NeMo: a toolkit for building AI applications using Neural Modules
NeMo: a toolkit for building AI applications using Neural Modules
Oleksii Kuchaiev
Jason Chun Lok Li
Huyen Nguyen
Oleksii Hrinchuk
Ryan Leary
...
Jack Cook
P. Castonguay
Mariya Popova
Jocelyn Huang
Jonathan M. Cohen
202
292
0
14 Sep 2019
VoxCeleb2: Deep Speaker Recognition
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
224
2,234
0
14 Jun 2018
1