AudioTurbo: Fast Text-to-Audio Generation with Rectified Diffusion

28 May 2025
Junqi Zhao
Jinzheng Zhao
Haohe Liu
Yun Chen
Lu Han
Xubo Liu
Mark D. Plumbley
Wenwu Wang
    DiffM
Main: 4 pages, 1 figure, 3 tables; bibliography: 1 page
Abstract

Diffusion models have significantly improved the quality and diversity of audio generation but are hindered by slow inference speed. Rectified flow enhances inference speed by learning straight-line ordinary differential equation (ODE) paths. However, this approach requires training a flow-matching model from scratch and tends to perform suboptimally, or even poorly, at low step counts. To address the limitations of rectified flow while leveraging the advantages of advanced pre-trained diffusion models, this study integrates pre-trained models with the rectified diffusion method to improve the efficiency of text-to-audio (TTA) generation. Specifically, we propose AudioTurbo, which learns first-order ODE paths from deterministic noise sample pairs generated by a pre-trained TTA model. Experiments on the AudioCaps dataset demonstrate that our model, with only 10 sampling steps, outperforms prior models and reduces inference to 3 steps compared to a flow-matching-based acceleration model.
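
The idea of learning first-order ODE paths from paired noise and samples can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_teacher` is a hypothetical stand-in for the deterministic sampler of a pre-trained TTA model, and the variance-preserving coefficients (`alpha_t`, `sigma_t`) are an assumed schedule. The sketch shows only that, once a noise vector is fixed to the sample it generates, every point on the path `x_t = alpha_t * x0 + sigma_t * eps` shares the same noise target, so an ideal noise predictor could recover `x0` in a single step from any `t`.

```python
import math
import random


def toy_teacher(eps):
    """Hypothetical stand-in for a pre-trained model's deterministic sampler."""
    return [2.0 * math.tanh(e) for e in eps]


def point_on_path(x0, eps, alpha_t, sigma_t):
    """Return x_t = alpha_t * x0 + sigma_t * eps for a fixed (eps, x0) pair."""
    return [alpha_t * x + sigma_t * e for x, e in zip(x0, eps)]


def recover_x0(x_t, eps_hat, alpha_t, sigma_t):
    """Invert the path equation given a noise estimate eps_hat."""
    return [(xt - sigma_t * eh) / alpha_t for xt, eh in zip(x_t, eps_hat)]


random.seed(0)
eps = [random.gauss(0.0, 1.0) for _ in range(8)]  # sampled noise
x0 = toy_teacher(eps)                             # its paired "generated" sample

max_error = 0.0
for alpha_t in (0.9, 0.5, 0.1):                   # assumed schedule values
    sigma_t = math.sqrt(1.0 - alpha_t ** 2)
    x_t = point_on_path(x0, eps, alpha_t, sigma_t)
    # The training target at every t is the same paired eps; using it as a
    # perfect prediction recovers x0 regardless of t.
    x0_hat = recover_x0(x_t, eps, alpha_t, sigma_t)
    max_error = max(max_error, max(abs(a - b) for a, b in zip(x0_hat, x0)))
```

In an actual training loop the paired `eps` would serve as the regression target for a network evaluated at `x_t`; the toy above replaces that network with the exact target to isolate the path property.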

@article{zhao2025_2505.22106,
  title={AudioTurbo: Fast Text-to-Audio Generation with Rectified Diffusion},
  author={Junqi Zhao and Jinzheng Zhao and Haohe Liu and Yun Chen and Lu Han and Xubo Liu and Mark D. Plumbley and Wenwu Wang},
  journal={arXiv preprint arXiv:2505.22106},
  year={2025}
}