Text2midi-InferAlign: Improving Symbolic Music Generation with Inference-Time Alignment

19 May 2025
Abhinaba Roy
Geeta Puri
Dorien Herremans
Abstract

We present Text2midi-InferAlign, a novel technique for improving symbolic music generation at inference time. Our method uses text-to-audio alignment and music structural alignment rewards during inference to encourage the generated music to be consistent with the input caption. Specifically, we introduce two objective scores: a text-audio consistency score that measures rhythmic alignment between the generated music and the original text caption, and a harmonic consistency score that penalizes generated music containing notes inconsistent with the key. By optimizing these alignment-based objectives during generation, our model produces symbolic music that is more closely tied to the input captions, improving the overall quality and coherence of the generated compositions. Our approach can extend any existing autoregressive model without requiring further training or fine-tuning. We evaluate our work on top of Text2midi, an existing text-to-MIDI generation model, demonstrating significant improvements in both objective and subjective evaluation metrics.
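To make the harmonic consistency idea concrete, here is a minimal Python sketch. The abstract does not give the exact formula, so everything below is an assumption made for illustration: the score is taken to be the fraction of notes whose pitch class lies in a major key, the key is supplied as a tonic pitch class, and the function names (`key_pitch_classes`, `harmonic_consistency`, `pick_best`) are hypothetical rather than part of Text2midi or the authors' code. The final helper shows one simple way such a score could be applied at inference time, by choosing among candidate continuations without any retraining.

```python
from typing import Iterable, List, Sequence

# Semitone offsets of a major scale from its tonic (assumption: major keys only).
MAJOR_SCALE_STEPS = (0, 2, 4, 5, 7, 9, 11)


def key_pitch_classes(tonic: int, steps: Sequence[int] = MAJOR_SCALE_STEPS) -> frozenset:
    """Return the set of pitch classes (0-11) belonging to the key."""
    return frozenset((tonic + s) % 12 for s in steps)


def harmonic_consistency(pitches: Iterable[int], tonic: int) -> float:
    """Fraction of MIDI pitches whose pitch class is in the key.

    Hypothetical stand-in for the paper's harmonic consistency score:
    1.0 means every note is in key, lower values mean more out-of-key notes.
    """
    pitches = list(pitches)
    if not pitches:
        return 0.0  # no notes, nothing to reward
    allowed = key_pitch_classes(tonic)
    in_key = sum(1 for p in pitches if p % 12 in allowed)
    return in_key / len(pitches)


def pick_best(candidates: List[List[int]], tonic: int) -> List[int]:
    """Select the candidate note sequence with the highest score."""
    return max(candidates, key=lambda c: harmonic_consistency(c, tonic))


# Usage: two candidate continuations scored against C major (tonic = 0).
in_key = [60, 62, 64, 65]    # C, D, E, F
off_key = [60, 61, 64, 66]   # C, C#, E, F#
best = pick_best([off_key, in_key], tonic=0)
```

In this sketch the score only ranks finished candidates; a real system could equally apply a reward of this kind step by step while decoding, and would combine it with the text-audio consistency score described above.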

@article{roy2025_2505.12669,
  title={Text2midi-InferAlign: Improving Symbolic Music Generation with Inference-Time Alignment},
  author={Abhinaba Roy and Geeta Puri and Dorien Herremans},
  journal={arXiv preprint arXiv:2505.12669},
  year={2025}
}