Recent work has proposed training machine learning models to predict aesthetic ratings for music audio. Our work explores whether such models can be used to finetune a symbolic music generation system with reinforcement learning, and what effect this has on the system's outputs. To test this, we use group relative policy optimization to finetune a piano MIDI model with Meta Audiobox Aesthetics ratings of audio-rendered outputs as the reward. We find that this optimization affects multiple low-level features of the generated outputs and improves the average subjective ratings in a preliminary listening study with participants. We also find that over-optimization dramatically reduces the diversity of model outputs.
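To make the reward step concrete, the following is a minimal sketch of the group-relative advantage computation commonly used in group relative policy optimization: several outputs are sampled for the same prompt, each is scored, and each score is normalized against the group's mean and standard deviation. It is not the authors' implementation. The function name `group_relative_advantages`, the `eps` parameter, and the example scores are assumptions for illustration; in the setting described above, the scores would come from an aesthetics predictor applied to audio renderings of the generated MIDI.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples drawn for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so outputs
    rated above the group average get a positive learning signal and outputs
    rated below it get a negative one.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical aesthetic scores for four outputs sampled from one prompt.
scores = [6.1, 7.4, 5.8, 6.9]
advantages = group_relative_advantages(scores)
```

Because the normalization is relative to the group, a group whose outputs all receive the same score yields zero advantages. This is one way to see why collapsing output diversity can reduce the learning signal available from each group.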
@article{jonason2025_2504.16839,
  title={SMART: Tuning a symbolic music generation system with an audio domain aesthetic reward},
  author={Nicolas Jonason and Luca Casini and Bob L. T. Sturm},
  journal={arXiv preprint arXiv:2504.16839},
  year={2025}
}