We propose MelodySim, a melody-aware music similarity model and dataset for plagiarism detection. First, we introduce a novel method for constructing a dataset with a focus on melodic similarity. By augmenting Slakh2100, an existing MIDI dataset, we generate variations of each piece that preserve the melody while applying modifications such as note splitting, arpeggiation, minor track dropout (excluding bass), and re-instrumentation. A user study confirms that positive pairs indeed contain similar melodies, while the other musical tracks are significantly changed. Second, we develop a segment-wise melodic-similarity detection model that uses a MERT encoder and a triplet neural network to capture melodic similarity. The resulting decision matrix highlights where plagiarism might occur. Our model achieves high accuracy on the MelodySim test set.
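To make the segment-wise comparison concrete, the following is a minimal sketch of a triplet objective and a thresholded decision matrix. It is not the authors' implementation: the toy 2-D vectors stand in for MERT segment embeddings, and the Euclidean distance, the `margin` and `threshold` values, and the function names are all assumptions chosen for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss (margin value is an assumption).

    The loss is zero once the anchor is closer to the positive than to
    the negative by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def decision_matrix(segments_a, segments_b, threshold=0.5):
    """Mark segment pairs whose embedding distance falls below `threshold`.

    Entry [i][j] is 1 when segment i of piece A and segment j of piece B
    are judged melodically similar, else 0. The threshold is hypothetical.
    """
    return [
        [1 if euclidean(a, b) < threshold else 0 for b in segments_b]
        for a in segments_a
    ]


# Toy embeddings standing in for encoder outputs (hypothetical values).
piece_a = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
piece_b = [[0.1, 0.0], [5.0, 5.0], [2.0, 0.2]]

matrix = decision_matrix(piece_a, piece_b)
for row in matrix:
    print(row)
```

In this toy case the first and third segments of the two pieces are flagged as matches, which is the kind of localized pattern a decision matrix is meant to surface.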
@article{lu2025_2505.20979,
  title={MelodySim: Measuring Melody-aware Music Similarity for Plagiarism Detection},
  author={Tongyu Lu and Charlotta-Marlena Geist and Jan Melechovsky and Abhinaba Roy and Dorien Herremans},
  journal={arXiv preprint arXiv:2505.20979},
  year={2025}
}