Gradient-Based Model Fingerprinting for LLM Similarity Detection and Family Classification

Main: 2 pages
6 figures
1 table
Appendix: 10 pages
Abstract
As Large Language Models (LLMs) become integral software components in modern applications, unauthorized model derivation through fine-tuning, merging, and redistribution has emerged as a critical software engineering challenge. Unlike traditional software, where clone detection and license compliance are well established, the LLM ecosystem lacks effective mechanisms to detect model lineage and enforce licensing agreements. This gap is particularly problematic when creators of open-source models, such as Meta with LLaMA, require derivative works to maintain naming conventions for attribution, yet no technical means exist to verify compliance.
@article{wu2025_2506.01631,
  title={Gradient-Based Model Fingerprinting for LLM Similarity Detection and Family Classification},
  author={Zehao Wu and Yanjie Zhao and Haoyu Wang},
  journal={arXiv preprint arXiv:2506.01631},
  year={2025}
}