Given a set of deep learning models, it can be hard to find models appropriate to a task, understand the models, and characterize how models are different one from another. Currently, practitioners rely on manually-written documentation to understand and choose models. However, not all models have complete and reliable documentation. As the number of models increases, the challenges of finding, differentiating, and understanding models become increasingly crucial. Inspired from research on data lakes, we introduce the concept of model lakes. We formalize key model lake tasks, including model attribution, versioning, search, and benchmarking, and discuss fundamental research challenges in the management of large models. We also explore what data management techniques can be brought to bear on the study of large model management.
View on arXiv@article{pal2025_2403.02327, title={ Model Lakes }, author={ Koyena Pal and David Bau and Renée J. Miller }, journal={arXiv preprint arXiv:2403.02327}, year={ 2025 } }