Due to the data-driven nature of current face identity (FaceID) customization methods, all state-of-the-art models rely on large-scale datasets containing millions of high-quality text-image pairs for training. However, none of these datasets are publicly available, which restricts transparency and hinders further advancements in the field.
@article{wang2025_2503.07091,
  title={FaceID-6M: A Large-Scale, Open-Source FaceID Customization Dataset},
  author={Shuhe Wang and Xiaoya Li and Jiwei Li and Guoyin Wang and Xiaofei Sun and Bob Zhu and Han Qiu and Mo Yu and Shengjie Shen and Tianwei Zhang and Eduard Hovy},
  journal={arXiv preprint arXiv:2503.07091},
  year={2025}
}