FaceID-6M: A Large-Scale, Open-Source FaceID Customization Dataset

10 March 2025

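Shuhe Wang, Xiaoya Li, Jiwei Li, Guoyin Wang, Xiaofei Sun, Bob Zhu, Han Qiu, Mo Yu, Shengjie Shen, Tianwei Zhang, Eduard Hovy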

Abstract

Due to the data-driven nature of current face identity (FaceID) customization methods, all state-of-the-art models rely on large-scale datasets containing millions of high-quality text-image pairs for training. However, none of these datasets are publicly available, which restricts transparency and hinders further advancements in the field.


@article{wang2025_2503.07091,
  title={FaceID-6M: A Large-Scale, Open-Source FaceID Customization Dataset},
  author={Shuhe Wang and Xiaoya Li and Jiwei Li and Guoyin Wang and Xiaofei Sun and Bob Zhu and Han Qiu and Mo Yu and Shengjie Shen and Tianwei Zhang and Eduard Hovy},
  journal={arXiv preprint arXiv:2503.07091},
  year={2025}
}
