IBM Research is releasing a new, large and diverse dataset called Diversity in Faces (DiF) to advance the study of fairness and accuracy in facial recognition technology. IBM noted in a blog post that the initiative is the first of its kind available to the global research community.

DiF provides annotations for 1 million human facial images. Using publicly available images from the YFCC-100M Creative Commons dataset, IBM annotated the faces with 10 well-established and independent coding schemes from the scientific literature. [1-10] The coding schemes principally include objective measures of human faces, such as craniofacial features, as well as more subjective annotations, such as human-labeled predictions of age and gender. IBM said it believes that extracting and releasing these facial coding scheme annotations for 1 million face images will accelerate the study of diversity and coverage of data for AI facial recognition systems, helping to make those systems fairer and more accurate.

“We believe the DiF dataset and its 10 coding schemes offer a jumping-off point for researchers around the globe studying the facial recognition technology,” IBM said in the blog post. “The 10 facial coding methods include craniofacial (e.g., head length, nose length, forehead height), facial ratios (symmetry), visual attributes (age, gender), and pose and resolution, among others. These schemes are some of the strongest identified by the scientific literature, building a solid foundation to our collective knowledge.”















