Analyzing and Reducing the Damage of Dataset Bias

Analyzing and Reducing the Damage of Dataset Bias to Face Recognition with Synthetic Data
Adam Kortylewski, Bernhard Egger, Andreas Schneider, Thomas Gerig, Andreas Morel-Forster, Thomas Vetter

Success of AI at Face Recognition
• Face recognition (FR) works well in rather controlled environments: phones, airports, ATMs
• FR needs to be improved in less controlled environments: surveillance, disguised faces, …

Deep Face Recognition Systems Suffer from Dataset Bias
• Deep face recognition systems do not generalize well (to previously unseen views, partial occlusion, context, …)
• FR systems need to generalize to out-of-distribution samples
• In this work, we aim to:
  1. Quantify the damage from dataset bias
  2. Undo the damage from dataset bias

Difficulty of quantifying the damage of dataset bias
• A scalar performance measure provides only limited information about the generalization ability of a CNN architecture
• It does not allow a systematic analysis of "weak spots"
• We need to study generalization performance as a function of nuisance variables such as head pose
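To illustrate what "performance as a function of a nuisance variable" can mean in practice, here is a minimal sketch that reports recognition accuracy per yaw-angle bin instead of a single scalar. The function name `accuracy_by_bin`, the bin width, and the toy records are all hypothetical and are not taken from the authors' code or results.

```python
from collections import defaultdict


def accuracy_by_bin(records, bin_width=30):
    """Group (yaw_degrees, is_correct) records into yaw bins and
    return {bin_lower_edge: accuracy} sorted by bin edge."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for yaw, correct in records:
        # Floor division maps e.g. -80 to the bin starting at -90.
        lower_edge = int(yaw // bin_width) * bin_width
        totals[lower_edge] += 1
        hits[lower_edge] += int(correct)
    return {edge: hits[edge] / totals[edge] for edge in sorted(totals)}


# Made-up example records: (yaw in degrees, was the identity recognized?)
records = [
    (-80, False), (-75, True), (-10, True), (0, True),
    (5, True), (40, True), (70, False), (85, False),
]

overall = sum(int(c) for _, c in records) / len(records)
per_bin = accuracy_by_bin(records, bin_width=30)

print("overall accuracy:", overall)
for edge, acc in per_bin.items():
    print(f"yaw in [{edge}, {edge + 30}): accuracy {acc:.2f}")
```

In this toy data the overall accuracy hides that all errors sit at extreme yaw angles, which is the kind of weak spot a single scalar cannot reveal.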

Synthetic Face Image Generation
• We use the 3D Morphable Model (3DMM) and computer graphics to synthesize face images
• https://github.com/unibas-gravis/parametric-face-image-generator
Blanz and Vetter (1999). A Morphable Model for the Synthesis of 3D Faces. SIGGRAPH.
Paysan, Knothe, Amberg, Romdhani and Vetter (2009). A 3D Face Model for Pose and Illumination Invariant Face Recognition. AVSS.

Quantifying the Damage of Biases in the Pose Distribution
• VGG-16 generalizes better than AlexNet to unseen face poses.
Kortylewski et al. "Empirically analyzing the effect of dataset biases on deep face recognition systems." 2018.

Pose Bias Specific for Facial IDs
• DNNs cannot disentangle head pose and facial identity.
Kortylewski et al. "Empirically analyzing the effect of dataset biases on deep face recognition systems." 2018.

Undoing the Damage of Dataset Bias with Synthetic Data
• Experimental setup: pre-train with synthetic data, then fine-tune with real-world data
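The two-stage schedule named above can be sketched as follows. This is only a toy illustration under strong assumptions: a tiny logistic-regression model on made-up 2-D points stands in for a deep network on face images, and every name (`train`, `make_data`, the learning rates, the dataset sizes) is hypothetical rather than the authors' setup. The point is the structure: stage 2 starts from the weights produced by stage 1.

```python
import math
import random


def predict(w, x):
    """Sigmoid of the linear score; the last entry of w is the bias."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
    return 1.0 / (1.0 + math.exp(-z))


def train(w, data, epochs, lr):
    """Plain SGD on the logistic loss; returns a new weight list."""
    w = list(w)
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, x) - y
            for i, xi in enumerate(x):
                w[i] -= lr * err * xi
            w[-1] -= lr * err
    return w


def accuracy(w, data):
    """Fraction of samples whose thresholded prediction matches the label."""
    return sum((predict(w, x) > 0.5) == bool(y) for x, y in data) / len(data)


def make_data(n, noise, rng):
    """Toy two-class 2-D points (a stand-in for face images)."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        centre = 1.0 if y else -1.0
        x = [centre + rng.gauss(0, noise), centre + rng.gauss(0, noise)]
        data.append((x, y))
    return data


rng = random.Random(0)
synthetic = make_data(500, 0.5, rng)  # assumed: large, cheap synthetic set
real = make_data(20, 0.8, rng)        # assumed: small real-world set
test_set = make_data(200, 0.8, rng)   # held-out "real" data

w_init = [0.0, 0.0, 0.0]
# Stage 1: pre-train on synthetic data.
w_pre = train(w_init, synthetic, epochs=5, lr=0.1)
# Stage 2: fine-tune on real data, starting from the pre-trained weights
# and using a smaller learning rate.
w_fine = train(w_pre, real, epochs=5, lr=0.01)

print("accuracy after pre-training:", accuracy(w_pre, test_set))
print("accuracy after fine-tuning: ", accuracy(w_fine, test_set))
```

The smaller learning rate in the second stage is a common design choice to avoid overwriting what was learned during pre-training; the slides do not state which hyperparameters were actually used.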

Why does pre-training with synthetic data help?
• Because synthetic data adds variability in face pose and increases the number of facial identities

Summary
• Deep face recognition systems suffer from dataset bias
• Using synthetic face images we can:
  • Quantify the damage of dataset bias on face recognition performance
  • Enhance the data efficiency and generalization performance of FR systems