The goal of this project was to manipulate facial features (like beard and glasses) using a variational autoencoder (VAE).
The dataset used for training is CelebA.
Motivation: https://arxiv.org/abs/1611.05507
An autoencoder is a neural network that learns an identity function: it reconstructs its original input while compressing the data along the way, forcing it to discover a more efficient, compact representation.
It is made of two parts: an encoder and a decoder.
The encoder maps the input to a vector in a low-dimensional latent space.
The decoder takes that vector and decodes it back into the original image.
In a variational autoencoder, the encoder maps an input not to a single vector but to a distribution over the latent space, so the latent space is filled more densely and smoothly.
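In practice the encoder outputs the mean and log-variance of a Gaussian, and a latent vector is drawn from it with the reparameterization trick. The sketch below is a minimal numpy illustration of that sampling step (the function name and shapes are illustrative, not taken from this project's code):

```python
import numpy as np

def sample_latent(mu, log_var, rng=None):
    """Reparameterization trick: draw z ~ N(mu, sigma^2) as
    z = mu + sigma * eps with eps ~ N(0, I). In a real framework
    this keeps the path from (mu, log_var) to z differentiable."""
    rng = rng or np.random.default_rng(0)
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# With mu = 0 and log_var = 0 (sigma = 1), z is a standard normal sample.
z = sample_latent(np.zeros(4), np.zeros(4))
```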
The main idea was to train a VAE and then compute the average encoding of images with and the average encoding of images without a given feature. Subtracting the two averages yields a vector by which we can translate an image's encoding, so that after decoding we obtain an image with (or without) that feature.
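The attribute-vector arithmetic above amounts to a few lines of array math. A minimal sketch with made-up two-dimensional encodings (the real latent space is much larger):

```python
import numpy as np

# Hypothetical latent encodings: one row per image, for images
# with and without an attribute (e.g. "glasses").
with_attr = np.array([[1.0, 2.0], [3.0, 4.0]])
without_attr = np.array([[0.0, 1.0], [2.0, 3.0]])

# The attribute vector is the difference of the two mean encodings.
attr_vec = with_attr.mean(axis=0) - without_attr.mean(axis=0)

# Adding it to an image's encoding should add the feature;
# subtracting it should remove the feature. Decode z_with_feature
# to obtain the edited image.
z = np.array([0.5, 0.5])
z_with_feature = z + attr_vec
```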
The loss is constructed from two parts: a reconstruction loss and a KL-divergence loss.
The reconstruction loss penalizes the model for differences between the input and output images.
The KL-divergence loss pushes the distributions returned by the encoder toward the standard normal distribution.
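The two terms can be written down compactly. A numpy sketch, assuming a mean-squared-error reconstruction term and the closed-form KL divergence between a diagonal Gaussian N(mu, sigma^2) and N(0, I) (the actual project may use a different reconstruction term, e.g. binary cross-entropy):

```python
import numpy as np

def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Sum of a reconstruction term (squared error here) and the
    closed-form KL divergence KL(N(mu, sigma^2) || N(0, I)):
    -0.5 * sum(1 + log_var - mu^2 - exp(log_var))."""
    recon = np.sum((x - x_hat) ** 2)
    kl = -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var))
    return recon + beta * kl
```

Note that a perfect reconstruction with mu = 0 and log_var = 0 gives zero loss, since the encoder's distribution already matches the standard normal.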
| Average person with and without beard | Average person with and without glasses |
| --- | --- |

Images are made by decoding the average of the encodings.