This project is an unofficial implementation of MAE, built with the support of Beneufit, Inc. The transformers were implemented purely with PyTorch and the Einops library. The positional encoding and token modules were also implemented with reference to the original Vision Transformer paper.
The idea of MAE is to leverage a huge set of unlabelled data (images) to learn rich representations of the dataset. These learned representations can then be utilized in downstream tasks such as classification, clustering, image segmentation, or anomaly detection, significantly enhancing performance by providing a strong, pre-trained feature extractor that adapts well to various applications.
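For readers unfamiliar with the mechanics, the core trick is to hide a large fraction of the image patches (typically 75%) and train an encoder-decoder pair to reconstruct the missing pixels, so the encoder only ever sees the visible patches. The snippet below is a minimal sketch of just the random-masking step in plain PyTorch; it is for illustration and is not necessarily identical to the masking code in this repository.

```python
import torch

def random_masking(patches: torch.Tensor, mask_ratio: float = 0.75):
    """Randomly drop a fraction of patch tokens, MAE-style.

    patches: (batch, num_patches, dim) patch embeddings.
    Returns the visible patches, a binary mask in the original patch
    order (0 = kept, 1 = masked), and the indices to restore that order.
    """
    b, n, d = patches.shape
    n_keep = int(n * (1 - mask_ratio))

    noise = torch.rand(b, n, device=patches.device)   # one random score per patch
    ids_shuffle = torch.argsort(noise, dim=1)         # random permutation of patches
    ids_restore = torch.argsort(ids_shuffle, dim=1)   # inverse permutation

    ids_keep = ids_shuffle[:, :n_keep]
    visible = torch.gather(patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, d))

    mask = torch.ones(b, n, device=patches.device)
    mask[:, :n_keep] = 0
    mask = torch.gather(mask, 1, ids_restore)         # back to original patch order
    return visible, mask, ids_restore
```

The encoder processes only `visible`, and the decoder later re-inserts learnable mask tokens at the masked positions (using `ids_restore`) before predicting the missing pixels.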
I split the dataset from Kaggle's Doges 77 Breeds into three parts: about 10k labelled images to train the downstream classification part, about 5k labelled images for testing/evaluating the final downstream model, and the remaining 300k+ images (labels removed) to train the MAE itself without any labels. A few thousand random dog pictures were also included in this last, unlabelled set.
The pre-training part (training the MAE model itself) was done on two RTX 4090 GPUs, 32 GB of RAM, and a 16-core AMD Ryzen CPU.
The configuration used during this training is exactly the one in Masked-AutoEncoder-PyTorch/configs/pretrain/mae_pretrain_224_16.yaml.
The MAE's training loss is shown below. Cosine annealing was used as the learning rate schedule. I believe that cosine annealing, although it produces a less smooth loss curve, is an effective way to approach the global optimum.
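For reference, a cosine-annealed schedule can be set up in PyTorch as shown below. This is a minimal sketch with a placeholder model, learning rate, and epoch count; the actual values (including any warm-up) are defined in the pretraining config.

```python
import torch

model = torch.nn.Linear(8, 8)   # stand-in for the MAE model, for illustration only
optimizer = torch.optim.AdamW(model.parameters(), lr=1.5e-4, weight_decay=0.05)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1100)

for epoch in range(1100):
    # ... one training epoch: forward pass, loss.backward(), optimizer.step() per batch ...
    scheduler.step()            # learning rate decays along a cosine curve toward ~0
```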
Meanwhile, the reconstruction outputs of the MAE were plotted every 2 epochs. All the reconstructions can be found in the train_reconstructions folder. The figure below shows the reconstruction results from the 3rd epoch and the last epoch.
| Reconstruction at epoch 2 | Reconstruction at epoch 1100 |
It is evident that the MAE was learning as intended. However, I could not achieve the near-perfect reconstructions reported in the paper, probably due to the size of my dataset and the relatively small MAE architecture used.
Using the encoder weights from the MAE above, classifier layers were added and fine-tuned, with the encoder weights kept fully frozen. The results on the 10k downstream training set and the 5k test set mentioned previously are shown below.
| Train Accuracy with MAE | Test Accuracy with MAE |
Both the training and testing above were done on their respective datasets as described previously. In just 20 epochs, the training accuracy reached about 41% across the 77 classes, while the test accuracy reached about 35%.
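For clarity, the frozen-encoder setup described above boils down to the sketch below. The encoder here is a stand-in placeholder module rather than the actual MAE encoder class from this repository, and the checkpoint loading step is omitted; only the freezing and the trainable classification head are the point.

```python
import torch
import torch.nn as nn

# Placeholder encoder; in the real code this is the pretrained MAE encoder
# loaded from the pretraining checkpoint.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 768))
for p in encoder.parameters():
    p.requires_grad = False            # encoder weights stay fully frozen
encoder.eval()

head = nn.Linear(768, 77)              # classification head for the 77 breeds

optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

images = torch.randn(4, 3, 224, 224)   # dummy batch
labels = torch.randint(0, 77, (4,))

with torch.no_grad():                  # no gradients through the frozen encoder
    features = encoder(images)
loss = criterion(head(features), labels)
loss.backward()                        # only the head receives gradients
optimizer.step()
```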
As a sanity check, I ran another identical experiment on the downstream task, except that this time the pretrained weights of the MAE encoder were not loaded.
| Train Accuracy without MAE | Test Accuracy without MAE |
The accuracies barely reached 3% over the 20 epochs. It is clear that the weights from the pretrained MAE encoder make a large difference, which goes to show that the concept of MAE works.
There are two parts in this section. The first part is training the MAE itself, which we will call pretraining. The second part is using the trained MAE for an actual downstream task - in this case, classification.
We start with pretraining, i.e. training the MAE model itself; the resulting weights will later be reused for the classification task.
First, install the required packages from requirements.txt.
To start the pretraining, first place your unlabelled dataset in a folder and adjust the configuration at Masked-AutoEncoder-PyTorch/configs/pretrain/mae_pretrain_224_16.yaml accordingly. Next, run
python pretrain.py --config configs/pretrain/mae_pretrain_224_16.yaml --logging_config configs/pretrain/logging_pretrain.yaml
During training, visualizations of the reconstructions will be saved in the figures folder. You can refer to my results in the train_reconstructions folder.
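Since the pretraining data carries no labels, the images only need to sit in a plain folder (nested sub-folders are fine too). If you want to sanity-check your data outside the provided scripts, a minimal unlabelled dataset wrapper could look like the sketch below; this is illustrative and not necessarily how the repository's own data pipeline is implemented.

```python
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class UnlabelledImages(Dataset):
    """Serves images from a folder without any labels."""

    def __init__(self, root: str, image_size: int = 224):
        self.paths = sorted(p for p in Path(root).rglob("*")
                            if p.suffix.lower() in {".jpg", ".jpeg", ".png"})
        self.transform = transforms.Compose([
            transforms.RandomResizedCrop(image_size, scale=(0.2, 1.0)),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
        ])

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        img = Image.open(self.paths[idx]).convert("RGB")
        return self.transform(img)     # image only, no label
```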
Make sure that the weights of the pretrained model are placed at the appropriate location (depending on your configuration) and that the same model configuration from pretraining is also used in Masked-AutoEncoder-PyTorch/configs/finetune/mae_finetune_224_16.yaml. For this step the dataset needs to be labelled: place the images in separate folders according to their classes. Then, start the training with
python finetune.py --config configs/finetune/mae_finetune_224_16.yaml --logging_config configs/finetune/logging_finetune.yaml
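The one-folder-per-class layout is the same convention that torchvision's ImageFolder expects, so you can quickly verify your data is organised correctly with the snippet below (the paths are hypothetical examples, not the ones used by the repository's config).

```python
from torchvision import datasets, transforms

# Expected layout (example paths):
#   data/finetune/train/beagle/001.jpg
#   data/finetune/train/husky/002.jpg
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("data/finetune/train", transform=transform)
print(train_set.classes[:5])   # class names inferred from the folder names
```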
I extend my sincere gratitude to Beneufit, Inc. for their generous funding and support. Their commitment to innovation made this project possible and has been a source of inspiration for me. Thank you, Beneufit, Inc., for your invaluable contribution.
I will continue these experiments with other computer vision tasks such as object localization and pose estimation using the same trained weights. The objective is to investigate whether MAE is useful for computer vision tasks beyond classification.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.