# -*- coding: utf-8 -*-
"""Super_Resolution_Project.ipynb

Automatically generated by Colaboratory.

Original file is located at
    https://colab.research.google.com/drive/1eJd3AWKTSNFvVlCSA4_Yu4CN3htFZbot

# Task introduction

Our project is based on replicating the paper "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network" [[1]](#biblio). The goal is single image super-resolution, that is, upscaling an image from low resolution to high resolution using a deep CNN.

# Setup

In order to run this notebook you need to have the following folders in the root of your Google Drive:

- `dataset_super_resolution` folder: contains the DIV2K training and validation datasets for the super-resolution task. There are 800 x 2 training images (800 low resolution and 800 high resolution) and 100 x 2 validation images (100 low resolution and 100 high resolution).
Download link: https://drive.google.com/drive/folders/1MdcQS_9Mt89KnwyUSKtPyU-KNVmmG-uZ?usp=sharing

- `Test` folder: contains the 5x4 images provided for testing, that is, 5 low resolution images, the same 5 images upsampled with bicubic interpolation, the 5 CNN results obtained by clicking the button "Our SR result" on http://people.rennes.inria.fr/Aline.Roumy/results/SR_BMVC12.html, and the 5 real high resolution images. They are used for the final test, after the network has been finalized and tweaked based on the results on the training and validation datasets. No changes were made to the network after evaluating it on these images, to avoid erroneously using them as a second validation set. Download link: https://drive.google.com/drive/folders/1XSEnp1otkVy6r3Gf4XvsYF-B_CluJZ-p?usp=sharing
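
Before running the rest of the notebook, it can help to check that the folders listed above are where the code expects them. The following is only a minimal sketch: `missing_folders`, `DRIVE_ROOT` and `REQUIRED_FOLDERS` are hypothetical names introduced here, and the sub-folder names are the ones used by the paths later in this notebook.

```python
import os

# Assumption: the Drive is mounted as in this notebook, so the root is "drive/MyDrive".
DRIVE_ROOT = "drive/MyDrive"

# Folder names taken from the setup notes above and from the paths used later on.
REQUIRED_FOLDERS = [
    "dataset_super_resolution/training_HR",
    "dataset_super_resolution/training_LR_bicubic",
    "dataset_super_resolution/validation_HR",
    "dataset_super_resolution/validation_LR_bicubic",
    "Test",
]

def missing_folders(root, required=REQUIRED_FOLDERS):
    # Return the required folders that do not exist under `root`.
    return [name for name in required
            if not os.path.isdir(os.path.join(root, name))]
```

Calling `missing_folders(DRIVE_ROOT)` after mounting the Drive should return an empty list if the setup is complete.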

## Importing libraries
"""

# Commented out IPython magic to ensure Python compatibility.
# %%capture
# %pip install imageio
# %pip install pytorch-ignite

from copy import deepcopy
import random
from imageio import imread, imsave
import matplotlib.pyplot as plt
import numpy as np
import os
import glob

import torch
from torch.utils.data import Dataset, DataLoader
import torch.nn as nn
from torchvision import transforms
import torch.optim as optim
import torch.optim.lr_scheduler as sc
import torch.nn.functional as F
from torchvision.models import vgg19
from torch.autograd import Variable
from ignite.metrics import PSNR

"""## Seeding the random generators for reproducibility"""

# https://gist.github.com/ihoromi4/b681a9088f348942b01711f251e5f964
def seed_everything(seed: int):
    
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
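    # benchmark mode lets cuDNN pick algorithms non-deterministically,
    # so it must be disabled for reproducible runs
    torch.backends.cudnn.benchmark = False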
    
seed_everything(42)

"""## Checking GPU access"""

print(f"We have {'' if torch.cuda.is_available() else 'no '}access to a GPU")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
if torch.cuda.is_available():
    print(torch.cuda.current_device())
    print(torch.cuda.device(0))
    print(torch.cuda.device_count())
    print(torch.cuda.get_device_name(0))

    print(device)

"""## Connection to the dataset on Google Drive


"""

from google.colab import drive
drive.mount('/content/drive')
dataset_path = "drive/MyDrive/dataset_super_resolution/"

"""Showing an image to check that the connection to Google Drive is set-up correctly and the image is being read correctly:"""

im1_high_resolution = imread(dataset_path + "training_HR/0001.png")
print(im1_high_resolution.shape)
print(im1_high_resolution.dtype)
plt.imshow(im1_high_resolution)

im1_low_resolution = imread(dataset_path + "training_LR_bicubic/0001x4.png")
print(im1_low_resolution.shape)
print(im1_low_resolution.dtype)
plt.imshow(im1_low_resolution)

"""## Definition of the utility function for cropping the images

We base our training on small patches instead of whole images by following the example of the paper mentioned above.

Extracting every possible patch would yield approximately 260 patches per image: the average size of a high resolution image is about 2000x1300 and each high resolution patch is 96x96 (so about 100x100), which gives approximately $\frac{2000 * 1300}{100 * 100} = 260$ patches per image. Extracting that many patches from every image of the dataset and training on all of them would exceed the maximum training time available on Colab, so we only extract 20 random patches from each image in order to train within the limited resources provided by Colab.
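
The estimate above can be double-checked with the exact patch size instead of the rounded 100x100. This is a small sketch under the same assumption of an average image size of about 2000x1300; `count_non_overlapping_patches` is a hypothetical helper, not part of the notebook.

```python
PATCH_SIZE = 96                        # high resolution patch side used in this notebook
AVG_WIDTH, AVG_HEIGHT = 2000, 1300     # approximate average image size quoted above

def count_non_overlapping_patches(width, height, patch_size):
    # Number of full patches that fit on a regular, non-overlapping grid.
    return (width // patch_size) * (height // patch_size)

total = count_non_overlapping_patches(AVG_WIDTH, AVG_HEIGHT, PATCH_SIZE)
print(total)  # 20 columns * 13 rows = 260
```

With 20 random patches per image we therefore use roughly 8% of the patches that a full grid would provide.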
"""

PATCH_SIZE = 96
PATCHES_PER_IMAGE = 20

def randomint_divby4(min_, max_):
    """
    Generate a random integer from min_ to max_ - 1, 
    so that it is divisible by 4.
    
    We do this to allow a better correspondence
    between high and low resolution patches: the low resolution
    patch index is obtained by dividing the high resolution patch
    index by 4, so the high resolution patch index must be exactly
    divisible by 4 to avoid rounding.
    """
    candidate = random.randint(min_, max_ - 1)
    while candidate % 4 != 0:
        candidate = random.randint(min_, max_ - 1)
    return candidate

def divide_in_patches(image_lr, image_hr):# -> Iterator[(patch_low, patch_high)]
    """
    Divide the High resolution image and the low resolution image into the
    corresponding high resolution and low resolution patches for 
    training. (i.e: the two patches in an output pair contain the same content 
    but the low resolution one is more pixelated.)
    """
    height, width, channels = image_hr.shape
    random_indexes = []
    while len(random_indexes) < PATCHES_PER_IMAGE:
        candidate = (randomint_divby4(0, height-1-PATCH_SIZE),
                    randomint_divby4(0, width-1-PATCH_SIZE))
        # Avoid extracting the same patch twice
        if candidate not in random_indexes:
            random_indexes.append(candidate)

    corresponding_low_res_indexes = [(y//4, x//4) for (y, x) in random_indexes]

    PATCH_SIZE_LOW_RES = PATCH_SIZE // 4
    for hi_index, low_index in zip(random_indexes, corresponding_low_res_indexes):
        y_high, x_high = hi_index
        y_low, x_low = low_index

        high_res_crop = torch.FloatTensor(image_hr[y_high:y_high+PATCH_SIZE, x_high:x_high+PATCH_SIZE])
        low_res_crop = torch.FloatTensor(image_lr[y_low:y_low+PATCH_SIZE_LOW_RES, x_low:x_low+PATCH_SIZE_LOW_RES])
        
        yield (low_res_crop, high_res_crop)

"""### Example patches from image

Showing the first image and the patches taken from its high and low resolution versions in order to double check the correctness of the file paths, of the image reading process and of the patch division function:
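
The index arithmetic behind the patch pairs can also be checked without loading any image. The sketch below mirrors the integer division used in `divide_in_patches`; `lr_window` and `covered_hr_window` are hypothetical helpers introduced only for this illustration.

```python
SCALE = 4                                   # upscaling factor between LR and HR images
PATCH_SIZE = 96                             # HR patch side
PATCH_SIZE_LOW_RES = PATCH_SIZE // SCALE    # 24

def lr_window(y_high, x_high):
    # Map the top-left corner of an HR patch to the LR crop window
    # (top, left, bottom, right), using integer division.
    y_low, x_low = y_high // SCALE, x_high // SCALE
    return (y_low, x_low, y_low + PATCH_SIZE_LOW_RES, x_low + PATCH_SIZE_LOW_RES)

def covered_hr_window(window):
    # HR region that an LR window corresponds to after 4x upscaling.
    return tuple(v * SCALE for v in window)

aligned = covered_hr_window(lr_window(400, 800))
misaligned = covered_hr_window(lr_window(402, 801))
print(aligned)     # (400, 800, 496, 896): exactly the HR patch starting at (400, 800)
print(misaligned)  # (400, 800, 496, 896): shifted with respect to an HR patch at (402, 801)
```

This is why the HR patch corners are drawn only among multiples of 4: any other corner would pair the HR patch with a slightly shifted LR crop.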
"""

img1_high_resolution = imread(dataset_path + "training_HR/0001.png")
img1_low_resolution = imread(dataset_path + "training_LR_bicubic/0001x4.png")

patches = list(divide_in_patches(img1_low_resolution, img1_high_resolution))
for plow, phigh in patches[:3]:
   plt.imshow(phigh.int().cpu() )
   plt.show()
   plt.imshow(plow.int().cpu() )
   plt.show()

"""## Creating the patch files in the Google Drive"""

def patches_from_directory(directory_hr, directory_lr, N, starting_offset):
    for number in range(1 + starting_offset, N + 1 + starting_offset):
        # High resolution images are named in the dddd.png format
        # Low resolution images are named in the ddddx4.png format
        # with d a digit.
        high_res_image = imread(directory_hr + f"/{str(number).zfill(4)}.png")
        low_res_image = imread(directory_lr + f"/{str(number).zfill(4)}x4.png")

        # Logging because the loading process takes a while
        if number % 10 == 0:
            print(f"Getting patches from image number {number}")

        for patch_pair in divide_in_patches(low_res_image, high_res_image):
            low, high = patch_pair
            yield low, high

def build_patch_dataset(hr_folder, lr_folder, patch_hr_folder, patch_lr_folder, N, offset):
    """
    This function builds the patch files starting from the whole image dataset.
    The files are named in the format `phighdddddd.png` for the high res patches
    and in the format `plowdddddd.png` for the low res patches.
    Where d is either 0 to the left or a digit (example: `phigh000023.png`).

    """
    for i, (patchlow, patchhigh) in enumerate(patches_from_directory(hr_folder, lr_folder, N, offset)):
        n = f"{str(i).zfill(6)}"
        #print("saving at ", f"{patch_hr_folder}/phigh{n}.png")
        imsave(f"{patch_hr_folder}/phigh{n}.png", np.array(patchhigh).astype(np.uint8))
        imsave(f"{patch_lr_folder}/plow{n}.png", np.array(patchlow).astype(np.uint8))

# Build training patches. It will take around 20 minutes
try:
    os.mkdir(dataset_path + "training_patches_HR")
    os.mkdir(dataset_path + "training_patches_LR")
except FileExistsError:
    pass

build_patch_dataset(dataset_path + "training_HR",
                    dataset_path + "training_LR_bicubic",
                    dataset_path + "training_patches_HR",
                    dataset_path + "training_patches_LR",
                    800,
                    0)

# Build validation patches
try:
    os.mkdir(dataset_path + "validation_patches_HR")
    os.mkdir(dataset_path + "validation_patches_LR")
except FileExistsError:
    pass

build_patch_dataset(dataset_path + "validation_HR",
                    dataset_path + "validation_LR_bicubic",
                    dataset_path + "validation_patches_HR",
                    dataset_path + "validation_patches_LR",
                    100,
                    800
                    )

"""## Definition of the `Dataset` class"""

class SuperResolutionDataset(Dataset):
    """Super Resolution dataset."""

    def __init__(self, path_high_res, path_low_res):

        self.path_high_res = path_high_res
        self.path_low_res = path_low_res
        self.high_count = len(glob.glob(
            os.path.join(self.path_high_res, "phigh*.png")
        ))
        self.low_count = len(glob.glob(
            os.path.join(self.path_low_res, "plow*.png")
        ))
    def __len__(self):
        # print(self.high_count, self.low_count)
        assert self.high_count == self.low_count
        return self.high_count

    def __getitem__(self, idx):
        n = f"{str(idx).zfill(6)}"
        # load just two patches and output them
        low = imread(self.path_low_res + f"/plow{n}.png")
        low = torch.FloatTensor(np.array(low))

        high = imread(self.path_high_res + f"/phigh{n}.png")
        high = torch.FloatTensor(np.array(high))

        # Important technical detail:
        # the format in which the image is read and the input format
        # for the network are different, so we need to permute the 
        # indexes to make those compatible.
        low = low.permute(2, 0, 1)
        high = high.permute(2, 0, 1)

        return low, high

"""### Building the train and validation datasets"""

train_dataset = SuperResolutionDataset(
    dataset_path + "training_patches_HR",
    dataset_path + "training_patches_LR",
)

validation_dataset = SuperResolutionDataset(
    dataset_path + "validation_patches_HR",
    dataset_path + "validation_patches_LR",
)
print(dataset_path + "training_patches_HR")

# How many patches do we have?
print(f"There are {len(train_dataset)} patches in the training dataset")
print(f"There are {len(validation_dataset)} patches in the validation dataset")

# Printing the shapes of the first low and high resolution validation patches:
print(validation_dataset[0][0].shape)
print(validation_dataset[0][1].shape)
#print(train_dataset[0])
plt.imshow(validation_dataset[0][0].permute(1,2,0).int().cpu())

# Printing the shapes of the first low and high resolution training patches:
print(train_dataset[0][0].shape)
print(train_dataset[0][1].shape)
#print(train_dataset[0])
plt.imshow(train_dataset[0][0].permute(1, 2, 0).int().cpu())
plt.show()
plt.imshow(train_dataset[0][1].permute(1, 2, 0).int().cpu())
plt.show()

# Showing some more examples of high and low resolution patches from the Dataset.
for _ in range(3):
    patch_low, patch_high = random.choice(train_dataset)
    plt.imshow(patch_high.permute(1,2,0).int().cpu())
    plt.show()
    plt.imshow(patch_low.permute(1,2,0).int().cpu())
    plt.show()

"""# CNN model with MSE loss

Now we implement the CNN model that will be pretrained in a standard supervised training setting before being "fine-tuned" in a GAN adversarial setting in the next section.

The loss is simply the Mean Squared Error (MSE) loss between the real high resolution patch from the dataset and the approximated upsampling that the network gives as output.
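
As a reminder of what this loss measures, here is a tiny pure-Python sketch of the MSE on made-up pixel values. The `mse` helper is only illustrative; the notebook itself uses `torch.nn.MSELoss`.

```python
def mse(predicted, target):
    # Mean of the squared differences over all values.
    assert len(predicted) == len(target)
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)

# Toy flattened "patches" with pixel values in the 0-255 range (made-up numbers).
target = [10.0, 200.0, 35.0, 120.0]
predicted = [12.0, 190.0, 35.0, 125.0]
print(mse(predicted, target))  # (4 + 100 + 0 + 25) / 4 = 32.25
```

Since the pixels are kept in the 0-255 range, the loss values reported during training are on this same squared-intensity scale.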

### Experimentation attempts

<a name='changes'></a>

We experimented with changes to both the network and the training process, while keeping all the Random Number Generators initialized with the same seed to avoid random fluctuations:

**Changes to the network**
- Remove batch normalization.
- Change the upsample process from the `FinalBlock` as described in the paper to a simple `nn.Upsample` layer.
- Change the number of convolutional blocks.

**Changes to the training process**
- Change the learning rate initial value and scheduler (constant or exponential decrease with different gammas).
- Change the optimizer from Adam to SGD and also RMSProp.

None of these changes improved the performance of the network substantially: their effect on the final validation loss was always slightly negative, with a validation loss on the last epoch always near 220. The results were reasonable but still visibly blurry.
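
To give an idea of what the exponential schedule mentioned above does, the sketch below computes the learning rate after a given number of epochs. The values `3e-5` and `0.98` are the ones used in the training loop of this notebook; `exponential_lr` is a hypothetical helper that only reproduces the formula `lr * gamma ** epoch`.

```python
def exponential_lr(initial_lr, gamma, epoch):
    # Learning rate after `epoch` scheduler steps with exponential decay.
    return initial_lr * gamma ** epoch

# Learning rate at a few epochs of a 15-epoch run.
for epoch in (0, 5, 10, 14):
    print(epoch, exponential_lr(3e-5, 0.98, epoch))
```

With a gamma this close to 1 the decay over 15 epochs is mild, which may be one reason why changing the scheduler had little effect on the final validation loss.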

## Definition

Now we define the CNN network:
"""

class Block(nn.Module):
    """
    Convolution block.
    """
    def __init__(self):
        super().__init__()
        self.conv_block1 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding=1)
        self.batch_norm_block1 = nn.BatchNorm2d(64)
        self.prelu_block1 = nn.PReLU()

        self.conv_block2 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding=1)
        self.batch_norm_block2 = nn.BatchNorm2d(64)

    def forward(self, x):
        start = x
        x = self.conv_block1(x)
        x = self.batch_norm_block1(x)
        x = self.prelu_block1(x)
        x = self.conv_block2(x)
        x = self.batch_norm_block2(x)
        x = start + x
        return x

class FinalBlock(nn.Module):
    """
    Upsampling block.
    """
    def __init__(self):
        super().__init__()
        self.final_block_conv = nn.Conv2d(in_channels=64, out_channels=256, kernel_size=3, stride=1, padding=1)
        self.final_block_pixel_shuffle = nn.PixelShuffle(upscale_factor=2)
        self.final_block_relu = nn.PReLU()

    def forward(self, x):
        x = self.final_block_conv(x)
        x = self.final_block_pixel_shuffle(x)
        x = self.final_block_relu(x)
        return x

class CNN(nn.Module):
    """
    Implementation of the CNN as described in the paper.
    """
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=9, stride=1, padding=4)
        self.relu1 = nn.PReLU()

        self.block1 = Block()
        self.block2 = Block()
        self.block3 = Block()
        self.block4 = Block()
        self.block5 = Block()

        self.conv2 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding=1)
        self.batch_norm = nn.BatchNorm2d(64)

        self.final_block1 = FinalBlock()
        self.final_block2 = FinalBlock()

        self.conv3 = nn.Conv2d(in_channels=64, out_channels=3, kernel_size=9, stride=1, padding=4)

    def forward(self, x ):
        x = self.conv1(x)
        x = self.relu1(x)
        out1 = x

        x = self.block1(x)
        x = self.block2(x)
        x = self.block3(x)
        x = self.block4(x)
        x = self.block5(x)

        x = self.conv2(x)
        x = self.batch_norm(x)
        x = x + out1
  
        x = self.final_block1(x)
        x = self.final_block2(x)
        
        x = self.conv3(x)
        return x

# We call the net cnn_net and send it to the GPU (when available) for training
cnn_net = CNN()
cnn_net.train()
cnn_net = cnn_net.to(device)
# `ccn_net` is kept as an alias of the same module
ccn_net = cnn_net

"""## Training

### Utility function to show an example output image
"""

def plot_examples(model, validation_dataset, num_examples=5):
    model.eval()
    fig, ax = plt.subplots(num_examples,4, figsize=(20,5*num_examples))
    for i in range(num_examples):
        low, high = validation_dataset[i][0], validation_dataset[i][1]
        ax[i][0].imshow(low.permute(1,2,0).int().cpu()) 
        ax[0][0].set_title('Original Low Resolution')

        ax[i][1].imshow(high.permute(1,2,0).int().cpu())
        ax[0][1].set_title('Original High Resolution')

        interpolated = nn.functional.interpolate(low.unsqueeze(0), scale_factor=4, mode='bicubic')
        ax[i][2].imshow(interpolated.squeeze(0).permute(1,2,0).int().cpu().detach().numpy())
        ax[0][2].set_title('Bicubic Interpolated High Resolution')

        output = model(low.unsqueeze(0).to(device))  
        ax[i][3].imshow(output.squeeze(0).permute(1,2,0).int().cpu().detach().numpy())
        ax[0][3].set_title('Generated High Resolution')
    plt.show()

"""### Training loop"""

# Hyperparameters
BATCH_SIZE = 32 
LR = 3e-5

criterion = torch.nn.MSELoss()
optimizer = optim.Adam(cnn_net.parameters(), lr = LR)
scheduler = sc.ExponentialLR(optimizer, gamma = 0.98)

# Dataloaders
trainloader = DataLoader(train_dataset, batch_size = BATCH_SIZE, shuffle = True, num_workers=0, drop_last=True)
validation_loader = DataLoader(validation_dataset, batch_size = BATCH_SIZE, drop_last=True)

# Shapes of the patches
print(train_dataset[0][0].shape)
print(train_dataset[0][1].shape)
print(validation_dataset[0][0].shape)
print(validation_dataset[0][1].shape)

"""Standard training loop based on the pytorch tutorial [[2]](#biblio)"""

train_losses = []
validation_losses = []

# Parameters for early stopping
best_loss = np.inf
patience = 3
trigger_times = 0

EPOCHS = 15

for epoch in range(EPOCHS):
    print(f"Epoch number {epoch}")
    running_train_loss = 0.0

    ## Standard pytorch training loop, batching is automatically
    ## handled by the Dataloader class.
    for i, data in enumerate(trainloader, 0):

        cnn_net.train(True)
        inputs_low_res, real_highres = data
        
        # To the GPU
        inputs_low_res = inputs_low_res.to(device)
        real_highres = real_highres.to(device)

        # Zeroing the parameter gradients
        optimizer.zero_grad()
        
        outputs = cnn_net(inputs_low_res)
        loss = criterion(outputs, real_highres)
        loss.backward()
        optimizer.step()

        if i % 50 == 0:
            print(f"Doing batch {i}, in epoch {epoch}")
            print(f"Loss on this batch is {loss.item()}")
        running_train_loss += loss.item()

    # Average loss for each training epoch
    # obtained by averaging the loss on the batches
    avg_train_loss = running_train_loss / (i + 1) 

    scheduler.step()
    train_losses.append(avg_train_loss)


    ## Computing the validation loss at the end of each epoch.
    ## Of course we are careful to disable the training mode and the gradient updates
    ## while doing it, otherwise the validation set would become part of the training set.
    cnn_net.train(False)
    with torch.no_grad():

        running_vloss = 0.0
        for i, vdata in enumerate(validation_loader):
            vinputs, vreal = vdata

            vinputs = vinputs.to(device)
            vreal = vreal.to(device)

            outputs = cnn_net(vinputs)
            vloss = criterion(outputs, vreal)
            running_vloss += vloss

        avg_loss = running_vloss / (i + 1)
        validation_losses.append(avg_loss.item())
        print(f"Validation loss at the end of epoch {epoch} is {avg_loss.item()}")

    ## Using the validation loss for early stopping
    ## Stop the training if the validation loss goes up for too long
    if avg_loss > best_loss:
        trigger_times += 1

        if trigger_times >= patience:
            print("Early stopping")
            break
            
    else:
        trigger_times = 0
        best_loss = avg_loss
    
    ## Show an example output after each epoch to visually see
    ## how the output improves during the training
    print(f"Example output at the end of epoch {epoch}")
    plot_examples(cnn_net, validation_dataset)
print('Finished Training')

## Saving the weights in a file in order to be used later by the GANs
PATH_CNN = "after_supervised_MSE_training_only.pth"
torch.save(deepcopy(cnn_net), PATH_CNN)
print("Saved weights after only supervised MSE training.")

"""### Training and validation losses graphs"""

## Plotting the losses on the same graph
plt.plot(train_losses, label = "TRAINING loss")
plt.plot(validation_losses, label = "VALIDATION loss")
plt.xlabel("Epochs")
plt.legend()
plt.show()

"""## Example output of the model

These examples are taken from the validation set. Because that set is used to decide if and when to perform early stopping, it already influences the training of the model, so the performance on a validation image is expected to be higher than the performance on a never-seen-before (test set) image.
"""

plot_examples(cnn_net, validation_dataset, num_examples=10)

"""# GAN architectures

We implemented two slightly different GANs:

*   SRGAN: based on the original paper [[1]](#biblio), with input from the GitHub repository SRGAN [[3]](#biblio).
*   WGAN: the first model modified to use the Wasserstein loss, following the suggestions found in this post [[4]](#biblio) and in the professor's lessons.

The Generator is the same for both architectures.
We also tried weight clipping instead of the gradient penalty function to implement the WGAN, but the results were no better than those of the SRGAN.
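
For reference, the Wasserstein losses and the weight clipping alternative can be sketched on plain lists of critic scores. This is a simplified illustration and not the code used for training: the helper names are hypothetical, and the clipping threshold `c=0.01` is only an illustrative default.

```python
def critic_loss(scores_real, scores_fake):
    # The critic tries to score real images higher than generated ones,
    # so it minimizes mean(fake scores) - mean(real scores).
    return sum(scores_fake) / len(scores_fake) - sum(scores_real) / len(scores_real)

def generator_loss(scores_fake):
    # The generator tries to raise the critic's score on generated images.
    return -sum(scores_fake) / len(scores_fake)

def clip_weights(weights, c=0.01):
    # Weight clipping: force every critic weight into [-c, c].
    return [max(-c, min(c, w)) for w in weights]

print(critic_loss([2.0, 4.0], [1.0, 3.0]))   # 2.0 - 3.0 = -1.0
print(generator_loss([1.0, 3.0]))            # -2.0
print(clip_weights([0.5, -0.002, -3.0]))     # [0.01, -0.002, -0.01]
```

Unlike the SRGAN discriminator, the critic outputs unbounded scores rather than probabilities, which is why the sigmoid is removed in the WGAN version.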

## Possible improvement tried

Another approach that we tried, after seeing how large the discriminator loss was, was to also pre-train the discriminator. We did not include it in the final version of the project because it did not significantly improve the results on the images and because we are doing something similar with the WGAN. For this attempt we used the same images, without creating the patches, and applied data augmentation transformations to them, obtaining around 2000 images in total between training and validation. We created another custom dataset that labeled the high resolution images with "label = 1" and the low resolution ones with "label = 0", trained the discriminator class to distinguish between the two, and then exported the weights to be used in the GAN framework. We used this post [[5]](#biblio) as a reference.

## Feature Extractor for perceptual loss and PSNRLoss common to both architectures
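
PSNR is directly related to the MSE used in the previous section. The sketch below shows the conversion for 8-bit images (`data_range` of 255); `psnr_from_mse` is a hypothetical helper, while the notebook itself relies on the `PSNR` metric from `ignite`.

```python
import math

def psnr_from_mse(mse, data_range=255.0):
    # PSNR in decibels: 10 * log10(data_range^2 / MSE).
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(data_range ** 2 / mse)

# An MSE of 220 is the validation loss reported for the CNN with MSE loss.
print(round(psnr_from_mse(220.0), 2))
```

An MSE near 220 corresponds to roughly 24.7 dB. This is only a rough conversion, since averaging PSNR over batches is not the same as converting an averaged MSE.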
"""

class FeatureExtractor(nn.Module):
    def __init__(self):
        super(FeatureExtractor, self).__init__()
        
        vgg = vgg19(pretrained=True)
        
        # Get features obtained by the 4th conv before the 5th maxpool
        self.vgg19_54 = nn.Sequential(*list(vgg.features.children())[:35])

    def forward(self, img):
        return self.vgg19_54(img)

def PSNRLoss(y_pred, y, device):
    max_pixel = 255.0
    
    psnr = PSNR(data_range=max_pixel, device=device)
    psnr.update((y_pred, y))
    psnr_compute = psnr.compute()
    
    return psnr_compute

def evaluate(feature_extractor, generator, discriminator, dataloader):
    # Send everything to the GPU
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    feature_extractor = feature_extractor.to(device)
    generator = generator.to(device)
    discriminator = discriminator.to(device)
    
    # Set networks to evaluation mode
    generator.eval()
    discriminator.eval()
    
    running_psnr = 0.0
    
    # Gradients are not needed to compute the PSNR, so we disable them
    with torch.no_grad():
        for lr, hr in dataloader:
            lr = lr.to(device)
            hr = hr.to(device)

            generated_hr = generator(lr)

            psnr = PSNRLoss(generated_hr, hr, device)

            running_psnr += psnr.item()
        
    return running_psnr / len(dataloader)

"""#SRGAN 's Discriminator

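The discriminator defined below ends with a linear layer that takes `512 * 6 * 6` inputs. The sketch below checks that number with the usual convolution arithmetic, assuming 96x96 high resolution patches as input; `conv_output_size` is a hypothetical helper introduced for this check.

```python
def conv_output_size(size, kernel_size=3, stride=1, padding=1):
    # Standard convolution arithmetic: floor((n + 2p - k) / s) + 1.
    return (size + 2 * padding - kernel_size) // stride + 1

# Strides of the eight 3x3 convolutions in Discriminator_srgan.
STRIDES = [1, 2, 1, 2, 1, 2, 1, 2]

size = 96  # HR patch side
for stride in STRIDES:
    size = conv_output_size(size, stride=stride)

print(size)               # 6
print(512 * size * size)  # 18432 input features for the first linear layer
```

Because of this fixed size, the discriminator only accepts 96x96 inputs; a different patch size would require changing the first linear layer.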

"""

class Discriminator_srgan(nn.Module):
    def __init__(self):
        super(Discriminator_srgan, self).__init__()
        self.blocks = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1),
            nn.LeakyReLU(0.2),

            nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(64),
            nn.LeakyReLU(0.2),

            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2),

            nn.Conv2d(128, 128, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2),

            nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(256),
            nn.LeakyReLU(0.2),

            nn.Conv2d(256, 256, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(256),
            nn.LeakyReLU(0.2),

            nn.Conv2d(256, 512, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(512),
            nn.LeakyReLU(0.2),

            nn.Conv2d(512, 512, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(512),
            nn.LeakyReLU(0.2),

            )

        self.linear = nn.Sequential(
            #512 channels, 6x6 image because 4 layers of stride 2, 96/16=6   
            nn.Linear(512 * 6 * 6, 1024),           
            nn.LeakyReLU(0.2, True),
            nn.Linear(1024, 1),
            )

    def forward(self, x):
        out = self.blocks(x)
        out = torch.flatten(out, 1)
        out = self.linear(out)
        
        return torch.sigmoid(out)

"""## Training of SRGAN

### Training loop
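
The losses combined in the loop below can be sketched on single probabilities. This is only an illustration with hypothetical helper names: it assumes the discriminator outputs a probability in (0, 1), and it uses the same weight `1e-3` for the adversarial term as `lambda_adversarial`.

```python
import math

def bce(prob, label):
    # Binary cross-entropy for one probability in (0, 1) and a 0/1 label.
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))

def generator_adversarial_loss(d_fake):
    # The generator wants the discriminator to label its output as real (1).
    return bce(d_fake, 1)

def discriminator_loss(d_real, d_fake):
    # Average of the loss on a real image (label 1) and on a generated one (label 0).
    return (bce(d_real, 1) + bce(d_fake, 0)) / 2

def total_generator_loss(content_loss, d_fake, lambda_adversarial=1e-3):
    # Content loss plus a small adversarial term.
    return content_loss + lambda_adversarial * generator_adversarial_loss(d_fake)
```

For example, `discriminator_loss(0.5, 0.5)` equals `log(2)`, the loss of a discriminator that cannot tell real from generated images at all.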
"""

# Hyperparams, same as the ones of the original paper
# Learning rate has been lowered 
lr = 5e-5

lambda_adversarial = 1e-3
lambda_pixel = 1e-2
gan_epochs = 3

# Call models
feature_extractor = FeatureExtractor()
generator = cnn_net
discriminator = Discriminator_srgan()

# Send everything to GPU
feature_extractor = feature_extractor.to(device)
generator = generator.to(device)
discriminator = discriminator.to(device)

# Set feature extractor to evaluation mode
feature_extractor.eval()

# Get losses and optimizers
criterion_pixel = nn.L1Loss()
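# The discriminator already applies a sigmoid, so we use BCELoss on its
# probabilities (BCEWithLogitsLoss would apply a second sigmoid)
criterion_gan = nn.BCELoss()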
criterion_content = nn.L1Loss()

optimizer_generator = optim.Adam(generator.parameters(), lr=lr)
optimizer_discriminator = optim.Adam(discriminator.parameters(), lr=lr)

# Saving all the losses for plotting
epoch_loss_generator_list = []
epoch_loss_discriminator_list = []
epoch_loss_gan_list = []
epoch_loss_pixel_list = []
epoch_loss_content_list = []

psnr_train_list = []
psnr_validation_list = []

# Train
for epoch in range(gan_epochs):

    # Set networks to training mode
    generator.train()
    discriminator.train()
    
    # Log losses
    running_loss_generator = 0.0
    running_loss_discriminator = 0.0
    running_loss_gan = 0.0
    running_loss_pixel = 0.0
    running_loss_content = 0.0
    
    for i, (lr, hr) in enumerate(trainloader):
        batches_done = epoch * len(trainloader) + i
        
        lr = lr.to(device)
        hr = hr.to(device)

        # Train generator:
        ## Zero gradients
        optimizer_generator.zero_grad()
        
        ## Generate HR image from LR image
        hr_generated = generator(lr)
        
        ## Measure pixel-wise loss against ground truth
        loss_pixel = criterion_pixel(hr_generated, hr)
        
        
        ## Use the discriminator to make predictions
        preds_real = discriminator(hr).detach()
        preds_fake = discriminator(hr_generated)
        
        ## Create adversarial ground truths (for the discriminator)
        real = torch.ones_like(preds_real)
        fake = torch.zeros_like(preds_fake)
        
        ## Compute adversarial loss
        loss_gan = criterion_gan(preds_fake, real)
        
        ## Compute content loss
        features_generated = feature_extractor(hr_generated)
        features_real = feature_extractor(hr).detach()
        loss_content = criterion_content(features_generated, features_real)
        
        ## Compute total generator loss
        loss_generator = loss_content + lambda_adversarial * loss_gan 
        
        ## Backpropagate and update weights
        loss_generator.backward()
        optimizer_generator.step()
        
        # Train discriminator:
        ## Zero gradients
        optimizer_discriminator.zero_grad()
        
        ## Use the discriminator to make predictions
        preds_real = discriminator(hr)
        preds_fake = discriminator(hr_generated.detach())
        
        ## Compute adversarial loss for real and fake images (standard GAN loss)
        loss_real = criterion_gan(preds_real, real)
        loss_fake = criterion_gan(preds_fake, fake)
        
        ## Compute total discriminator loss
        loss_discriminator = (loss_real + loss_fake) / 2
        
        ## Backpropagate and update weights
        loss_discriminator.backward()
        optimizer_discriminator.step()
        
        # Log losses
        running_loss_generator += loss_generator.item()
        running_loss_discriminator += loss_discriminator.item()
        running_loss_gan += loss_gan.item()
        running_loss_pixel += loss_pixel.item()
        running_loss_content += loss_content.item()
        
    # Get the average of the losses in an epoch
    epoch_loss_generator = running_loss_generator / len(trainloader)
    epoch_loss_discriminator = running_loss_discriminator / len(trainloader)
    epoch_loss_gan = running_loss_gan / len(trainloader)
    epoch_loss_pixel = running_loss_pixel / len(trainloader)
    epoch_loss_content = running_loss_content / len(trainloader)
    print('[Epoch {}/{}] loss_generator: {:.6f}, loss_discriminator: {:.6f}, loss_gan: {:.6f}, loss_pixel: {:.6f}, loss_content: {:.6f}'.format(epoch+1, gan_epochs, epoch_loss_generator, epoch_loss_discriminator, epoch_loss_gan, epoch_loss_pixel, epoch_loss_content))
    
    # Evaluate PSNR on train and validation datasets
    psnr_train = evaluate(feature_extractor, generator, discriminator, trainloader)
    psnr_validation = evaluate(feature_extractor, generator, discriminator, validation_loader)
    
    # Save all the losses in lists to plot them after
    epoch_loss_generator_list.append(epoch_loss_generator)
    epoch_loss_discriminator_list.append(epoch_loss_discriminator)
    epoch_loss_gan_list.append(epoch_loss_gan)
    epoch_loss_pixel_list.append(epoch_loss_pixel)
    epoch_loss_content_list.append(epoch_loss_content)

    psnr_train_list.append(psnr_train)
    psnr_validation_list.append(psnr_validation)

    print(f'psnr on train set: {psnr_train}; psnr on validation set: {psnr_validation}')
    
    # Show some images after each training epoch
    plot_examples(model=generator, validation_dataset=validation_dataset, num_examples=5)
   
  
# Save the weights of the generator in a file to be used for testing
PATH_GAN1 = "after_GAN_training.pth"
torch.save(deepcopy(generator), PATH_GAN1)
print("Saved weights after GAN training.")

"""### Training and validation losses graphs"""

def plot_loss(loss_list, loss_name):
    plt.title(f"{loss_name} loss vs Epoch number")
    plt.xlabel("Epoch")
    plt.ylabel(f"{loss_name} loss")

    plt.plot(loss_list)
    plt.show()

plot_loss(epoch_loss_generator_list, "Generator Training")
plot_loss(epoch_loss_discriminator_list, "Discriminator Training")
plot_loss(epoch_loss_gan_list, "GAN")
plot_loss(epoch_loss_pixel_list, "Pixel")
plot_loss(epoch_loss_content_list, "Content")
plot_loss(psnr_train_list, "PSNR Training")
plot_loss(psnr_validation_list, "PSNR Validation")

"""## Example output of the model

These examples are taken from the validation set, which is used to decide if and when to perform early stopping. Since the validation set already influences the training of the model, the performance on its images is expected to be higher than on never-before-seen (test set) images.
"""

plot_examples(generator, validation_dataset, 10)

"""# WGAN architecture

## Definition of the Gradient Penalty
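
As a rough illustration of the penalty term (a toy sketch in plain Python, not the PyTorch implementation used in this notebook): the gradient penalty pushes the L2 norm of the critic's gradient with respect to its input towards 1 by penalizing `(norm - 1)^2`, averaged over the batch. For a hypothetical linear critic `D(x) = sum(w_i * x_i)` the gradient equals `w` for every input, so the penalty can be worked out by hand. The name `toy_gradient_penalty` and the weights are assumptions made for this example.

```python
import math

def toy_gradient_penalty(weights, num_samples=4):
    # Hypothetical linear critic D(x) = sum(w_i * x_i): its gradient with
    # respect to x equals the weights for every interpolated sample.
    gradient_norm = math.sqrt(sum(w * w for w in weights))
    # Penalize the squared distance of each sample's gradient norm from 1,
    # then average over the batch.
    penalties = [(gradient_norm - 1.0) ** 2 for _ in range(num_samples)]
    return sum(penalties) / num_samples

print(toy_gradient_penalty([3.0, 4.0]))  # gradient norm 5, penalty (5 - 1)^2 = 16.0
```

The penalty vanishes when the gradient norm is exactly 1, for instance with the weights `[0.6, 0.8]`.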
"""

# Gradient penalty term of the Wasserstein loss for the Discriminator (critic)
def gradient_penalty(discriminator, hr, hr_generated, device=device):
    batch_size = hr.shape[0]
    # One alpha per sample, drawn uniformly in [0, 1) and broadcast over
    # channels, height and width
    alpha = torch.rand(batch_size, 1, 1, 1, device=device)
    # Interpolated image = randomly weighted average between a real and a fake image
    # Interpolated image ← alpha * real image + (1 − alpha) * fake image
    # The fake image is detached so that the penalty only trains the critic
    interpolated_image = (alpha * hr + (1 - alpha) * hr_generated.detach()).requires_grad_(True)

    # Calculate the critic score on the interpolated image
    interpolated_score = discriminator(interpolated_image)

    # Take the gradient of the score with respect to the interpolated image.
    # create_graph=True keeps the penalty differentiable, so that it can be
    # backpropagated into the weights of the critic
    gradient = torch.autograd.grad(inputs=interpolated_image,
                                   outputs=interpolated_score,
                                   grad_outputs=torch.ones_like(interpolated_score),
                                   create_graph=True,
                                   retain_graph=True)[0]
    gradient = gradient.reshape(gradient.shape[0], -1)
    gradient_norm = gradient.norm(2, dim=1)
    return torch.mean((gradient_norm - 1) ** 2)

"""##WGAN Discriminator"""

# Using InstanceNorm instead of BatchNorm and removing the sigmoid at the end
# of the Discriminator, so the output is no longer bounded to (0,1).
class Discriminator_wgan(nn.Module):
    def __init__(self):
        super(Discriminator_wgan, self).__init__()
        self.blocks = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1),
            nn.LeakyReLU(0.2),

            nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1),
            nn.InstanceNorm2d(64),
            nn.LeakyReLU(0.2),

            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            nn.InstanceNorm2d(128),
            nn.LeakyReLU(0.2),

            nn.Conv2d(128, 128, kernel_size=3, stride=2, padding=1),
            nn.InstanceNorm2d(128),
            nn.LeakyReLU(0.2),

            nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1),
            nn.InstanceNorm2d(256),
            nn.LeakyReLU(0.2),

            nn.Conv2d(256, 256, kernel_size=3, stride=2, padding=1),
            nn.InstanceNorm2d(256),
            nn.LeakyReLU(0.2),

            nn.Conv2d(256, 512, kernel_size=3, stride=1, padding=1),
            nn.InstanceNorm2d(512),
            nn.LeakyReLU(0.2),

            nn.Conv2d(512, 512, kernel_size=3, stride=2, padding=1),
            nn.InstanceNorm2d(512),
            nn.LeakyReLU(0.2),

            )

        self.linear = nn.Sequential(
            #512 channels, 6x6 image because 4 layers of stride 2, 96/16=6   

            nn.Linear(512 * 6 * 6, 1024),           
            nn.LeakyReLU(0.2),
            nn.Linear(1024, 1),
            )

    def forward(self, x):
        out = self.blocks(x)
        out = torch.flatten(out, 1)
        out = self.linear(out)
        
        return out

"""#Training of the WGAN

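A minimal sketch of the critic objective used in the training loop of this section, written in plain Python on made-up scores so that the arithmetic is easy to follow. The critic loss is the mean score on generated images minus the mean score on real images, plus a weighted gradient penalty. The name `critic_loss`, the example scores and the penalty values are assumptions for illustration only.

```python
def critic_loss(scores_real, scores_fake, gradient_penalty, penalty_weight):
    # Wasserstein critic loss: -(mean(D(real)) - mean(D(fake))) + weight * penalty
    mean_real = sum(scores_real) / len(scores_real)
    mean_fake = sum(scores_fake) / len(scores_fake)
    return -(mean_real - mean_fake) + penalty_weight * gradient_penalty

# Made-up critic scores for a batch of two real and two generated images
loss = critic_loss([2.0, 4.0], [1.0, 1.0], gradient_penalty=0.5, penalty_weight=2.0)
print(loss)  # -(3.0 - 1.0) + 2.0 * 0.5 = -1.0
```

The loss decreases as the critic scores real images higher than generated ones, and increases with the gradient penalty.
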
"""

# We use the weights of the pre-trained CNN for the Generator Network.

# Hyperparameters follow the original paper, except for the learning rate,
# which has been lowered
LR2 = 5e-6
lambda_adversarial = 1e-3
lambda_pixel = 1e-2
gan_epochs = 3
Discriminator_iterations = 5

# Call models
feature_extractor = FeatureExtractor()
generator2 = torch.load(PATH_CNN)
discriminator = Discriminator_wgan()

# send everything to the GPU
feature_extractor = feature_extractor.to(device)
generator2 = generator2.to(device)
discriminator = discriminator.to(device)

# Set feature extractor to evaluation mode
feature_extractor.eval()

# Choose losses and optimizers
criterion_pixel = nn.L1Loss()
criterion_gan = nn.BCEWithLogitsLoss()
criterion_content = nn.L1Loss()

#optimizer_generator = optim.Adam(generator.parameters(), lr=lr)
#optimizer_discriminator = optim.Adam(discriminator.parameters(), lr=lr)
optimizer_generator = optim.RMSprop(generator2.parameters(), lr=LR2)
optimizer_discriminator = optim.RMSprop(discriminator.parameters(), lr=LR2)

# Saving all the losses for plotting
epoch_loss_generator_list = []
epoch_loss_discriminator_list = []
epoch_loss_gan_list = []
epoch_loss_pixel_list = []
epoch_loss_content_list = []
psnr_train_list = []
psnr_validation_list = []

# Train:
for epoch in range(gan_epochs):

    # Set networks to training mode
    generator2.train()
    discriminator.train()
    
    # Log losses
    running_loss_generator = 0.0
    running_loss_discriminator = 0.0
    running_loss_gan = 0.0
    running_loss_pixel = 0.0
    running_loss_content = 0.0
    
    for i, (lr, hr) in enumerate(trainloader):
        batches_done = epoch * len(trainloader) + i
        
        lr = lr.to(device)
        hr = hr.to(device)

        # We train the Discriminator more than one time in each batch in order to
        # make it catch up to the pre-trained Generator

        for _ in range(Discriminator_iterations): 
          ## Generate HR image from LR image
          hr_generated = generator2(lr)
        
          ## Measure pixel-wise loss against ground truth
          loss_pixel = criterion_pixel(hr_generated, hr)
        
        
          ## Create adversarial ground truths, one label per image in the batch
          ## (the discriminator outputs a tensor of shape (batch_size, 1))
          real = torch.ones(hr.size(0), 1, device=device)
          fake = torch.zeros(hr.size(0), 1, device=device)
        
          # Train Discriminator:
          ## Zero gradients
          optimizer_discriminator.zero_grad()
        
          ## Use the discriminator to make predictions
          preds_real = discriminator(hr)
          preds_fake = discriminator(hr_generated.detach())
        
          ## Compute the gradient penalty and the discriminator loss
          gp = gradient_penalty(discriminator, hr, hr_generated.to(device), device)
          loss_discriminator = -(torch.mean(preds_real) - torch.mean(preds_fake)) + lambda_adversarial *gp
          discriminator.zero_grad()
        
          ## Backpropagate and update weights
          loss_discriminator.backward(retain_graph = True)
          optimizer_discriminator.step()
        

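        # Now we train the Generator:
        ## Score the generated images without detaching them, so that the
        ## adversarial loss can backpropagate into the Generator
        preds_fake = discriminator(hr_generated)
        ## Zero gradients
        optimizer_generator.zero_grad()
        
        ## Compute adversarial loss
        loss_gan = criterion_gan(preds_fake, real)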

        ## Compute content loss
        features_generated = feature_extractor(hr_generated)
        features_real = feature_extractor(hr).detach()
        loss_content = criterion_content(features_generated, features_real)

        ## Compute total generator loss
        loss_generator = loss_content + lambda_adversarial * loss_gan

        ## Backpropagate and update weights
        loss_generator.backward()
        optimizer_generator.step()

        # Log losses
        running_loss_generator += loss_generator.item()
        running_loss_discriminator += loss_discriminator.item()
        running_loss_gan += loss_gan.item()
        running_loss_pixel += loss_pixel.item()
        running_loss_content += loss_content.item()
        
    # Compute average losses in an epoch
    epoch_loss_generator = running_loss_generator / len(trainloader)
    epoch_loss_discriminator = running_loss_discriminator / len(trainloader)
    epoch_loss_gan = running_loss_gan / len(trainloader)
    epoch_loss_pixel = running_loss_pixel / len(trainloader)
    epoch_loss_content = running_loss_content / len(trainloader)
    print('[Epoch {}/{}] loss_generator: {:.6f}, loss_discriminator: {:.6f}, loss_gan: {:.6f}, loss_pixel: {:.6f}, loss_content: {:.6f}'.format(epoch+1, gan_epochs, epoch_loss_generator, epoch_loss_discriminator, epoch_loss_gan, epoch_loss_pixel, epoch_loss_content))
    
    # Evaluate PSNR on train and validation datasets
    psnr_train = evaluate(feature_extractor, generator2, discriminator, trainloader)
    psnr_validation = evaluate(feature_extractor, generator2, discriminator, validation_loader)
    
    # Save all the losses in lists to plot them after
    epoch_loss_generator_list.append(epoch_loss_generator)
    epoch_loss_discriminator_list.append(epoch_loss_discriminator)
    epoch_loss_gan_list.append(epoch_loss_gan)
    epoch_loss_pixel_list.append(epoch_loss_pixel)
    epoch_loss_content_list.append(epoch_loss_content)

    psnr_train_list.append(psnr_train)
    psnr_validation_list.append(psnr_validation)

    print(f'psnr on train set: {psnr_train}; psnr on validation set: {psnr_validation}')
    
    # Show some images at the end of each epoch to evaluate training advancement 
    plot_examples(model=generator2, validation_dataset=validation_dataset, num_examples=5)

# Saving the WGAN weights for testing
PATH_GAN2 = "after_WGAN_training.pth"
torch.save(deepcopy(generator2), PATH_GAN2)
print("Saved weights after WGAN training.")

"""##Training and validation losses graphs"""

plot_loss(epoch_loss_generator_list, "Generator Training")
plot_loss(epoch_loss_discriminator_list, "Discriminator Training")
plot_loss(epoch_loss_gan_list, "GAN")
plot_loss(epoch_loss_pixel_list, "Pixel")
plot_loss(epoch_loss_content_list, "Content")
plot_loss(psnr_train_list, "PSNR Training")
plot_loss(psnr_validation_list, "PSNR Validation")

"""##Example output of the model"""

plot_examples(generator2, validation_dataset, 10)

"""#Results on the test images

The evaluation of all three finalized and tuned models on the provided test images follows. As required, in order to avoid any influence of the test data on the models, no changes were made to the models after this final evaluation.

## Loading test images on another Google Drive folder
"""

test_path = "drive/MyDrive/Test/"

low_res_paths = [
    test_path + "Low_res/baby_mini_d4_gaussian.bmp",
    test_path + "Low_res/bird_mini_d4_gaussian.bmp",
    test_path + "Low_res/butterfly_mini_d4_gaussian.bmp",
    test_path + "Low_res/head_mini_d4_gaussian.bmp",
    test_path + "Low_res/woman_mini_d4_gaussian.bmp",
]

bicubic_paths = [
    test_path + "Bicubic/baby_bicubic_x4_gaussian.bmp",
    test_path + "Bicubic/bird_bicubic_x4_gaussian.bmp",
    test_path + "Bicubic/butterfly_bicubic_x4_gaussian.bmp",
    test_path + "Bicubic/head_bicubic_x4_gaussian.bmp",
    test_path + "Bicubic/woman_bicubic_x4_gaussian.bmp",
]

real_highres_paths = [
    test_path + "Real_high_res/baby.png",
    test_path + "Real_high_res/bird.png",
    test_path + "Real_high_res/butterfly.png",
    test_path + "Real_high_res/head.png",
    test_path + "Real_high_res/woman.png",
]

# Loading the images from the drive
low_images = [imread(low_path) for low_path in low_res_paths]
bicubic_images = [imread(bicubic_path) for bicubic_path in bicubic_paths]
real_images = [imread(real_path) for real_path in real_highres_paths]

"""## Results of the CNN model with MSE loss

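The scores reported in this section are PSNR values. As a reminder of what the metric measures, here is a small sketch assuming the standard definition `10 * log10(MAX^2 / MSE)` on 8-bit pixel values (`MAX = 255`). The `PSNRLoss` function used by the notebook may differ in details such as the value range, and `psnr` is a hypothetical helper written for this example.

```python
import math

def psnr(pixels_a, pixels_b, max_value=255.0):
    # Mean squared error between two equally sized flat lists of pixel values
    mse = sum((a - b) ** 2 for a, b in zip(pixels_a, pixels_b)) / len(pixels_a)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

print(psnr([0, 255, 128, 64], [0, 255, 128, 64]))  # inf
print(round(psnr([10, 20, 30, 40], [12, 18, 33, 39]), 2))
```

Under this definition a higher PSNR means that the image is closer to the reference.
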

"""

def apply_model(model, image):

    # We apply the same permutation that we used on the training images:
    # from (H, W, C) to (1, C, H, W)
    image = torch.cuda.FloatTensor(image).permute(2,0,1).unsqueeze(0)

    # No gradients are needed at inference time
    with torch.no_grad():
        output = model(image)
    return output.squeeze().permute(1,2,0)

def PSNR_loss(a, b):

    # Prepare input for the PSNR loss function
    a = torch.cuda.FloatTensor(a).permute(2,0,1).unsqueeze(0)
    b = torch.cuda.FloatTensor(b).permute(2,0,1).unsqueeze(0)
    return PSNRLoss(a, b, device)

def show_all(imgs_low, imgs_bicubic, imgs_output, imgs_real):
    running_psnr_bicubic = 0
    running_psnr_output = 0

    for low, bicubic, output, real in zip(imgs_low, imgs_bicubic, imgs_output, imgs_real):
        print("Low resolution")
        plt.imshow(low)
        plt.show()

        bicubic_psnr = PSNR_loss(bicubic, real).item()
        running_psnr_bicubic += bicubic_psnr
        print(f"Bicubic interpolation, loss with respect to original {bicubic_psnr}")
        plt.imshow(bicubic)
        plt.show()

        output_psnr = PSNR_loss(output, real).item()
        running_psnr_output += output_psnr
        print(f"Output of OUR implementation, loss with respect to original {output_psnr}")
        plt.imshow(output.int().cpu())
        plt.show()

        print("This is the real high res image") 
        plt.imshow(real)
        plt.show()

    # Average over the number of test images
    num_images = len(imgs_real)
    running_psnr_bicubic /= num_images
    running_psnr_output /= num_images
    return running_psnr_bicubic, running_psnr_output

#We show the results of the CNN when given the test images in input
#Loading the weights of the CNN
model1 = torch.load(PATH_CNN)
model1.eval()

output_images = [apply_model(model1, imread(path)) for path in low_res_paths]
losses = show_all(low_images, bicubic_images, output_images, real_images)
loss_bicubic, loss_output = losses
print(f"The average loss of the result of the bicubic interpolation (with respect to the real one) on the test set is {loss_bicubic}")
print(f"The average loss of the result of the CNN output (with respect to the real one) on the test set is {loss_output}")

"""## Results of the GAN architectures"""

#We show the results of the SRGAN when given the test images in input
#Loading the weights of the SRGAN
model2 = torch.load(PATH_GAN1)
model2.eval()

output_images_gan = [apply_model(model2, imread(path)) for path in low_res_paths]
gan_losses = show_all(low_images, bicubic_images, output_images_gan,real_images)

loss_bicubic, loss_output = gan_losses
print(f"The average loss of the result of the bicubic interpolation (with respect to the real one) on the test set is", loss_bicubic)
print(f"The average loss of the result of the Generator output (with respect to the real one) on the test set is", loss_output)

#We show the results of the WGAN when given the test images in input
#Loading the weights of the WGAN
model3 = torch.load(PATH_GAN2)
model3.eval()

output_images_gan2 = [apply_model(model3, imread(path)) for path in low_res_paths]
wgan_losses = show_all(low_images, bicubic_images, output_images_gan2,real_images)

loss_bicubic, loss_output = wgan_losses
print(f"The average loss of the result of the bicubic interpolation (with respect to the real one) on the test set is", loss_bicubic)
print(f"The average loss of the result of the WGAN Generator output (with respect to the real one) on the test set is", loss_output)

"""# Conclusion

## Standard supervised training of the CNN

When we trained it on this very small dataset of patches in a standard image-to-image supervised manner, the CNN described in the paper yielded results only slightly better than simple bicubic upsampling. Both the decreasing validation loss and the examples shown during training indicate that the network is in fact learning and gradually improving its outputs, which suggests that our implementation is correct, but the results are not comparable in quality to those of the reference paper [[1]](#biblio). This discrepancy is most likely due to the huge difference in dataset size. Because of training time limitations we could only use 800+100 images in total, yielding 18000 patches at 20 patches per image (16000 for training and 2000 for validation), while the authors of the paper used a dataset of 350 thousand images.
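
The patch counts quoted above follow directly from the dataset split. A quick check, where the variable names are chosen for this example only:

```python
patches_per_image = 20
train_images, validation_images = 800, 100
paper_images = 350_000  # dataset size reported for the reference paper

train_patches = train_images * patches_per_image
validation_patches = validation_images * patches_per_image
total_patches = train_patches + validation_patches

print(train_patches, validation_patches, total_patches)  # 16000 2000 18000
# How many times more images the paper's dataset contains (integer division)
print(paper_images // (train_images + validation_images))  # 388
```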

### Results of experimenting

The changes that we tried, [described before](#changes), did not improve the model.

## GANs

As far as the SRGAN is concerned, sadly we were not able to improve the performance of the CNN by fine-tuning it in an adversarial setting; in fact, such training only reduced the quality of the output by introducing artifacts. GAN training is notoriously sensitive to hyperparameter tuning, and given the long runtime of even just a few epochs of training it was difficult to run many tuning experiments, so we did not manage to find a hyperparameter configuration that would have yielded an improvement of the model given our dataset.
With the WGAN, instead, we obtained fewer artifacts in the images, so the results slightly improved, even though the discriminator loss shows that the discriminator still lags behind. Increasing the number of iterations in which only the discriminator is trained would probably improve the performance; however, as said before, the computational time required would be excessive in the Google Colab environment.

# Bibliography

<a name='biblio'></a>

- [1]   https://arxiv.org/abs/1609.04802
- [2]   https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html 
- [3]   https://github.com/leftthomas/SRGAN
- [4]   https://medium.com/mlearning-ai/how-to-improve-image-generation-using-wasserstein-gan-1297f449ca75
- [5]   https://towardsdatascience.com/custom-dataset-in-pytorch-part-1-images-2df3152895
"""