This is a PyTorch implementation of the paper 'Multimodal Transfer: A Hierarchical Deep Convolutional Neural Network for Fast Artistic Style Transfer' by Wang et al.
$ git clone https://github.com/FeliMe/multimodal_style_transfer.git
If you just want to use the network with the pretrained models, open 'transform_image.ipynb' (or 'transform_video.ipynb'), select a model and an image from the /images folder (or use your own), and run the notebook.
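For orientation, here is a minimal sketch of what the inference notebook does. The module name, class name, and checkpoint path below are illustrative assumptions, not necessarily the names used in this repository; substitute the actual ones from your checkout:

```python
import torch
from PIL import Image
from torchvision import transforms

# NOTE: module, class, and checkpoint names are assumptions for illustration;
# use the actual transformer network class and model files from this repo.
from network import StyleTransferNetwork  # hypothetical import

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = StyleTransferNetwork().to(device)
model.load_state_dict(torch.load('models/scream.pth', map_location=device))
model.eval()

image = transforms.ToTensor()(Image.open('images/example.jpg').convert('RGB'))
image = image.unsqueeze(0).to(device)  # add batch dimension

with torch.no_grad():
    stylized = model(image).squeeze(0).clamp(0, 1).cpu()

transforms.ToPILImage()(stylized).save('stylized.jpg')
```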
If you want to train your own model on a style image, first download the MS COCO dataset and store it in a folder named "/coco/" in the same directory into which you cloned this project. Then use 'train_multimodal.ipynb'. You might need to adapt the STYLE_WEIGHTS depending on your style image.
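For intuition on what STYLE_WEIGHTS controls, here is a hedged sketch of how per-layer style weights typically enter a Gram-matrix style loss. The layer names, weight values, and helper functions below are illustrative assumptions and may differ from what 'train_multimodal.ipynb' actually uses:

```python
import torch
import torch.nn.functional as F

def gram_matrix(features):
    # features: (batch, channels, height, width) activations from a VGG layer
    b, c, h, w = features.size()
    f = features.view(b, c, h * w)
    # Channel-by-channel correlations, normalized by tensor size
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

# Illustrative per-layer weights; the notebook's actual layer names and
# values may differ, and they usually need tuning per style image.
STYLE_WEIGHTS = {'relu1_2': 1e3, 'relu2_2': 1e3, 'relu3_3': 1e3, 'relu4_3': 1e3}

def style_loss(output_features, style_grams):
    # Both arguments are dicts keyed by VGG layer name: the generated image's
    # activations and the precomputed Gram matrices of the style image.
    loss = 0.0
    for layer, weight in STYLE_WEIGHTS.items():
        loss = loss + weight * F.mse_loss(gram_matrix(output_features[layer]),
                                          style_grams[layer])
    return loss
```

Raising the weight of a layer makes the output match the style image's texture statistics at that layer's scale more strongly, which is why different style images call for different weights.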
Style | Output Style Subnet | Output Enhance Subnet | Output Refine Subnet
---|---|---|---
Patch | | |
Scream | | |
Still Life | | |
Mixed | | |
This implementation deviates from the original paper in a few ways. I chose a different layer for the content representation, as it produced better results, and I added a regularization loss as in Johnson et al.
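The regularizer Johnson et al. use is total variation. A minimal PyTorch version of such a loss looks like this (the default weight is an illustrative placeholder, not the value used in this repo):

```python
import torch

def tv_loss(img, weight=1e-6):
    # Total variation regularizer: penalizes differences between neighboring
    # pixels, which suppresses high-frequency noise in the stylized output.
    diff_h = torch.abs(img[:, :, 1:, :] - img[:, :, :-1, :]).sum()
    diff_w = torch.abs(img[:, :, :, 1:] - img[:, :, :, :-1]).sum()
    return weight * (diff_h + diff_w)
```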
I also removed the final bilinear upsampling to size 1024: it was only applied at test time anyway, so it did not increase the effective resolution but merely produced an upsampled version of the previous output.
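The removed step amounted to something like the following sketch (tensor shapes are illustrative):

```python
import torch
import torch.nn.functional as F

refined = torch.rand(1, 3, 512, 512)  # stand-in for the refine subnet's output

# Plain bilinear upsampling to 1024 px enlarges the image but adds no
# detail beyond what the refine subnet already produced.
upsampled = F.interpolate(refined, size=(1024, 1024),
                          mode='bilinear', align_corners=False)
```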
- Some code is based on the fast neural style transfer implementation by CeShine Lee
- Thanks to the TUM Computer Vision Group for granting me access to their hardware.