Repository for the paper *Generating Images from Caption and Vice Versa via CLIP-Guided Generative Latent Space Search*.

An in-browser demo is available here.
Clone this repository:

```shell
git clone https://github.com/galatolofederico/clip-glass && cd clip-glass
```

Create a virtual environment and install the requirements:

```shell
virtualenv --python=python3.6 env && . ./env/bin/activate
pip install -r requirements.txt
```
You can run CLIP-GLaSS with:

```shell
python run.py --config <config> --target <target>
```

specifying `<config>` and `<target>` according to the following table:
| Config | Meaning | Target Type |
|---|---|---|
| `GPT2` | Use GPT2 to solve the Image-to-Text task | Image |
| `DeepMindBigGAN512` | Use DeepMind's BigGAN 512x512 to solve the Text-to-Image task | Text |
| `DeepMindBigGAN256` | Use DeepMind's BigGAN 256x256 to solve the Text-to-Image task | Text |
| `StyleGAN2_ffhq_d` | Use StyleGAN2-ffhq to solve the Text-to-Image task | Text |
| `StyleGAN2_ffhq_nod` | Use StyleGAN2-ffhq without Discriminator to solve the Text-to-Image task | Text |
| `StyleGAN2_church_d` | Use StyleGAN2-church to solve the Text-to-Image task | Text |
| `StyleGAN2_church_nod` | Use StyleGAN2-church without Discriminator to solve the Text-to-Image task | Text |
| `StyleGAN2_car_d` | Use StyleGAN2-car to solve the Text-to-Image task | Text |
| `StyleGAN2_car_nod` | Use StyleGAN2-car without Discriminator to solve the Text-to-Image task | Text |
If you have not downloaded the model weights, you will be prompted to run `./download-weights.sh`. You will find the results in the folder `./tmp`; a different output folder can be specified with `--tmp-folder`.
For example:

```shell
python run.py --config StyleGAN2_ffhq_d --target "the face of a man with brown eyes and stubble beard"
python run.py --config GPT2 --target gpt2_images/dog.jpeg
```
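The idea behind these commands can be illustrated with a toy sketch: CLIP-GLaSS searches a generator's latent space for a point whose output maximizes a CLIP similarity score against the target. The snippet below is a minimal, hypothetical stand-in, not the actual implementation — it uses a simple random-mutation hill climb and a dummy fitness function in place of the real generator, CLIP encoder, and search strategy used in the paper.

```python
import numpy as np

def latent_search(fitness, dim=16, iters=200, sigma=0.1, seed=0):
    """Hill-climb a latent vector to maximize a fitness score.

    `fitness` stands in for the similarity that CLIP would assign
    between the generator's output at a latent point and the target.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(dim)                          # random starting latent
    best = fitness(z)
    for _ in range(iters):
        candidate = z + sigma * rng.standard_normal(dim)  # mutate the latent
        score = fitness(candidate)
        if score > best:                                  # keep only improvements
            z, best = candidate, score
    return z, best

# Dummy fitness: peaks at a fixed "target" latent (placeholder for the
# CLIP(generator(z), target_text) similarity used in the real system).
target = np.full(16, 0.5)
fitness = lambda z: -np.linalg.norm(z - target)

z_best, score = latent_search(fitness)
```

In the real system the fitness evaluation is far more expensive (a generator forward pass plus a CLIP encoding per candidate), which is why the search is driven by the configurations listed in the table above.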
This work heavily relies on the following amazing repositories and would not have been possible without them:

- CLIP from OpenAI (included in the folder `clip`)
- pytorch-pretrained-BigGAN from HuggingFace
- stylegan2-pytorch from Adrian Sahlman (included in the folder `stylegan2`)
- gpt-2-pytorch from Tae-Hwan Jung (included in the folder `gpt2`)
All their work can be shared under the terms of the respective original licenses.
All my original work (everything except the content of the folders `clip`, `stylegan2` and `gpt2`) is released under the terms of the GNU/GPLv3 license. Copying, adapting and republishing it is not only permitted but also encouraged.
If you want to cite us, you can use this BibTeX entry:
```bibtex
@article{generating2021,
    author={Federico Galatolo and Mario Cimino and Gigliola Vaglini},
    title={Generating Images from Caption and Vice Versa via CLIP-Guided Generative Latent Space Search},
    journal={Proceedings of the International Conference on Image Processing and Vision Engineering},
    year={2021},
    publisher={SCITEPRESS - Science and Technology Publications},
    doi={10.5220/0010503701660174},
}
```
For any further questions, feel free to reach me at [email protected] or on Telegram at @galatolo.