Given a latent CLIP vector from a text or image input, we want to synthesize a sound that fits it semantically. As a first step, we optimize a SIREN that represents the audio so that the result gets a good CLIP score.
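To make the idea concrete, here is a minimal, dependency-free sketch of a SIREN that maps a time coordinate to an audio amplitude. It is not the code from this repository. The class name `TinySiren`, the helpers `render` and `cosine_similarity`, the layer sizes, the `tanh` output squashing, and the choice of cosine similarity as the score are all assumptions made for illustration. The actual optimization also needs an audio encoder that maps into CLIP space and an autodiff framework, neither of which is shown here.

```python
import math
import random


class TinySiren:
    """Hypothetical toy SIREN: time t in [-1, 1] -> amplitude in (-1, 1)."""

    def __init__(self, hidden=16, layers=2, w0=30.0, seed=0):
        rng = random.Random(seed)
        self.w0 = w0
        sizes = [1] + [hidden] * layers
        self.layers = []
        for i, (fan_in, fan_out) in enumerate(zip(sizes[:-1], sizes[1:])):
            # Common SIREN initialization: a wider uniform range for the
            # first layer, a range scaled by 1/w0 for the later ones.
            # Biases reuse the same range here as a simplification.
            bound = 1.0 / fan_in if i == 0 else math.sqrt(6.0 / fan_in) / w0
            weights = [[rng.uniform(-bound, bound) for _ in range(fan_in)]
                       for _ in range(fan_out)]
            biases = [rng.uniform(-bound, bound) for _ in range(fan_out)]
            self.layers.append((weights, biases))
        bound = math.sqrt(6.0 / hidden) / w0
        self.out_weights = [rng.uniform(-bound, bound) for _ in range(hidden)]

    def __call__(self, t):
        h = [t]
        for weights, biases in self.layers:
            # Sine activation applied to the scaled affine map.
            h = [math.sin(self.w0 * (sum(w * x for w, x in zip(row, h)) + b))
                 for row, b in zip(weights, biases)]
        # tanh keeps the sample in a valid audio range (an assumption,
        # not part of the SIREN definition).
        return math.tanh(sum(w * x for w, x in zip(self.out_weights, h)))


def render(siren, n_samples):
    """Evaluate the network on n_samples evenly spaced points in [-1, 1]."""
    if n_samples < 2:
        raise ValueError("n_samples must be at least 2")
    step = 2.0 / (n_samples - 1)
    return [siren(-1.0 + i * step) for i in range(n_samples)]


def cosine_similarity(a, b):
    """Assumed score: cosine similarity between two embedding vectors."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / norm
```

In a full pipeline, the rendered samples would be embedded into CLIP space and the network weights updated to increase the similarity to the target vector.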
# Install dependencies
pip install git+https://github.com/pollinations/CLIPTranslate
# or, from a local clone of the repository:
pip install -e .
To develop in Colab, I recommend mounting Google Drive both in the notebook and on your laptop, then importing the project from Drive. Copy this Colab notebook as a starting point.
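The Drive workflow could look roughly like the sketch below. The helper name `use_project_from_drive` and the default project path are assumptions; adjust the path to wherever the project lives in your Drive. Outside Colab, the `google.colab` import fails and the helper simply assumes the folder is already synced to disk.

```python
import os
import sys


def use_project_from_drive(
    project_dir="/content/drive/MyDrive/CLIPTranslate",  # assumed location
    mount_point="/content/drive",
):
    """Mount Google Drive when in Colab, then make project_dir importable."""
    try:
        from google.colab import drive  # only available inside Colab
    except ImportError:
        drive = None  # running locally: Drive is assumed to be synced already
    if drive is not None:
        drive.mount(mount_point)
    if not os.path.isdir(project_dir):
        raise FileNotFoundError(f"Project directory not found: {project_dir}")
    # Put the project first on the import path, without duplicating it.
    if project_dir not in sys.path:
        sys.path.insert(0, project_dir)
    return project_dir
```

Call `use_project_from_drive()` in the first notebook cell. Edits saved from your laptop then become visible to the notebook once Drive has synced them.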