
Do video models dream of electrical sheeps in motion #19

Merged · 24 commits · Oct 21, 2019

Conversation

daniel-j-h
Member

Work in progress; let's see if this works out 🤗

@daniel-j-h
Member Author

Here's where we are right now:

  1. Create a random video with 32 frames of shape CxTxHxW = 3x32x112x112
  2. Hook this random tensor into the computational graph so we can optimize it
  3. Pass this random tensor through the trained video model up to the nth layer
  4. Create a loss maximizing the nth layer's activations by optimizing the input video (sketched below)
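In code, the loop looks roughly like this (a minimal sketch; the model is assumed to be truncated so it returns the nth layer's activations, and names and hyper-parameters are illustrative):

import torch
import torch.nn as nn


def dream(model: nn.Module, steps: int = 100, lr: float = 1e-2) -> torch.Tensor:
    model.eval()

    # 1. random video of shape NxCxTxHxW = 1x3x32x112x112
    video = torch.rand(1, 3, 32, 112, 112)

    # 2. make the input a leaf tensor we can optimize
    video.requires_grad_(True)
    optimizer = torch.optim.Adam([video], lr=lr)

    for _ in range(steps):
        optimizer.zero_grad()

        # 3. forward pass up to the nth layer
        acts = model(video)

        # 4. maximize the activations by minimizing their negative norm
        loss = -acts.norm()
        loss.backward()
        optimizer.step()

    return video.detach()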

Here are example videos maximizing layer3 and layer4 activations:

ex02-layer3-upscale

ex01-layer4-highlr-upscale

Learnings

  • We can use a very high learning rate, e.g. 1e-2
  • We should maximize activations for a single channel / volume, not the mean over all of them
  • We get high-frequency patterns that fool the network but are not meaningful to humans

We then looked into regularization terms to get rid of the high-frequency patterns; the total variation regularization loss implemented right now results in very strong checkerboard patterns.
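For reference, a 3d total variation term over a video tensor looks roughly like this (a sketch, not necessarily the exact implementation on this branch):

import torch


def total_variation_3d(video: torch.Tensor) -> torch.Tensor:
    # video has shape NxCxTxHxW; sum absolute differences between
    # neighbouring values along time, height, and width
    tv_t = (video[:, :, 1:, :, :] - video[:, :, :-1, :, :]).abs().sum()
    tv_h = (video[:, :, :, 1:, :] - video[:, :, :, :-1, :]).abs().sum()
    tv_w = (video[:, :, :, :, 1:] - video[:, :, :, :, :-1]).abs().sum()
    return tv_t + tv_h + tv_w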

ex02-layer3-regscale-upscale

ex03-layer1-noreg-up

Next actions: figure out a better regularization term and look into prior art to see how other folks handle this.

@daniel-j-h
Member Author

Completely different approach now; below are results for maximizing layer2 activations with different learning rates and numbers of iterations (the second one is stronger). I think this is the way to go 🚀

dream-y

dream-final

More examples from: stem, layer1, layer2, layer3, layer4

dream-0

dream-1

dream-2

dream-3

dream-4


dream-00

dream-01

dream-02

dream-03

dream-04

@sandhawalia
Member

Dayum 🔥

@daniel-j-h
Member Author

Here are a few examples where we select a specific channel instead of optimizing all activations; the activations are of shape NxCxTxHxW, and below are results for optimizing with a fixed i in

acts[:, i, :, :, :].norm()
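As a function, the per-channel objective is roughly (a sketch; channel_objective is an illustrative name):

import torch


def channel_objective(acts: torch.Tensor, i: int) -> torch.Tensor:
    # acts has shape NxCxTxHxW; keep only channel i's TxHxW volume and
    # return its negative norm, so minimizing the loss maximizes that channel
    return -acts[:, i, :, :, :].norm()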

dream-17-up
dream-16-up
dream-15-up
dream-14-up
dream-13-up
dream-12-up
dream-11-up
dream-10-up

@daniel-j-h
Member Author

Starting with a random tensor instead of a seed video and optimizing all activations:

dream-rnd-up

@daniel-j-h
Member Author

Starting from a random tensor and optimizing specific channels (3 and 6 in this case):

dream-zz-up
dream-z-up

@daniel-j-h
Member Author

Bringing back the 3d total variation loss term and scaling it

dream-tvn3-up

seems to get rid of high frequencies in the output, which makes it more pleasant to look at 🤗
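Combining both terms, the objective is roughly as follows (a sketch; tv_weight is an illustrative knob and total_variation_3d is the term sketched earlier in this thread):

import torch


def dream_loss(acts: torch.Tensor, video: torch.Tensor, tv_weight: float = 1e-3) -> torch.Tensor:
    # maximize activations while penalizing high-frequency content in the video
    return -acts.norm() + tv_weight * total_variation_3d(video)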

direct comparison:

@daniel-j-h
Member Author

Changing how we normalize the gradients

primer1-up
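One common scheme (a sketch, not necessarily the exact change in this commit) is to scale the input gradient by its mean absolute value before each ascent step:

import torch


def ascent_step(video: torch.Tensor, lr: float = 1e-2, eps: float = 1e-8) -> None:
    # normalize the gradient to unit mean absolute value so the effective
    # step size does not depend on the raw gradient magnitude, then step uphill
    grad = video.grad / (video.grad.abs().mean() + eps)
    with torch.no_grad():
        video += lr * grad
    video.grad = None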

@daniel-j-h
Member Author

Latest version maximizing all channels in layer2

primer1-up

Latest version maximizing channel 6 in layer2

primer1-up

@daniel-j-h
Member Author

Here are results for

  • layer2 again
  • maximizing the activations of specific TxHxW channel volumes
  • resizing frames to 248 px on the shorter edge (see the resize sketch at the end of this comment)

which barely fits on one of my GTX 1080 Tis:

royal-wedding

royal-wedding

royal-wedding

Input clip from:

# format 18 is the 360p mp4 stream
youtube-dl -f 18 yJbXdOdTaJc

# cut 23:48-23:53, re-encode at 16 fps (crf 23), and drop the audio track
ffmpeg -i yJbXdOdTaJc.mp4 -ss 23:48 -to 23:53 -crf 23 -r 16 -an clips.mp4
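For the resize to 248 px on the shorter edge mentioned above, a rough per-clip sketch (assuming bilinear interpolation via F.interpolate; names are illustrative):

import torch
import torch.nn.functional as F


def resize_shorter_edge(video: torch.Tensor, size: int = 248) -> torch.Tensor:
    # video has shape CxTxHxW; scale spatially so the shorter edge becomes `size`
    _, _, h, w = video.shape
    scale = size / min(h, w)
    new_h, new_w = round(h * scale), round(w * scale)
    # F.interpolate expects a batch of images, so fold time into the batch axis
    frames = video.permute(1, 0, 2, 3)  # TxCxHxW
    frames = F.interpolate(frames, size=(new_h, new_w), mode="bilinear", align_corners=False)
    return frames.permute(1, 0, 2, 3)   # back to CxTxHxW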

daniel-j-h force-pushed the lsd branch 2 times, most recently from 2cb8bfa to 7634e4b on October 21, 2019 at 13:03
@daniel-j-h
Member Author

Merging this into master as-is right now. We can explore more advanced techniques, such as dreaming at multiple scales and related ideas, in separate pull requests in the future.

daniel-j-h deleted the lsd branch on October 21, 2019 at 13:25