Latte: Latent Diffusion Transformer for Video Generation #8404
Conversation
I left some feedback!
thanks!
@@ -45,6 +45,7 @@
 )
 from .kandinsky3 import Kandinsky3Img2ImgPipeline, Kandinsky3Pipeline
 from .latent_consistency_models import LatentConsistencyModelImg2ImgPipeline, LatentConsistencyModelPipeline
+from .latte import LattePipeline
from .latte import LattePipeline
don't need this import here :)
I have removed it.
Hi @yiyixuxu, thanks for your code review. I've removed some unnecessary code from …
I finished my code review. In summary: the unused code in latte_transformer_3d.py was removed, norm_type_latte was removed, and a squeeze_hidden_states flag was added.
update _toctree.yml for docs and fix example
Hi @maxin-cn. Seems like the Latte tests are still broken, but at least the quality ones passed! Fast PyTorch Pipeline CPU tests: this test fails in the pipeline. PyTorch Example CPU tests: this test fails due to an unrelated error, and I don't think you need to do anything on your end. Also, with your latest commit …
Also, you might be interested in 4/8-bit quantization for the entire model, or memory optimization for the T5 text encoder. I am trying a few things here. Presently, inference is extremely slow due to the overhead involved, but if you have any insights from experimenting with the same, it would be awesome to know! As it stands, float16 inference runs in under 17 GB for a 16-frame video, but I'm interested in bringing that down to 8-10 GB.
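To make the 8-10 GB target above concrete, here is a back-of-the-envelope estimate of the weights-only footprint of a T5-XXL-class text encoder at different precisions. The ~4.7B parameter count is an assumption about the checkpoint, not a figure from this PR, and activations plus the transformer and VAE come on top of this.

```python
def encoder_weight_gb(num_params: float, bytes_per_param: float) -> float:
    # Weights-only memory; ignores activations, KV caches, and the
    # rest of the pipeline.
    return num_params * bytes_per_param / 1024**3

# Assumed parameter count for a T5-XXL-class encoder (~4.7e9 params).
N = 4.7e9
print(f"fp16: {encoder_weight_gb(N, 2):.1f} GB")    # ~8.8 GB
print(f"int8: {encoder_weight_gb(N, 1):.1f} GB")    # ~4.4 GB
print(f"4bit: {encoder_weight_gb(N, 0.5):.1f} GB")  # ~2.2 GB
```

This is why quantizing just the text encoder already recovers several GB relative to the fp16 baseline.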
Hi @a-r-r-o-w, I have fixed the …
Thanks for your suggestions. I have not tried any quantization techniques yet. BTW, is quantization necessary for merging this PR? If not, I'd like to integrate your code into the follow-up inference code after Latte's PR is merged.
No, quantization is not necessary at all. This PR looks absolutely great to merge now and once CI passes, we can do it.
Okay. Let's try quantization later.
Thank you for bearing with our reviews/requests over the duration of this PR, and being so quick to respond! This is very cool work ❤️ LGTM! 🤗
I would like to extend my heartfelt thanks for your support and responsiveness throughout the duration of this PR!
Hi @a-r-r-o-w! I have added your code for quantization inference. You can find it here. Thank you very much.
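The 8-bit weight quantization discussed in this thread is handled far more carefully by real libraries (per-channel scales, outlier handling), but the core idea can be sketched as simple absmax rounding. Everything below is illustrative and is not the code linked from the comment above.

```python
import numpy as np

def quantize_absmax_int8(w: np.ndarray):
    # Per-tensor absmax quantization: scale all weights so the largest
    # magnitude maps to 127, then round to int8.
    scale = float(np.abs(w).max()) / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover approximate fp32 weights; error is bounded by scale / 2.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
q, s = quantize_absmax_int8(w)
err = float(np.abs(dequantize(q, s) - w).max())
print(q.dtype, q.nbytes, "bytes vs", w.nbytes)  # int8 uses 1/4 of fp32
```

Storage drops 4x versus fp32 (2x versus fp16) at the cost of a small, bounded reconstruction error per weight.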
Squashed commit messages:

* add Latte to diffusers
* remove print
* remove unuse codes
* remove layer_norm_latte and add a flag
* update latte_pipeline
* remove unuse squeeze
* add norm_hidden_states.ndim == 2: # for Latte
* fixed test latte pipeline bugs
* delete sh
* add doc for latte
* add licensing
* Move Transformer3DModelOutput to modeling_outputs
* give a default value to sample_size
* remove the einops dependency
* change norm2 for latte
* modify pipeline of latte
* update test for Latte
* modify some codes for latte
* modify for Latte pipeline (repeated across many commits)
* video_length -> num_frames; update prepare_latents copied from
* make fix-copies
* make style
* typo: videe -> video
* update
* modify latte pipeline (repeated across several commits)
* Delete .vscode directory
* add latte transformer 3d to docs _toctree.yml
* update example
* reduce frames for test
* fixed bug of _text_preprocessing
* set num frame to 1 for testing
* remove unuse print
* add text = self._clean_caption(text) again

Co-authored-by: Sayak Paul <[email protected]>
Co-authored-by: YiYi Xu <[email protected]>
Co-authored-by: Aryan <[email protected]>
What does this PR do?
Add Latte to diffusers. Please see this issue. Fixes # (issue)
Before submitting
documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.