
[Community Pipelines] Accelerate inference of AnimateDiff by IPEX on CPU #8643

Merged
4 commits merged on Jul 12, 2024

Conversation

@ustcuna (Contributor) commented on Jun 20, 2024

Hi, this pipeline aims to speed up inference of AnimateDiff on Intel Xeon CPUs on Linux. It is much like the previous one I proposed for SDXL, which was merged in #6683.

By using this optimized pipeline, we can get about a 1.5-2.2x performance speedup with BFloat16 on fifth-generation Intel Xeon CPUs, code-named Emerald Rapids.
It is also recommended to run on PyTorch/IPEX v2.0 and above to get the best performance boost.
The main benefits, the same as in our previous PR, are listed below (a rough sketch of the optimization flow follows the list):

  • For PyTorch/IPEX v2.0 and above, it benefits from MHA optimization with Flash Attention and from TorchScript-mode optimization in IPEX.
  • For PyTorch/IPEX v1.13, it benefits from TorchScript-mode optimization in IPEX.
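For illustration, here is a minimal sketch of how this kind of optimization flow can be applied to an AnimateDiff UNet on CPU. It is not this pipeline's actual code: the model IDs, tensor shapes, and the wrapper module are assumptions chosen to make a runnable example, and the community pipeline hides all of this behind its own preparation step.

```python
import torch
import intel_extension_for_pytorch as ipex
from diffusers import AnimateDiffPipeline, MotionAdapter

# Load a standard AnimateDiff pipeline (model IDs are illustrative).
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")
pipe = AnimateDiffPipeline.from_pretrained(
    "emilianJR/epiCRealism", motion_adapter=adapter, torch_dtype=torch.bfloat16
)
pipe.unet.eval()

# Step 1: op fusion and BF16 weight preparation via ipex.optimize.
unet = ipex.optimize(pipe.unet, dtype=torch.bfloat16, inplace=False)

# Step 2: TorchScript-mode optimization. torch.jit.trace needs concrete example
# inputs; the shapes below are placeholders for 512x512, 16-frame generation
# with classifier-free guidance (batch of 2). The wrapper forces
# return_dict=False because tracing cannot handle the dataclass the UNet
# returns by default.
class UNetWrapper(torch.nn.Module):
    def __init__(self, unet):
        super().__init__()
        self.unet = unet

    def forward(self, sample, timestep, encoder_hidden_states):
        return self.unet(sample, timestep, encoder_hidden_states, return_dict=False)

sample = torch.randn(2, 4, 16, 64, 64, dtype=torch.bfloat16)
timestep = torch.tensor(999)
encoder_hidden_states = torch.randn(2, 77, 768, dtype=torch.bfloat16)

with torch.no_grad(), torch.autocast("cpu", dtype=torch.bfloat16):
    traced_unet = torch.jit.trace(
        UNetWrapper(unet), (sample, timestep, encoder_hidden_states), strict=False
    )
    traced_unet = torch.jit.freeze(traced_unet)
    # Subsequent denoising calls run the fused TorchScript graph:
    noise_pred = traced_unet(sample, timestep, encoder_hidden_states)[0]
```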

The table below shows test results for AnimateDiff-Lightning (a distilled version of AnimateDiff SD1.5 v2, a lightning-fast text-to-video model that uses the AnimateDiff pipeline) with 1/2/4/8 steps on an Intel® Xeon® Platinum 8582C processor (60 cores/socket, 1 socket) with data type BF16:
[Image: benchmark results table]
Could you please help review? Thanks!

@ustcuna (Contributor, Author) commented on Jun 20, 2024

Hi @patrickvonplaten and @pcuenca, could you please help review this PR? It uses almost the same optimization methods and code format as the SDXL pipeline I previously proposed, which was merged in #6683. Thanks a lot!

@DN6 DN6 requested a review from a-r-r-o-w July 10, 2024 03:05
@a-r-r-o-w (Member) left a comment

Thank you for working on this, and for your patience! This looks mostly good to me, although I'm unable to test on a Xeon CPU at the moment. I have a few suggestions; could you please address them?

Resolved review comments on examples/community/README.md (2 threads)
ckpt = f"animatediff_lightning_{step}step_diffusers.safetensors"
base = "emilianJR/epiCRealism" # Choose your favorite base model.

adapter = MotionAdapter().to(device, dtype)
@a-r-r-o-w (Member) commented:

I'd actually prefer that from_pretrained be used consistently, but this is not a problem either.
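For reference, here is a hedged sketch contrasting the two loading styles. The first mirrors the AnimateDiff-Lightning model card (as in the snippet above); the second is the from_pretrained style suggested for consistency, shown with the standard motion-adapter repo as a stand-in, since the Lightning repo hosts raw safetensors checkpoints rather than a diffusers-format adapter:

```python
import torch
from diffusers import MotionAdapter
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

device, dtype = "cpu", torch.bfloat16
step = 4  # AnimateDiff-Lightning ships 1/2/4/8-step checkpoints
ckpt = f"animatediff_lightning_{step}step_diffusers.safetensors"

# Style used in the example (mirrors the AnimateDiff-Lightning model card):
# instantiate an empty adapter, then load the distilled weights into it.
adapter = MotionAdapter().to(device, dtype)
adapter.load_state_dict(
    load_file(hf_hub_download("ByteDance/AnimateDiff-Lightning", ckpt), device=device)
)

# from_pretrained style, shown here with the standard (non-Lightning) motion
# adapter, since the Lightning repo does not host a diffusers-format model.
adapter_alt = MotionAdapter.from_pretrained(
    "guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=dtype
)
```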

Resolved review comments on examples/community/pipeline_animatediff_ipex.py (7 threads)
@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ustcuna (Contributor, Author) commented on Jul 11, 2024

Hi @a-r-r-o-w, thank you so much for the detailed review! Your thoughtful and professional comments help improve the code quality. 🙂 With the new commit I have addressed almost all of your suggestions; could you please review again? The one exception is the adapter.from_pretrained suggestion, which I did not change, since I intended to keep the example identical to the AnimateDiff-Lightning Diffusers usage at ByteDance/AnimateDiff-Lightning. Do we need to change it here for consistency? If so, I will switch to adapter.from_pretrained in a new commit. Again, thanks sincerely!

@a-r-r-o-w (Member) commented on Jul 11, 2024

Thank you for applying the review comments, and for the kind words! The adapter.from_pretrained point is a personal style choice and not a problem at all for community pipelines, so we don't need to change it. The tests have all passed, which is great, and we can probably merge. One last thing I wanted to ask: why are we not supporting cross_attention_kwargs and added_cond_kwargs here? Would that require preparing the IP-Adapter models, for example, if loaded?

@ustcuna (Contributor, Author) commented on Jul 12, 2024

Hi @a-r-r-o-w, the reason we did not support cross_attention_kwargs and added_cond_kwargs here is that when leveraging torch.jit.trace to optimize the UNet, NoneType cannot be an input to a traced function. Even if we fed fake inputs for cross_attention_kwargs and added_cond_kwargs to trace the UNet successfully, we would still need to substitute fake inputs for them inside the call function whenever they default to None, otherwise errors would occur due to incompatible function arguments. I think that would be confusing for users to understand, so we decided not to support cross_attention_kwargs and added_cond_kwargs here. Do you have any better suggestions to address this? Looking forward to your opinion! Thanks!
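For illustration only (this is not code from the PR), here is a tiny sketch of the tracing constraint described above:

```python
import torch

class Toy(torch.nn.Module):
    # Mimics the UNet's signature, where cross_attention_kwargs defaults to None.
    def forward(self, x, cross_attention_kwargs=None):
        scale = 1.0 if cross_attention_kwargs is None else cross_attention_kwargs["scale"]
        return x * scale

model = Toy().eval()
x = torch.randn(2, 4)

# Tracing with None as an example input fails: the tracer only accepts tensors
# and (possibly nested) lists, dicts, and tuples of tensors.
try:
    torch.jit.trace(model, (x, None))
except RuntimeError as err:
    print("trace failed:", err)

# Dropping the argument traces fine, but the traced module can then only be
# called as traced(x); passing cross_attention_kwargs later would not match.
traced = torch.jit.trace(model, (x,))
print(traced(x))
```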

@a-r-r-o-w (Member) commented:

I see, that is insightful. I don't have a deep understanding of torch.jit.trace (or TorchScript in general) due to limited experience, but I've encountered NoneType issues in the past, so it is understandable why those arguments might be hard to support. Optional type hinting could probably help(?), but I believe it would require many changes to diffusers core code, which is out of scope; fake inputs like you mention are another option, but it's okay not to do that here since this is an example pipeline.

Thanks for your contribution!

@a-r-r-o-w a-r-r-o-w merged commit 9f963e7 into huggingface:main Jul 12, 2024
8 checks passed
@ustcuna (Contributor, Author) commented on Jul 12, 2024

Thanks a lot for your understanding of this situation, which is quite a headache! 😀 I would like to thank you again for reviewing and merging the code!

Disty0 pushed a commit to Disty0/diffusers that referenced this pull request Jul 18, 2024
…CPU (huggingface#8643)

* add animatediff_ipex community pipeline

* address the 1st round review comments
sayakpaul pushed a commit that referenced this pull request Dec 23, 2024
…CPU (#8643)

* add animatediff_ipex community pipeline

* address the 1st round review comments