
Xl python inference #261

Merged
merged 4 commits into apple:main on Sep 26, 2023

Conversation

lopez-hector
Contributor

  1. Added XL inference capabilities to the Python script:
    1. Infers XL from the model version (could be changed to a flag to mimic the Swift behavior)
    2. Selects the dtype for the VAE decoder from expected_inputs
    3. Schedulers return either np.ndarray or torch.Tensor, so I added logic to catch this and convert to a NumPy array (see the sketch below)
    4. Added _get_add_time_ids()

Retains backward compatibility.
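For context, a minimal sketch of items 3 and 4 (not the exact PR code; the (original_size, crops_coords_top_left, target_size) signature follows the diffusers convention and is an assumption here):

```python
import numpy as np
import torch

def _to_numpy(x):
    # Schedulers return either np.ndarray or torch.Tensor; normalize to NumPy.
    return x.numpy() if isinstance(x, torch.Tensor) else np.asarray(x)

def _get_add_time_ids(original_size, crops_coords_top_left, target_size, dtype=np.float16):
    # SDXL conditions the UNet on six extra values:
    # (orig_h, orig_w, crop_top, crop_left, target_h, target_w).
    add_time_ids = list(original_size) + list(crops_coords_top_left) + list(target_size)
    return np.array([add_time_ids], dtype=dtype)
```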

"A high quality photo of a surfing dog"
num_inference_steps = 25
seed = 93
guidance_scale = 10

[generated sample image]


  • [x] I agree to the terms outlined in CONTRIBUTING.md

@atiorh
Collaborator

atiorh commented Sep 24, 2023

Thanks @lopez-hector! I ran a quick test and received the following error:

File "ml-stable-diffusion/python_coreml_stable_diffusion/coreml_model.py", line 71, in _verify_inputs
    raise TypeError(
TypeError: Expected shape (2, 6), got (12,) for input: time_ids

@lopez-hector
Contributor Author

Interesting, I don't get this issue when using the converted packages from https://huggingface.co/apple/coreml-stable-diffusion-xl-base/tree/main/packages. I just re-downloaded and tested them.

Using these packages from Hugging Face, the expected inputs for the UNET are:

{'encoder_hidden_states': {'dtype': <class 'numpy.float16'>,
                           'shape': (2, 2048, 1, 77)},
 'sample': {'dtype': <class 'numpy.float16'>, 'shape': (2, 4, 128, 128)},
 'text_embeds': {'dtype': <class 'numpy.float16'>, 'shape': (2, 1280)},
 'time_ids': {'dtype': <class 'numpy.float16'>, 'shape': (12,)},
 'timestep': {'dtype': <class 'numpy.float16'>, 'shape': (2,)}}

I'm not sure how those were converted before being uploaded to HF. What are the expected inputs for your converted packages?
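For comparison, one way to dump the declared input shapes from a converted package (the file name here is illustrative):

```python
import coremltools as ct

# skip_model_load avoids compiling the model just to read its spec.
unet = ct.models.MLModel("Unet.mlpackage", skip_model_load=True)
for inp in unet.get_spec().description.input:
    print(inp.name, tuple(inp.type.multiArrayType.shape))
```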

@atiorh
Collaborator

atiorh commented Sep 25, 2023

I see! I am currently testing models exported from #227 and the different input shapes are handled on the Swift code path but not on the Python code path. cc: @ZachNagengast

@atiorh
Collaborator

atiorh commented Sep 25, 2023

@lopez-hector Do you mind adding similar support for accepting both input shapes on the Python code path?

@atiorh
Collaborator

atiorh commented Sep 25, 2023

@lopez-hector There is one more improvement I was thinking about for the Python inference code path: switching from coremltools.models.MLModel to coremltools.models.CompiledMLModel, so that the results of model compilation (which bloats loading time from seconds to minutes) are cached on first load instead of being recompiled on every load. If you would like, we can fold those changes into this PR, or I can make a separate one. Please let me know.
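For illustration, the intended switch looks roughly like this (paths are placeholders):

```python
import coremltools as ct

# Loading an .mlpackage compiles it on every load (seconds to minutes for SDXL-sized models):
model = ct.models.MLModel("Unet.mlpackage")

# Loading a pre-compiled .mlmodelc skips recompilation entirely:
compiled = ct.models.CompiledMLModel("Unet.mlmodelc", compute_units=ct.ComputeUnit.ALL)
```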

@lopez-hector
Contributor Author

Will add support for both input shapes soon.

Happy to include the CompiledMLModel change here. I was hoping there was a way to speed that up!

@atiorh
Collaborator

atiorh commented Sep 25, 2023

@lopez-hector Thanks!

@atiorh
Collaborator

atiorh commented Sep 25, 2023

Note that CompiledMLModel will need to load the .mlmodelc assets generated via --bundle-resources-for-swift-cli instead of the original .mlpackage assets, so the different filename conventions will need to be handled. Just a heads-up!

@lopez-hector
Contributor Author

lopez-hector commented Sep 25, 2023

> I see! I am currently testing models exported from #227 and the different input shapes are handled on the Swift code path but not on the Python code path. cc: @ZachNagengast

Added support for both the (12,) and (2, 6) shapes. Just tested it using the #227 conversion script; it handled the (2, 6) time_ids shape.
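Roughly, the shape handling looks like this (the function name and details are illustrative, not the exact PR code):

```python
import numpy as np

def _prepare_time_ids(time_ids, expected_shape):
    # SDXL uses 6 time-id values per classifier-free-guidance batch (12 total).
    # Depending on how the model was converted, the UNet declares this input
    # either flattened as (12,) or batched as (2, 6).
    arr = np.asarray(time_ids, dtype=np.float16).reshape(-1)
    assert arr.size == 12, f"expected 12 time-id values, got {arr.size}"
    return arr if expected_shape == (12,) else arr.reshape(2, 6)
```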

@lopez-hector
Contributor Author

Hi @atiorh, I added support for CompiledMLModel. On my M2 Max MacBook, it cuts loading time roughly in half for the text encoders and VAE, and by about three quarters for the UNET.

The UNET takes about 15s to load from compiled sources.

  • Added logic to detect mlpackage or mlmodelc assets in the input directory (see the sketch below).
  • Added a --model-sources flag to force loading from either packages or compiled sources.
  • If loading from .mlmodelc, we load via CompiledMLModel and generate expected_inputs from metadata.json.
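A hypothetical sketch of the detection logic (the function name and return values are mine, not necessarily the PR's):

```python
import os

def _detect_model_sources(model_dir, forced=None):
    # Honor an explicit --model-sources value first; otherwise infer from
    # which asset type is present in the input directory.
    if forced in ("packages", "compiled"):
        return forced
    names = os.listdir(model_dir)
    if any(n.endswith(".mlmodelc") for n in names):
        return "compiled"
    if any(n.endswith(".mlpackage") for n in names):
        return "packages"
    raise FileNotFoundError(f"No .mlpackage or .mlmodelc assets in {model_dir}")
```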

@atiorh
Collaborator

atiorh commented Sep 26, 2023

Excellent, thanks @lopez-hector!

@atiorh atiorh merged commit f3a2124 into apple:main Sep 26, 2023