
Xl python inference #261

Merged
merged 4 commits into apple:main on Sep 26, 2023

Conversation

lopez-hector
Contributor

  1. Added XL inference capabilities to the Python script:
    1. Infers XL from the model version (could be changed to a flag to mimic the Swift behavior)
    2. Selects the dtype for the VAE decoder from expected_inputs
    3. Schedulers return either np.ndarray or torch.Tensor, so I added logic to catch this and convert to a NumPy array (see the sketch below)
    4. Added _get_add_time_ids()

Retains backward compatibility.
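For context, a minimal sketch of items 3 and 4 (not the exact PR code; the (original_size, crops_coords_top_left, target_size) signature follows the diffusers convention and is an assumption here):

```python
import numpy as np
import torch

def _to_numpy(x):
    # Schedulers return either np.ndarray or torch.Tensor; normalize to NumPy.
    return x.numpy() if isinstance(x, torch.Tensor) else np.asarray(x)

def _get_add_time_ids(original_size, crops_coords_top_left, target_size, dtype=np.float16):
    # SDXL conditions the UNet on six extra values:
    # (orig_h, orig_w, crop_top, crop_left, target_h, target_w).
    add_time_ids = list(original_size) + list(crops_coords_top_left) + list(target_size)
    return np.array([add_time_ids], dtype=dtype)
```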

"A high quality photo of a surfing dog"
num_inference_steps = 25
seed = 93
guidance_scale = 10

[generated sample image]


  • [x] I agree to the terms outlined in CONTRIBUTING.md

@atiorh
Collaborator

atiorh commented Sep 24, 2023

Thanks @lopez-hector! I ran a quick test and received the following error:

File "ml-stable-diffusion/python_coreml_stable_diffusion/coreml_model.py", line 71, in _verify_inputs
    raise TypeError(
TypeError: Expected shape (2, 6), got (12,) for input: time_ids

@lopez-hector
Contributor Author

Interesting, I don't get this issue when using the converted packages from https://huggingface.co/apple/coreml-stable-diffusion-xl-base/tree/main/packages. I just re-downloaded and tested them.

Using these packages from Hugging Face, the expected inputs for the UNET are:

{'encoder_hidden_states': {'dtype': <class 'numpy.float16'>,
                           'shape': (2, 2048, 1, 77)},
 'sample': {'dtype': <class 'numpy.float16'>, 'shape': (2, 4, 128, 128)},
 'text_embeds': {'dtype': <class 'numpy.float16'>, 'shape': (2, 1280)},
 'time_ids': {'dtype': <class 'numpy.float16'>, 'shape': (12,)},
 'timestep': {'dtype': <class 'numpy.float16'>, 'shape': (2,)}}

I'm not sure how those were converted before being uploaded to HF. What are the expected inputs for your converted packages?
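For comparison, one way to dump the declared input shapes from a converted package (the file name here is illustrative):

```python
import coremltools as ct

# skip_model_load avoids compiling the model just to read its spec.
unet = ct.models.MLModel("Unet.mlpackage", skip_model_load=True)
for inp in unet.get_spec().description.input:
    print(inp.name, tuple(inp.type.multiArrayType.shape))
```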

@atiorh
Collaborator

atiorh commented Sep 25, 2023

I see! I am currently testing models exported from #227 and the different input shapes are handled on the Swift code path but not on the Python code path. cc: @ZachNagengast

@atiorh
Collaborator

atiorh commented Sep 25, 2023

@lopez-hector Do you mind adding similar support for accepting both input shapes on the Python code path?

@atiorh
Collaborator

atiorh commented Sep 25, 2023

@lopez-hector There is one more improvement I was thinking about for the Python inference code path: switching from coremltools.models.MLModel to coremltools.models.CompiledMLModel, so that the results of model compilation (which bloats loading time from seconds to minutes) are cached on first load instead of being recompiled on every load. If you would like, we can fold those changes into this PR, or I can make a separate one. Please let me know.
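For illustration, the intended switch looks roughly like this (paths are placeholders):

```python
import coremltools as ct

# Loading an .mlpackage compiles it on every load (seconds to minutes for SDXL-sized models):
model = ct.models.MLModel("Unet.mlpackage")

# Loading a pre-compiled .mlmodelc skips recompilation entirely:
compiled = ct.models.CompiledMLModel("Unet.mlmodelc", compute_units=ct.ComputeUnit.ALL)
```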

@lopez-hector
Contributor Author

Will add support for both input shapes soon.

Happy to include the CompiledMLModel change here. I was hoping there was a way to speed that up!

@atiorh
Collaborator

atiorh commented Sep 25, 2023

@lopez-hector Thanks!

@atiorh
Collaborator

atiorh commented Sep 25, 2023

Note that CompiledMLModel will need to load the .mlmodelc assets generated via --bundle-resources-for-swift-cli instead of the original .mlpackage assets, so the different filename conventions will need to be handled. Just a heads-up!

@lopez-hector
Contributor Author

lopez-hector commented Sep 25, 2023

> I see! I am currently testing models exported from #227 and the different input shapes are handled on the Swift code path but not on the Python code path. cc: @ZachNagengast

Added support for both the (12,) and (2, 6) shapes. Just tested it using the #227 conversion script; it handled the (2, 6) time_ids shape.
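Roughly, the shape handling looks like this (the function name and details are illustrative, not the exact PR code):

```python
import numpy as np

def _prepare_time_ids(time_ids, expected_shape):
    # SDXL uses 6 time-id values per classifier-free-guidance batch (12 total).
    # Depending on how the model was converted, the UNet declares this input
    # either flattened as (12,) or batched as (2, 6).
    arr = np.asarray(time_ids, dtype=np.float16).reshape(-1)
    assert arr.size == 12, f"expected 12 time-id values, got {arr.size}"
    return arr if expected_shape == (12,) else arr.reshape(2, 6)
```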

@lopez-hector
Contributor Author

Hi @atiorh, I added support for CompiledMLModel. On my M2 Max MacBook, it cuts loading time roughly in half for the text encoders and VAE, and by about three quarters for the UNET.

The UNET takes about 15s to load from compiled sources.

  • Added logic to detect mlpackage or mlmodelc assets in the input directory (see the sketch below).
  • Added a --model-sources flag to force loading from either packages or compiled sources.
  • If loading from .mlmodelc, we load via CompiledMLModel and generate expected_inputs from metadata.json.
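A hypothetical sketch of the detection logic (the function name and return values are mine, not necessarily the PR's):

```python
import os

def _detect_model_sources(model_dir, forced=None):
    # Honor an explicit --model-sources value first; otherwise infer from
    # which asset type is present in the input directory.
    if forced in ("packages", "compiled"):
        return forced
    names = os.listdir(model_dir)
    if any(n.endswith(".mlmodelc") for n in names):
        return "compiled"
    if any(n.endswith(".mlpackage") for n in names):
        return "packages"
    raise FileNotFoundError(f"No .mlpackage or .mlmodelc assets in {model_dir}")
```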

@atiorh
Collaborator

atiorh commented Sep 26, 2023

Excellent, thanks @lopez-hector!

@atiorh atiorh merged commit f3a2124 into apple:main Sep 26, 2023