Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[improve][build] Create images before build python wheels #17536

Closed
wants to merge 6 commits into from

Conversation

RobertIndie
Copy link
Member

@RobertIndie RobertIndie commented Sep 8, 2022

Motivation

Currently in the release process of the python client, there is no indication that the related images need to be created before building python wheels. This may cause the release manager to build the wheel without using the latest images.

Modifications

  • Call create-images.sh in the build-wheels.sh
  • Add option --skip-create-images to the build-wheels.sh to skip the image creation. The default is not skipped.

Documentation

Check the box below or label this PR directly.

Need to update docs?

  • doc-required
    (Your PR needs to update docs and you will update later)

  • doc-not-needed
    (Please explain why)

  • doc
    (Your PR contains doc changes)

  • doc-complete
    (Docs have been already added)

Copy link
Contributor

@BewareMyPower BewareMyPower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can add an option to determine whether to create the images.

@merlimat
Copy link
Contributor

@RobertIndie In theory the images should already be available in Docker hub and it takes a lot of time to rebuild them. Is there any particular image that is missing?

@BewareMyPower
Copy link
Contributor

@merlimat IMO, maintaining the images in DockerHub manually is not a good way to do that, especially because only a few members have the permission. If the release manager forgot to run docker update, an outdated build image could be used.

The basic build image usually installed some dependencies by downloading and building. It should not take much time. But I found it could be speeded up significantly by reducing the output. For example, replace tar xvfz with tar xvz.

@RobertIndie
Copy link
Member Author

@RobertIndie In theory the images should already be available in Docker hub and it takes a lot of time to rebuild them. Is there any particular image that is missing?

For example, the version of OpenSSL is wrong in the pulsar python client 2.10.1. You can see more detail here: #17097 (comment)

I think this is because the latest image was not used in the release 2.10.1 process.

@RobertIndie
Copy link
Member Author

Maybe we can add an option to determine whether to create the images.

I have added an option to skip the image creation. PTAL again, Thanks.

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Sep 14, 2022
Copy link
Contributor

@BewareMyPower BewareMyPower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you update wiki/release/release-process.md as well?

@merlimat
Copy link
Contributor

IMO, maintaining the images in DockerHub manually is not a good way to do that, especially because only a few members have the permission. If the release manager forgot to run docker update, an outdated build image could be used.

@BewareMyPower If we add a docker pulsar $IMAGE it will force to use the version that exists on DockerHub

@BewareMyPower
Copy link
Contributor

@merlimat Did you mean docker pull? What I concerned more about is that the image might not be built with the latest Dockerfile because it's uploaded manually. At the moment when uploading images cannot be executed automatically, we'd better support building the base image.

@merlimat
Copy link
Contributor

Creating the images takes a very long time. We should have the images built once, available on Docker hub and cached as needed.

By doing docker pull $IMAGE before using it, we're making sure that when the image is manually updated, the builds will start using it.

@BewareMyPower
Copy link
Contributor

BewareMyPower commented Sep 24, 2022

In theory the images should already be available in Docker hub and it takes a lot of time to rebuild them.

I agree now. I've thought the create-images.sh is similar to building base images for RPM and DEB packages. I took a look just now, this script will create 16 images, which takes too much time.

pulsar-build:manylinux2014-cp37-cp37m-x86_64
pulsar-build:manylinux2014-cp38-cp38-x86_64
pulsar-build:manylinux2014-cp39-cp39-x86_64
pulsar-build:manylinux2014-cp310-cp310-x86_64
pulsar-build:manylinux2014-cp37-cp37m-aarch64
pulsar-build:manylinux2014-cp38-cp38-aarch64
pulsar-build:manylinux2014-cp39-cp39-aarch64
pulsar-build:manylinux2014-cp310-cp310-aarch64
pulsar-build:manylinux_musl-cp37-cp37m-aarch64
pulsar-build:manylinux_musl-cp38-cp38-aarch64
pulsar-build:manylinux_musl-cp39-cp39-aarch64
pulsar-build:manylinux_musl-cp310-cp310-aarch64
pulsar-build:manylinux_musl-cp37-cp37m-x86_64
pulsar-build:manylinux_musl-cp38-cp38-x86_64
pulsar-build:manylinux_musl-cp39-cp39-x86_64
pulsar-build:manylinux_musl-cp310-cp310-x86_64

By doing docker pull $IMAGE before using it, we're making sure that when the image is manually updated, the builds will start using it.

My point is not about whether the local image is latest. I mean the process is complicated and needs some manual operations. We must document that once any of the following files are changed, we must notify a maintainer of apachepulsar DockerHub account to upload the newest image.

  • python-versions.sh, which specifies the supported Python versions and the associated base images
  • manylinux2014/Dockerfile, the base image of Python 3.7 to 3.10 for x86_64 and aarch64 architectures
  • manylinux_musl/Dockerfile, the base image of Python 3.7 to 3.10 for Alpine compatible wheels

If we want to change these Dockerfiles, for example, the xvfz option for tar command takes much more time than xvz for the extra output. Even this minor optimization requires a complicated update steps.

In addition, we should not use such a tag like manylinux_musl-cp310-cp310-x86_64. It should be a part of image name, the tag should be a version.


In conclusion, I agree that we should not add an option to create base images for building Python wheels currently. But we should make the process clear in case when the base image changed.

/cc @merlimat @RobertIndie

@BewareMyPower
Copy link
Contributor

I realized the key point suddenly. #17538 upgrades the OpenSSL version to 1.1.1n for some Dockerfiles, but even it has been merged into the master, it's hard to know:

  • whether the latest images in DockerHub include this change? If yes, how to keep the compatibility with the previous image that might have been overwritten.
  • are these images related to Python clients?

See the related issue: #17097

/cc @merlimat @liliang950210

@RobertIndie
Copy link
Member Author

My point is not about whether the local image is latest. I mean the process is complicated and needs some manual operations. We must document that once any of the following files are changed, we must notify a maintainer of apachepulsar DockerHub account to upload the newest image.

  • python-versions.sh, which specifies the supported Python versions and the associated base images
  • manylinux2014/Dockerfile, the base image of Python 3.7 to 3.10 for x86_64 and aarch64 architectures
  • manylinux_musl/Dockerfile, the base image of Python 3.7 to 3.10 for Alpine compatible wheels

I think we can make these operations happen in CI. If there is a PR merged into the master branch, we can use the CI to check for changes to these files. If there are changes, we can automatically push the latest image to the docker hub

@github-actions
Copy link

The pr had no activity for 30 days, mark with Stale label.

@github-actions github-actions bot added the Stale label Oct 28, 2022
@tisonkun
Copy link
Member

Closed as the development of the Python client has been permanently moved to https://github.com/apache/pulsar-client-python.

@tisonkun tisonkun closed this Nov 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants