
Implement a script to update dockers.json if necessary #273

Merged (19 commits) on Jan 10, 2022

Conversation

@VJalili VJalili commented Dec 18, 2021

This PR implements a script that updates the dockers.json file to reference images carrying a given tag, wherever an image with that tag exists in the container registry and differs from the image already referenced in dockers.json.

The intent of this script is to update dockers.json with the images rebuilt and pushed by build_docker.py. For a given list of images, build_docker.py may rebuild additional dependent images, so the list of all built and pushed images may differ from the user-requested list. However, build_docker.py does not export the list of all the images it built; hence, this script takes a brute-force approach and checks every image for whether it was updated under a given tag (i.e., the tag build_docker.py used to tag all the updated images).
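The brute-force pass described above could be sketched roughly as follows. This is a simplified assumption of the approach, not the actual script: `plan_updates` and the caller-supplied `check_tag_in_registry` are hypothetical names, and the JSON layout is assumed to map image keys to full image references.

```python
import json

def plan_updates(dockers_json_path, tag, check_tag_in_registry):
    """Return {name: new_image} for every entry that exists under `tag`.

    `check_tag_in_registry(repo_path, tag)` is a caller-supplied callable
    that returns True if `repo_path:tag` exists in the container registry.
    """
    with open(dockers_json_path) as fp:
        dockers = json.load(fp)

    updates = {}
    for name, image in dockers.items():
        # Naive split on the last ":"; assumes no port in the registry host.
        repo_path, _, current_tag = image.rpartition(":")
        if current_tag == tag:
            continue  # already tagged as requested: the "star" case
        if check_tag_in_registry(repo_path, tag):
            updates[name] = f"{repo_path}:{tag}"  # the "check-mark" case
    return updates
```

Entries whose registry query fails would be reported separately (the ✘ case in the legend below).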

The output of the script reads as follows:

$ python scripts/docker/update_dockers_json.py input_values/dockers.json lint-1d3af92
Checking 21 images with the tag `lint-1d3af92`.
The result of asserting every image in the provided JSON file will be reported according to the following legend.

★	The image tag is identical to the tag in the provided JSON file, OR the tag is different but an image with the given tag does NOT exist in the container registry; hence the image listed in the JSON file will remain unchanged.

✔	The image tag is different from the tag in the provided JSON file, and an image with the given tag exists in the container registry; hence the provided JSON file will be updated to reference this image.

✘	There was an error querying the image from the image registry.

[1/21]	✔	cnmops_docker
[2/21]	★	condense_counts_docker
[3/21]	★	delly_docker
[4/21]	★	gatk_docker
[5/21]	★	gatk_docker_pesr_override
[6/21]	★	genomes_in_the_cloud_docker
[7/21]	✘	linux_docker	Get "https://marketplace.gcr.io/v2/google/ubuntu1804/manifests/lint-1d3af92": denied: Token exchange failed for project 'cloud-marketplace'. Caller does not have permission 'storage.buckets.get'. To configure permissions, follow instructions at: https://cloud.google.com/container-registry/docs/access-control
[8/21]	★	manta_docker
[9/21]	★	melt_docker
[10/21]	★	samtools_cloud_docker
[11/21]	★	sv_base_docker
[12/21]	★	sv_base_mini_docker
[13/21]	★	sv_pipeline_base_docker
[14/21]	★	sv_pipeline_docker
[15/21]	★	sv_pipeline_qc_docker
[16/21]	★	sv_pipeline_rdtest_docker
[17/21]	★	wham_docker
[18/21]	★	igv_docker
[19/21]	★	duphold_docker
[20/21]	★	vapor_docker
[21/21]	★	cloud_sdk_docker
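The ✘ entry above shows a failed manifest query. A tag-existence check of that kind could be sketched against the Docker Registry HTTP API v2 as below. This is an assumption about how the probe might work, not the script's actual code; unauthenticated access only succeeds for public images, and `manifest_url`/`tag_exists` are illustrative names.

```python
from urllib import request, error

def manifest_url(image: str, tag: str) -> str:
    """Build the Registry HTTP API v2 manifest URL for `image` (no tag)."""
    registry, _, repo_path = image.partition("/")
    return f"https://{registry}/v2/{repo_path}/manifests/{tag}"

def tag_exists(image: str, tag: str) -> bool:
    """HEAD the manifest endpoint: 200 means the tag exists, 404 it does not.

    Other HTTP errors (e.g. the 'denied' token-exchange failure shown above)
    are re-raised so they surface as the ✘ case instead of a silent miss.
    """
    req = request.Request(manifest_url(image, tag), method="HEAD")
    req.add_header("Accept", "application/vnd.docker.distribution.manifest.v2+json")
    try:
        with request.urlopen(req) as resp:
            return resp.status == 200
    except error.HTTPError as err:
        if err.code == 404:
            return False
        raise
```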


@TedBrookings TedBrookings left a comment


The logic of the program seems sound to me. I've made several aesthetic suggestions. The only things I'd say are true "problems" are the two bits just after the argument parsing where I've made suggestions for raising Exceptions.


@mwalker174 mwalker174 left a comment


Thank you @VJalili this is great! I have a couple of suggestions relating to design.

While testing a new tag mw-gnomad-superscale-dev-ca70fa3 I noticed that the script successfully found 3/4 of the images I was expecting. The one it did not update was the following:

"sv_pipeline_docker" : "us.gcr.io/broad-dsde-methods/eph/sv-pipeline:eph_hotfix_no_evidence-1f461ed",

when I was expecting it to be

"sv_pipeline_docker" : "us.gcr.io/broad-dsde-methods/markw/sv-pipeline:mw-gnomad-superscale-dev-ca70fa3",

the reason apparently being that these two images reside in different repositories: us.gcr.io/broad-dsde-methods/eph and us.gcr.io/broad-dsde-methods/markw. This won't be an issue with CI/CD since we can consistently use us.gcr.io/broad-dsde-methods/gatk-sv. But for use outside CI/CD, can you add an optional parameter --repo so users can specify this manually? Also a line in the documentation explaining this would be helpful.

I also think it would be good practice to only look at dockers that can be built with build_docker.py, i.e., exclude gatk_docker, gatk_docker_pesr_override, genomes_in_the_cloud_docker, linux_docker, and cloud_sdk_docker. Rather than hard-coding this, allow it to be a list parameter with these as the default. This will cut down a little on the run time and also prevent accidental collisions with third-party docker tags (though unlikely in CI/CD, I could imagine someone running this with a generic tag like latest, which could be a problem).
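The suggested list parameter could look like the following argparse sketch. The flag name, default list, and parser wiring are illustrative of the suggestion, not the script's final interface.

```python
import argparse

# Images not built by build_docker.py (per the suggestion above).
DEFAULT_EXCLUDED = [
    "gatk_docker",
    "gatk_docker_pesr_override",
    "genomes_in_the_cloud_docker",
    "linux_docker",
    "cloud_sdk_docker",
]

parser = argparse.ArgumentParser()
parser.add_argument(
    "--exclude-images",
    nargs="*",
    default=DEFAULT_EXCLUDED,
    help="Image keys in dockers.json to skip (default: third-party images "
         "that build_docker.py does not build).",
)

# Users may override the default, e.g. to skip only one image:
args = parser.parse_args(["--exclude-images", "linux_docker"])
```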


VJalili commented Jan 7, 2022

Thank you @TedBrookings and @mwalker174; the code should read and work better now.

I also think it would be good practice to only look at dockers that can be built with build_docker.py, i.e., exclude gatk_docker, gatk_docker_pesr_override, genomes_in_the_cloud_docker, linux_docker, and cloud_sdk_docker. Rather than hard-coding this, allow it to be a list parameter with these as the default. This will cut down a little on the run time and also prevent accidental collisions with third-party docker tags (though unlikely in CI/CD, I could imagine someone running this with a generic tag like latest, which could be a problem).

Very good suggestion, thank you. I added the --exclude-images argument accordingly.

But for use outside CI/CD, can you add an optional parameter --repo so users can specify this manually? Also a line in the documentation explaining this would be helpful.

I added the line in the documentation. However, I am not sure how the --repo option you suggested should work, and I'd appreciate it if you could elaborate. For instance, in the use case you provided where 1/4 of the images was not detected, I imagine the --repo option would make the script search for images under the broad-dsde-methods/markw repo. However, it would then use that repo for all the other images, and therefore fail for the other 3/4 images you mentioned. Should the --repo argument instead take a dictionary as input (e.g., {"sv-pipeline": "broad-dsde-methods/markw"}), applying each key-value pair to the specified image, and using the repo/registry as given in dockers.json for images not listed in --repo?


TedBrookings commented Jan 7, 2022

I have a thought on the idea of a --repo argument: since this script is meant to work in tandem with build_docker.py, that pairing can suggest the behavior of --repo. Specifically, build_docker.py pushes all its updated images to one particular repo, regardless of where the previous images were. So --repo could simply provide an extra place to check for updated images. The image location resolution could go like this:

  1. Check if updated tag exists in the same repo as is currently used in dockers.json (what the script does now). If so, use that.
  2. If not, check if arguments.repo is not None and updated tag exists for this image in arguments.repo. If so, use that.
  3. If not, check if current image exists in same repo as is currently used in dockers.json (what the script does now). If so, image wasn't updated.
  4. Otherwise it's an error (what the script does now).

Obviously slightly more complicated, but I think not hugely so, and it would help migrate from our current situation where all the dockers are in different repos. One step greater complexity would be to allow arguments.repo to be a list and check all the repos. I don't think that's necessary, because build_docker.py should push all the changes to one place.
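The four-step resolution above could be sketched as follows. This is one possible reading of the proposal, not the merged implementation; `resolve_image` and the injected `tag_exists` callable are hypothetical names.

```python
def resolve_image(current_image, tag, extra_repo, tag_exists):
    """Resolve an image per the four-step lookup described above.

    current_image: full reference from dockers.json, e.g. "us.gcr.io/a/b:old".
    extra_repo: value of the proposed --repo argument (may be None).
    tag_exists(repo, tag) -> bool queries the container registry.
    """
    repo, _, current_tag = current_image.rpartition(":")
    # 1. Updated tag exists in the repo currently used in dockers.json.
    if tag_exists(repo, tag):
        return f"{repo}:{tag}"
    # 2. Updated tag exists for this image name under the extra repo.
    image_name = repo.rsplit("/", 1)[-1]
    if extra_repo is not None and tag_exists(f"{extra_repo}/{image_name}", tag):
        return f"{extra_repo}/{image_name}:{tag}"
    # 3. No update, but the current image still exists: leave it unchanged.
    if tag_exists(repo, current_tag):
        return current_image
    # 4. Otherwise it's an error.
    raise ValueError(f"Cannot resolve {current_image} for tag {tag}")
```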

@mwalker174

For instance, in the use case you provided where 1/4 of the images were not detected, I imagine using the --repo option it could search for images under the broad-dsde-methods/markw repo. However, it will use that repo for all the other images, hence it will fail for the other 3/4 images you mentioned.

As @TedBrookings mentioned, if you use build_docker.py, all images will be pushed to the same repo. So actually in that case all 4 updated images would be found.

  1. Check if updated tag exists in the same repo as is currently used in dockers.json (what the script does now). If so, use that.
  2. If not, check if arguments.repo is not None and updated tag exists for this image in arguments.repo. If so, use that.

Again I can't really think of a case where there would be a set of updated images with the same tag scattered across different repos. I think it would be safer and clearer to make --repo required, which would also make it faster by not having to check multiple repos.

@TedBrookings

I agree with what Mark just said. One last thing: my build_docker.py changes will introduce --docker-repo as an argument, in part because GitHub repos are a thing, and build_docker.py needs to at least deal with them (even though the GitHub repo is hard-coded), so I wanted to avoid ambiguity. I don't think it's a huge deal either way, but it wouldn't hurt to name the argument --docker-repo.


VJalili commented Jan 8, 2022

Thanks for elaborating on that. Please check whether the updated code implements the repo argument as suggested.

@TedBrookings

I'm happy with how the --repo argument is handled, although I would still prefer it be named --docker-repo for the reasons I listed above.


@TedBrookings TedBrookings left a comment


My concerns have been satisfied, I'm fine with the current state.


VJalili commented Jan 10, 2022

Sure, I refactored the argument from repo to docker_repo. I would also prefer --docker-repo over docker_repo; however, by convention the -- prefix (with - separating words) is used for optional arguments, while required arguments are generally made positional, without the -- prefix and with _ separating words.
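The naming convention mentioned above can be illustrated with argparse. The exact argument set here is illustrative and may differ from the merged script:

```python
import argparse

parser = argparse.ArgumentParser()
# Required inputs as positionals: no leading dashes, words joined with "_".
parser.add_argument("dockers_json", help="path to dockers.json")
parser.add_argument("docker_repo", help="repo build_docker.py pushed to")
# Optional flags use "--" and hyphens; argparse maps them to underscores.
parser.add_argument("--exclude-images", nargs="*", default=[])

args = parser.parse_args(
    ["input_values/dockers.json", "us.gcr.io/broad-dsde-methods/gatk-sv"]
)
```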

@VJalili VJalili merged commit 6437798 into broadinstitute:master Jan 10, 2022
@VJalili VJalili deleted the update_dockers_json branch January 10, 2022 16:29