diff --git a/website/docs/advanced/docker/deploy/incremental.md b/website/docs/advanced/docker/deploy/incremental.md index 46ed7b3cb..95898bc0c 100644 --- a/website/docs/advanced/docker/deploy/incremental.md +++ b/website/docs/advanced/docker/deploy/incremental.md @@ -51,10 +51,30 @@ the commit SHAs are determined automatically. In CI/CD, the commit SHAs are determined as the following example. -``` - X---Y---Z feature branch - / \ -A---B---C---D---E main branch +```mermaid +%%{init: { + 'logLevel': 'debug', + 'gitGraph': {'rotateCommitLabel': false}, + 'themeVariables': { 'commitLabelFontSize': '22px' } + } + }%% +gitGraph + commit id: "A" + commit id: "B" + branch feature + checkout feature + commit id: "X" + checkout main + commit id: "C" + checkout feature + commit id: "Y" + checkout main + commit id: "D" + checkout feature + commit id: "Z" + checkout main + merge feature id: "E" + commit id: "F" ``` In this example, `BASE_SHA=B`, `HEAD_SHA=Z`, and `E` is the merge commit. @@ -62,16 +82,19 @@ In this example, `BASE_SHA=B`, `HEAD_SHA=Z`, and `E` is the merge commit. ## Identifying Images Requiring Rebuilding from Changed Files -The build_docker script identifies the list of docker images +The `build_docker` script identifies the list of docker images that need to be rebuilt based on two factors. -Firstly, directly impacted images are determined by examining the + +1. Directly impacted images are determined by checking the list of files each image depends on. If any of these files have -been changed, the corresponding image requires rebuilding. -Secondly, indirectly impacted images are determined based on -the hierarchical dependency between images. If an image is -built upon another image, and the base image is being rebuilt, -then the dependent image also needs to be rebuilt. This two-step -process ensures that all the affected images are correctly +changed, the corresponding image needs rebuilding. + +2. 
Indirectly impacted images are identified based on +the hierarchical dependency between images. +If a base image is rebuilt, any dependent images built upon +it also require rebuilding. + +This two-step process ensures that all the affected images are correctly identified for rebuilding. diff --git a/website/docs/advanced/docker/deploy/manual.md b/website/docs/advanced/docker/deploy/manual.md index 9cf2c899c..896985e44 100644 --- a/website/docs/advanced/docker/deploy/manual.md +++ b/website/docs/advanced/docker/deploy/manual.md @@ -4,11 +4,310 @@ description: Build and Publish Images sidebar_position: 3 --- +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + + If you are contributing to the GATK-SV codebase, specifically focusing on enhancing tools, configuring dependencies in Dockerfiles, or modifying GATK-SV scripts within the Docker images, it is important to build and test the Docker images locally. -This ensures that the images are successfully built and function as intended. -Additionally, if you wish to host the images in your own container registry, -you will need to follow these steps. -To simplify the build process, we have developed a Python script -that automates the image building, and publishing to your container registry. +This ensures that the images are successfully built and function as intended. + +The process of updating GATK-SV Docker images involves two steps: build and publish. + +- **Build**: Create Docker images from Dockerfiles and store them on your computer. + +- **Publish**: Upload the built Docker images to container registries +(e.g., Google Container Registry or Azure Container Registry) +to make them available for use in Terra or Cromwell. + +You may refer to [this page](/docs/advanced/docker/index.md) for a detailed description of the process. +To streamline the process, we have developed a Python script +that automates the image building and publishing to your container registry.
+This section provides guidelines on building and publishing the images using this script. + + +:::warning Linux Machine Required + +Only Linux machines (dedicated or virtual) are supported for building GATK-SV Docker images. +Images created on non-Linux machines may not work with the Terra or Cromwell execution environment. +The instructions provided on this page assume you are using a Linux Ubuntu machine. +::: + + + +## Setup + +### Runtime environment {#runtime} + +Currently, GATK-SV Docker images can only be built on the `linux/amd64` platform, +which is a machine running Linux OS on x86-64 architecture. +Images built on Apple M1 (`linux/arm64`) are not currently supported. +You can use a local Linux machine or obtain a virtual machine from a cloud platform. + +You may follow the steps in the +[GCP](https://cloud.google.com/compute/docs/instances/create-start-instance#publicimage) +or [Azure](https://learn.microsoft.com/en-us/azure/virtual-machines/windows/quick-create-portal) +documentation to create a virtual machine (VM) on Google Cloud Platform (GCP) or Microsoft Azure, respectively. +Make sure the VM is built using an Ubuntu image, has at least 8 GB RAM, and has some additional +disk space (e.g., 50 GB should be sufficient). + +Building and publishing GATK-SV Docker images is time-consuming and can take around 1 hour. +Therefore, we recommend using a terminal multiplexer +(e.g., [tmux](https://github.com/tmux/tmux/wiki/Getting-Started); +[tmux cheat sheet](https://tmuxcheatsheet.com)) +when running on a VM to ensure the process continues even if you are disconnected from the VM. + +### Docker {#docker} + +[Install](https://docs.docker.com/engine/install/) Docker desktop +and log in using `sudo docker login`. If utilizing GATK-SV Docker images +from a private container registry or intending to publish the resulting +images to a registry, ensure that you are logged in with credentials +that grant you access to the registry.
+ + + + + You may follow [this documentation](https://learn.microsoft.com/en-us/azure/container-registry/container-registry-authentication?tabs=azure-cli) on setting up Docker authentication to an Azure container registry. + + + + You may follow [this documentation](https://cloud.google.com/artifact-registry/docs/docker/authentication) on setting up Docker authentication to a Google container registry. + + + + +### Checkout codebase {#checkout} + +Make sure you are on the `git` branch with the code you want to add +to the GATK-SV Docker images you are building. + +```shell +git fetch origin +git checkout origin/ +``` + +## Build and Publish Docker Images {#build} + +All the GATK-SV Dockerfiles are hosted under the directory +[`gatk-sv/dockerfiles/`](https://github.com/broadinstitute/gatk-sv/tree/main/dockerfiles). +While you can build the GATK-SV Docker images by following the standard +[Docker image build procedures](https://docs.docker.com/engine/reference/commandline/image_build/), +that can be challenging due to the nested hierarchy of GATK-SV Docker images. +To simplify the process, we have developed a utility script that streamlines the +Docker image build process +([`scripts/docker/build_docker.py`](https://github.com/broadinstitute/gatk-sv/blob/main/scripts/docker/build_docker.py)). + +In the following, we explain how to use the utility script for a simple use case. +For more advanced functionality, please refer to the script's documentation, +which you can access as follows. + +```shell +python scripts/docker/build_docker.py --help +``` + + +In its basic setup, you can use the following command to **build and publish** a GATK-SV Docker image. + +```shell +python scripts/docker/build_docker.py \ + --targets \ + --image-tag \ + --docker-repo +``` + +The arguments used are explained in the following.
+ +### Determine which images need to be rebuilt {#targets} + +You may follow either of the following practices to determine which images to rebuild. + +- **Automatic:** + The script can automatically determine which Docker images need a rebuild + by cross-referencing a list of changed files with the + table in [this section](/docs/advanced/docker/images#list). + Specifically, it takes two git commit SHAs as input, uses `git diff` + to extract the list of changed files, and then cross-references them + with [this table](/docs/advanced/docker/images#list) to identify the Docker + images requiring rebuilding. Details can be found on [this page](/docs/advanced/docker/deploy/incremental.md). + To use this feature, commit the changes first, identify `BASE_SHA` and `HEAD_SHA` using `git log` or GitHub + (details on [this page](/docs/advanced/docker/deploy/incremental.md)), + and then call the script as follows. + + ```shell + python scripts/docker/build_docker.py \ + --base-git-commit BASE_SHA \ + --current-git-commit HEAD_SHA + ``` + +- **Manual:** + You may refer to the table in [this section](/docs/advanced/docker/images#list) + to determine which Docker images to rebuild based on the changed files. + For instance, if you modified any of the files under the + [`gatk-sv/src/svtk/`](https://github.com/broadinstitute/gatk-sv/tree/main/src/svtk) + directory, you will need to rebuild the `sv-pipeline` Docker image. + You can set the list of images to rebuild using the `--targets` argument. + For instance: + + ```shell + python scripts/docker/build_docker.py \ + --targets sv-pipeline + ``` + + You may specify multiple images to rebuild by providing a list of their names. + For instance, the following command builds the `sv-pipeline` and the `str` Docker images. + + ```shell + python scripts/docker/build_docker.py \ + --targets sv-pipeline str + ``` + +Please note that the `--targets` and `--base-git-commit --current-git-commit` +options are mutually exclusive.
In other words, you can either manually specify +images to rebuild, or let the script determine them. +Combining or omitting both options is not currently supported. + +:::info +Following the steps above, the script builds the specified Docker images +_and all the images derived from them_, ensuring proper propagation of changes through the pipeline. +If you want to build only the specified images, add the `--skip-dependent-images` flag. +::: + + +### Image tag {#tag} + +[Docker image tags](https://docs.docker.com/engine/reference/commandline/tag/) +are used to distinguish between different builds of the same image. +You can use any naming convention for your tags. +GATK-SV Docker images use the following template for tags, +which you may want to adopt, in particular, if you plan to publish +your images on the GATK-SV container registries. + +``` +[Date]-[Release Tag]-[Head SHA 8] +``` + +where `[Date]` is `YYYY-MM-DD` extracted from the time stamp of the last +commit on the feature branch, `[Release Tag]` is extracted from the latest [pre-]release on GitHub, +and `[Head SHA 8]` is the first eight characters of the SHA of the +last commit on the feature branch. + +For example: + +``` +2023-07-28-v0.28.1-beta-e70dfbd7 +``` + +For automatically composing image tags, you may follow the practices +used in [GATK-SV CI/CD](https://github.com/broadinstitute/gatk-sv/blob/286a87f3bcfc0b8c811ff789776dd0b135f582e9/.github/workflows/sv_pipeline_docker.yml#L85-L109). + + + +### Specify the container registry {#registry} +The built images are stored on your computer. If you are only developing +or testing locally, there is no need to push them to a container registry. +In this case, you can omit `--docker-repo `. + +You need to push the images to a container registry if you want to: + +- Use the updated Docker images for WDL testing or development; +- Store them on a container registry other than those maintained by the GATK-SV team.
+ +The script can automatically push Docker images to a container registry. +To use this feature, you may follow these steps: + +1. Ensure you are logged in to Docker with credentials granting +push access to the container registry. Please refer to the +[Docker](#docker) section for details. + + +2. Provide the `--docker-repo ` argument, +replacing `` with the name of your container registry. +For Google Container Registry (GCR) and Azure Container Registry (ACR), +the format is generally as follows. + + + + + Template: + + ```shell + .azurecr.io//: + ``` + + Example: + ```shell + python scripts/docker/build_docker.py \ + --targets sv-pipeline \ + --image-tag v1 \ + --docker-repo myregistry.azurecr.io/gatk-sv + ``` + + which results in creating the following image: + + ```shell + myregistry.azurecr.io/gatk-sv/sv-pipeline:v1 + ``` + + + + + Template: + + ```shell + //: + ``` + + Example: + ```shell + python scripts/docker/build_docker.py \ + --targets sv-pipeline \ + --image-tag v1 \ + --docker-repo us.gcr.io/my-repository/gatk-sv + ``` + + which results in creating the following image: + + ```shell + us.gcr.io/my-repository/gatk-sv/sv-pipeline:v1 + ``` + + + + +Please note that we are currently using GCR, but it has been migrated to Google Artifact Registry. + + + +## Post-build + +- GATK-SV Docker images are mainly intended for use in WDLs. + Therefore, it's a good practice to test the newly updated + images in related WDLs. This ensures that the updated images function + as expected within specific workflows. + +- If you were using a Linux VM to build the Docker images, + ensure you either stop or delete the VM after building the images. + Stopping the VM won't delete the disk, and you'll continue to + incur disk usage charges. If you don't want to incur disk costs, + you can delete the VM along with all its associated resources. + Stopping is preferred over deleting if you intend to reuse the VM.
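The tag template and the registry examples on this page can be tied together with a small helper. The following is a minimal sketch, not part of `build_docker.py`: the function names `compose_image_tag` and `image_reference` are hypothetical, and the 40-character SHA used in the demo is the documented eight-character prefix padded with placeholder zeros.

```python
import datetime


def compose_image_tag(commit_date: datetime.date, release_tag: str, head_sha: str) -> str:
    """Compose a tag following the [Date]-[Release Tag]-[Head SHA 8] template."""
    if len(head_sha) < 8:
        raise ValueError("head_sha must contain at least 8 characters")
    # date.isoformat() yields YYYY-MM-DD; only the first 8 SHA characters are kept.
    return f"{commit_date.isoformat()}-{release_tag}-{head_sha[:8]}"


def image_reference(docker_repo: str, image_name: str, image_tag: str) -> str:
    """Join the registry path, image name, and tag into a full image reference."""
    return f"{docker_repo.rstrip('/')}/{image_name}:{image_tag}"


if __name__ == "__main__":
    # Placeholder SHA: the documented 8-character prefix padded with zeros.
    placeholder_sha = "e70dfbd7" + "0" * 32
    tag = compose_image_tag(datetime.date(2023, 7, 28), "v0.28.1-beta", placeholder_sha)
    print(tag)
    print(image_reference("us.gcr.io/my-repository/gatk-sv", "sv-pipeline", tag))
```

The resulting tag could then be passed to the script through `--image-tag`, and the printed reference is what one would expect to find in the registry after publishing, assuming the script joins the repository, image name, and tag in the way the examples on this page show.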
diff --git a/website/docs/advanced/docker/images.md b/website/docs/advanced/docker/images.md index 9b19424db..1cda3bbef 100644 --- a/website/docs/advanced/docker/images.md +++ b/website/docs/advanced/docker/images.md @@ -4,8 +4,8 @@ description: Docker Image Dependencies sidebar_position: 1 --- -import useBaseUrl from '@docusaurus/useBaseUrl'; -import ThemedImage from '@theme/ThemedImage'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; :::info This page provides a detailed explanation of Docker @@ -27,13 +27,22 @@ usage and lower workflow execution costs. The figure below illustrates the relationships between the GATK-SV Docker images. - +```mermaid +flowchart TD + ubuntu22[Ubuntu 22.04] --> svbasemini[sv-base-mini] & samtoolsenv[samtools-cloud-virtual-env] & svbaseenv[sv-base-virtual-env] + svbasemini & samtoolsenv & svbaseenv --> svpipelineenv[sv-pipeline-virtual-env] + samtoolsenv --> samtoolscloud[samtools-cloud] & svutilsenv[sv-utils-env] + svbasemini --> samtoolscloud + svutilsenv --> svutils[sv-utils] + samtoolscloud --> svutils & svbase[sv-base] + svpipelineenv & svbase --> svpipeline[sv-pipeline] + svbaseenv --> cnmopsenv[cnmops-virtual-env] + svbase & cnmopsenv --> cnmpos[cnmops] + + ubuntu18[Ubuntu 18.04] --> manta[Manta] & melt[MELT] & wham[Wham] + samtoolscloud --> wham + ubuntu2210[Ubuntu 22.10] --> str[STR] +``` The image depicts the hierarchical relationship among GATK-SV Docker images. Arrows indicate the flow from a base image @@ -49,6 +58,28 @@ are available in [`dockers.json`](https://github.com/broadinstitute/gatk-sv/blob and [`dockers_azure.json`](https://github.com/broadinstitute/gatk-sv/blob/main/inputs/values/dockers_azure.json) for images hosted on Google Container Registry (GCR) and Azure Container Registry (ACR), respectively. +## Docker Images List {#list} + +The table below lists the GATK-SV Docker images and their dependencies.
+
+| Image | Code Dependencies | Docker Dependencies |
+|---|---|---|
+| `manta` | `dockerfiles/manta/*` | |
+| `melt` | `dockerfiles/melt/*` | `sv-base` |
+| `wham` | `dockerfiles/wham/*` | `samtools-cloud` |
+| `str` | `dockerfiles/str/*` | |
+| `sv-base-mini` | `dockerfiles/sv-base-mini/*` | |
+| `samtools-cloud-virtual-env` | `dockerfiles/samtools-cloud-virtual-env/*` | |
+| `samtools-cloud` | `dockerfiles/samtools-cloud/*` | `sv-base-mini`, `samtools-cloud-virtual-env` |
+| `sv-base-virtual-env` | `dockerfiles/sv-base-virtual-env/*` | |
+| `sv-base` | `dockerfiles/sv-base/*` | `samtools-cloud`, `sv-base-virtual-env` |
+| `cnmops-virtual-env` | `dockerfiles/cnmops-virtual-env/*` | `sv-base-virtual-env` |
+| `cnmops` | `dockerfiles/cnmops/*` | `sv-base`, `cnmops-virtual-env` |
+| `sv-pipeline-virtual-env` | `dockerfiles/sv-pipeline-virtual-env/*` | `sv-base-mini`, `sv-base-virtual-env`, `samtools-cloud-virtual-env` |
+| `sv-pipeline` | `dockerfiles/sv-pipeline/*`, `src/RdTest/*`, `src/sv-pipeline/*`, `src/svqc/*`, `src/svtest/*`, `src/svtk/*`, `src/WGD/*` | `sv-base`, `sv-pipeline-virtual-env` |
+| `sv-utils-env` | `dockerfiles/sv-utils-env/*` | `samtools-cloud-virtual-env` |
+| `sv-utils` | `dockerfiles/sv-utils/*`, `src/sv_utils/src/*`, `src/sv_utils/setup.py` | `samtools-cloud`, `sv-utils-env` |
+ ## Advantages of Dividing Images by Functionality @@ -58,42 +89,32 @@ the pipeline is organized into multiple smaller images, each focusing on a speci This approach offers several benefits. -By splitting the tools into separate Docker images, we achieve a modular -and focused structure. Each image contains the tools required for a specific -task within the GATK-SV pipeline. This enables users and developers to easily -work with individual images, as they can identify the specific tools needed -for their particular analysis. +- **Modular and focused structure:** +Each image includes task-specific tools, simplifying the use and maintenance of +GATK-SV Docker images for users and developers, respectively. -Moreover, using smaller, task-specific Docker images offers the advantage -of reduced sizes, which is particularly beneficial in cloud environments. -These smaller images require less storage space when stored in container -registries like Google Cloud Container Registry (GCR) or Azure Container Registry (ACR). -Additionally, when creating virtual machines for workflow task execution, -the transfer of these smaller images is more efficient. +- **Reduced Docker image size:** +Using task-specific Docker images reduces sizes, requiring less storage space +in container registries. It also enables faster image transfer +when creating virtual machines for task execution. -Separate Docker images enhance maintenance and extensibility -in the GATK-SV pipeline. Maintainers can easily modify or update -specific tools or configurations within a single image without -impacting others. This granularity improves maintainability -and enables seamless expansion of the pipeline by adding or -replacing tools as required. +- **Enhanced maintenance and extensibility:** +Maintainers can easily modify specific tools or configurations within +a single image without affecting others, improving maintainability and +facilitating seamless expansion by adding or replacing tools as required.
-Additionally, the Docker image hierarchy offers advantages in terms of -consistency and efficiency. One image can be built upon another, -leveraging existing setups and tools. This promotes code reuse and -reduces duplication, resulting in consistent configurations across -different stages of the pipeline. It also simplifies the management -of common dependencies, as changes or updates can be applied at the -appropriate level, cascading down to the dependent images. +- **Consistency and efficiency:** +Building images on top of existing setups and tools promotes code +reuse and reduces duplication, ensuring consistent configurations +across pipeline stages. It simplifies dependency management by +allowing changes or updates at the appropriate level, cascading +down to dependent images. -In summary, by splitting the tools into smaller, task-specific images, -the pipeline becomes more modular and manageable. -This approach optimizes storage, execution, maintenance, -and extensibility in cloud environments. -Leveraging Docker's image hierarchy further enhances consistency, -code reuse, and dependency management, ensuring efficient and -scalable execution of the pipeline. +In summary, splitting tools into smaller, task-specific +Docker images optimizes storage, execution, maintenance, and extensibility. +It enhances consistency, code reuse, and dependency management, +ensuring efficient and scalable pipeline execution. 
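The way changes cascade from a base image down to its dependents can be made concrete with the dependency table above. The following is a minimal sketch of that two-step idea (match changed files against code dependencies, then follow Docker dependencies), not the actual `build_docker.py` implementation: the `IMAGES` mapping is transcribed from the table, while the function names and the example file paths are hypothetical.

```python
from fnmatch import fnmatchcase

# image -> (code dependencies besides `dockerfiles/<image>/*`, Docker dependencies),
# transcribed from the Docker Images List table.
IMAGES = {
    "manta": ([], []),
    "melt": ([], ["sv-base"]),
    "wham": ([], ["samtools-cloud"]),
    "str": ([], []),
    "sv-base-mini": ([], []),
    "samtools-cloud-virtual-env": ([], []),
    "samtools-cloud": ([], ["sv-base-mini", "samtools-cloud-virtual-env"]),
    "sv-base-virtual-env": ([], []),
    "sv-base": ([], ["samtools-cloud", "sv-base-virtual-env"]),
    "cnmops-virtual-env": ([], ["sv-base-virtual-env"]),
    "cnmops": ([], ["sv-base", "cnmops-virtual-env"]),
    "sv-pipeline-virtual-env": (
        [], ["sv-base-mini", "sv-base-virtual-env", "samtools-cloud-virtual-env"]),
    "sv-pipeline": (
        ["src/RdTest/*", "src/sv-pipeline/*", "src/svqc/*",
         "src/svtest/*", "src/svtk/*", "src/WGD/*"],
        ["sv-base", "sv-pipeline-virtual-env"]),
    "sv-utils-env": ([], ["samtools-cloud-virtual-env"]),
    "sv-utils": (
        ["src/sv_utils/src/*", "src/sv_utils/setup.py"],
        ["samtools-cloud", "sv-utils-env"]),
}


def directly_impacted(changed_files):
    """Step 1: images whose code dependencies match any changed file."""
    impacted = set()
    for image, (extra_patterns, _) in IMAGES.items():
        patterns = [f"dockerfiles/{image}/*", *extra_patterns]
        # In fnmatch patterns, `*` also matches `/`, so nested paths match too.
        if any(fnmatchcase(path, p) for path in changed_files for p in patterns):
            impacted.add(image)
    return impacted


def with_dependents(images):
    """Step 2: add every image built, directly or transitively, on a rebuilt image."""
    rebuild = set(images)
    grew = True
    while grew:
        grew = False
        for image, (_, bases) in IMAGES.items():
            if image not in rebuild and rebuild.intersection(bases):
                rebuild.add(image)
                grew = True
    return rebuild


def images_to_rebuild(changed_files):
    return with_dependents(directly_impacted(changed_files))


if __name__ == "__main__":
    # Hypothetical changed file under dockerfiles/sv-base/.
    print(sorted(images_to_rebuild(["dockerfiles/sv-base/Dockerfile"])))
```

A fixed-point loop is used here for brevity; with this few images the repeated scans are negligible, though a topological traversal would be the more usual choice for a larger graph.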
diff --git a/website/docs/advanced/docker/index.md b/website/docs/advanced/docker/index.md index bd4868f07..6bee59397 100644 --- a/website/docs/advanced/docker/index.md +++ b/website/docs/advanced/docker/index.md @@ -4,9 +4,6 @@ description: Docker Concepts and Execution Overview sidebar_position: 0 --- -import useBaseUrl from '@docusaurus/useBaseUrl'; -import ThemedImage from '@theme/ThemedImage'; - To make the analysis process scalable, reproducible, and cost-efficient, GATK-SV is designed as a cloud-native pipeline, meaning it runs on virtual machines (VMs) hosted in the cloud. @@ -34,13 +31,32 @@ The following figure is a high-level illustration depicting the relationship between Dockerfiles, Docker images, Docker containers, and Cloud VMs. - +```mermaid + +flowchart LR + dockerfile[Dockerfile] -- Build --> acr_image[Docker Image] & gcp_image[Docker Image] + + subgraph Microsoft Azure + subgraph ACR + acr_image + end + + subgraph Azure VM + acr_image -- Run --> az_container[Container] + end + end + + subgraph Google Cloud Platform + subgraph GCR + gcp_image + end + + subgraph GCP VM + gcp_image -- Run --> gcp_container[Container] + end + end + +``` The GATK-SV Docker setup is organized as follows: diff --git a/website/docusaurus.config.js b/website/docusaurus.config.js index 90582d420..48f4674d5 100644 --- a/website/docusaurus.config.js +++ b/website/docusaurus.config.js @@ -98,10 +98,6 @@ const config = { label: 'Github', href: 'https://github.com/broadinstitute/gatk-sv/discussions', }, - { - label: 'Twitter', - href: 'https://twitter.com/broadinstitute', - }, ], }, { @@ -126,7 +122,17 @@ const config = { theme: lightCodeTheme, darkTheme: darkCodeTheme, }, + docs: { + sidebar: { + hideable: true, + } + } }), + + themes: ['@docusaurus/theme-mermaid'], + markdown: { + mermaid: true, + } }; module.exports = config; diff --git a/website/package.json b/website/package.json index 532ebad16..0e74541bf 100644 --- a/website/package.json +++ b/website/package.json @@ 
-14,8 +14,9 @@ "write-heading-ids": "docusaurus write-heading-ids" }, "dependencies": { - "@docusaurus/core": "2.2.0", - "@docusaurus/preset-classic": "2.2.0", + "@docusaurus/core": "2.4.1", + "@docusaurus/preset-classic": "2.4.1", + "@docusaurus/theme-mermaid": "^2.4.1", "@mdx-js/react": "^1.6.22", "clsx": "^1.2.1", "prism-react-renderer": "^1.3.5", diff --git a/website/static/img/docker_hierarchy.png b/website/static/img/docker_hierarchy.png deleted file mode 100644 index 5ef7f98e2..000000000 --- a/website/static/img/docker_hierarchy.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:46143ad234a9932e6d7e9e3690a527309c7ac01e72e76920575e2b6c466469e3 -size 838559 diff --git a/website/static/img/docker_infra_diagram.png b/website/static/img/docker_infra_diagram.png deleted file mode 100644 index 905709103..000000000 --- a/website/static/img/docker_infra_diagram.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:917ccbfe2fc97a5d8adffc52ecdf77abd666b3273b28ab248bd23b117ef76ca6 -size 1126378