diff --git a/website/docs/advanced/docker/deploy/incremental.md b/website/docs/advanced/docker/deploy/incremental.md
index 46ed7b3cb..95898bc0c 100644
--- a/website/docs/advanced/docker/deploy/incremental.md
+++ b/website/docs/advanced/docker/deploy/incremental.md
@@ -51,10 +51,30 @@ the commit SHAs are determined automatically.
In CI/CD, the commit SHAs are determined as the following example.
-```
- X---Y---Z feature branch
- / \
-A---B---C---D---E main branch
+```mermaid
+%%{init: {
+ 'logLevel': 'debug',
+ 'gitGraph': {'rotateCommitLabel': false},
+ 'themeVariables': { 'commitLabelFontSize': '22px' }
+ }
+ }%%
+gitGraph
+ commit id: "A"
+ commit id: "B"
+ branch feature
+ checkout feature
+ commit id: "X"
+ checkout main
+ commit id: "C"
+ checkout feature
+ commit id: "Y"
+ checkout main
+ commit id: "D"
+ checkout feature
+ commit id: "Z"
+ checkout main
+ merge feature id: "E"
+ commit id: "F"
```
In this example, `BASE_SHA=B`, `HEAD_SHA=Z`, and `E` is the merge commit.
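
To make the choice of `BASE_SHA` and `HEAD_SHA` concrete, the following is a minimal Python sketch that models the example graph as a plain dictionary rather than calling `git`. The helper names `ancestors` and `base_and_head` are hypothetical, and the sketch assumes that the base is the most recent common ancestor of the merge commit's two parents, which is roughly what `git merge-base` reports for a simple history like this one.

```python
# Parent links for the example graph: child -> list of parents.
# "E" is the merge commit; its first parent is on main, its second on feature.
PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["C"],
    "X": ["B"],
    "Y": ["X"],
    "Z": ["Y"],
    "E": ["D", "Z"],
    "F": ["E"],
}


def ancestors(commit, parents):
    """Return the commit itself plus everything reachable through parent links."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def base_and_head(merge_commit, parents):
    """Return (BASE_SHA, HEAD_SHA) for a two-parent merge commit."""
    main_parent, feature_head = parents[merge_commit]
    common = ancestors(main_parent, parents) & ancestors(feature_head, parents)
    # In this simple graph the common ancestors form a single line, so the
    # most recent one is the one with the largest ancestor set.
    base = max(common, key=lambda c: len(ancestors(c, parents)))
    return base, feature_head


print(base_and_head("E", PARENTS))  # ('B', 'Z')
```

Histories with criss-cross merges can have more than one candidate base; this sketch does not try to handle that case.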
@@ -62,16 +82,19 @@ In this example, `BASE_SHA=B`, `HEAD_SHA=Z`, and `E` is the merge commit.
## Identifying Images Requiring Rebuilding from Changed Files
-The build_docker script identifies the list of docker images
+The `build_docker` script identifies the list of Docker images
that need to be rebuilt based on two factors.
-Firstly, directly impacted images are determined by examining the
+
+1. Directly impacted images are determined by checking the
list of files each image depends on. If any of these files have
-been changed, the corresponding image requires rebuilding.
-Secondly, indirectly impacted images are determined based on
-the hierarchical dependency between images. If an image is
-built upon another image, and the base image is being rebuilt,
-then the dependent image also needs to be rebuilt. This two-step
-process ensures that all the affected images are correctly
+changed, the corresponding image needs rebuilding.
+
+2. Indirectly impacted images are identified based on
+the hierarchical dependency between images.
+If a base image is rebuilt, any dependent images built upon
+it also require rebuilding.
+
+This two-step process ensures that all the affected images are correctly
identified for rebuilding.
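
The two steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used by `build_docker`: the dictionaries are a hypothetical excerpt of the image dependencies, the function names are made up, and the changed file path is only an example.

```python
# Hypothetical excerpt of the two inputs; this is not the data structure
# that build_docker itself uses.
CODE_DEPENDENCIES = {
    "sv-base-mini": ["dockerfiles/sv-base-mini/"],
    "samtools-cloud": ["dockerfiles/samtools-cloud/"],
    "sv-pipeline": ["dockerfiles/sv-pipeline/", "src/svtk/"],
}
DOCKER_DEPENDENCIES = {
    "samtools-cloud": ["sv-base-mini"],
    "sv-base": ["samtools-cloud"],
    "sv-pipeline": ["sv-base"],
}


def directly_impacted(changed_files, code_deps):
    """Step 1: images whose tracked paths contain a changed file."""
    return {
        image
        for image, prefixes in code_deps.items()
        if any(f.startswith(p) for f in changed_files for p in prefixes)
    }


def with_dependents(images, docker_deps):
    """Step 2: add every image built, directly or transitively, on a rebuilt one."""
    impacted = set(images)
    changed = True
    while changed:
        changed = False
        for image, bases in docker_deps.items():
            if image not in impacted and impacted.intersection(bases):
                impacted.add(image)
                changed = True
    return impacted


direct = directly_impacted(["dockerfiles/samtools-cloud/Dockerfile"], CODE_DEPENDENCIES)
print(sorted(with_dependents(direct, DOCKER_DEPENDENCIES)))
# ['samtools-cloud', 'sv-base', 'sv-pipeline']
```

The trailing slash on each path prefix keeps `dockerfiles/samtools-cloud/` from also matching files under `dockerfiles/samtools-cloud-virtual-env/`.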
diff --git a/website/docs/advanced/docker/deploy/manual.md b/website/docs/advanced/docker/deploy/manual.md
index 9cf2c899c..896985e44 100644
--- a/website/docs/advanced/docker/deploy/manual.md
+++ b/website/docs/advanced/docker/deploy/manual.md
@@ -4,11 +4,310 @@ description: Build and Publish Images
sidebar_position: 3
---
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+
If you are contributing to the GATK-SV codebase, specifically focusing on
enhancing tools, configuring dependencies in Dockerfiles, or modifying GATK-SV scripts
within the Docker images, it is important to build and test the Docker images locally.
-This ensures that the images are successfully built and function as intended.
-Additionally, if you wish to host the images in your own container registry,
-you will need to follow these steps.
-To simplify the build process, we have developed a Python script
-that automates the image building, and publishing to your container registry.
+This ensures that the images are successfully built and function as intended.
+
+The process of updating GATK-SV Docker images involves two steps: build and publish.
+
+- **Build**: Create Docker images from Dockerfiles and store them on your computer.
+
+- **Publish**: Upload the built Docker images to container registries
+(e.g., Google Container Registry or Azure Container Registry)
+to make them available for use in Terra or Cromwell.
+
+You may refer to [this page](/docs/advanced/docker/index.md) for a detailed description of the process.
+To streamline the process, we have developed a Python script
+that automates the image building and publishing to your container registry.
+This section provides guidelines on building and publishing the images using this script.
+
+
+:::warning Linux Machine Required
+
+Only Linux machines (dedicated or virtual) are supported for building GATK-SV Docker images.
+Images created on non-Linux machines may not work with the Terra or Cromwell execution environments.
+The instructions provided on this page assume you are using an Ubuntu Linux machine.
+:::
+
+
+
+## Setup
+
+### Runtime environment {#runtime}
+
+Currently, GATK-SV Docker images can only be built on the `linux/amd64` platform,
+which is a machine running Linux OS on x86-64 architecture.
+Images built on Apple M1 (`linux/arm64`) are not currently supported.
+You can use a local Linux machine or obtain a virtual machine from a cloud platform.
+
+You may follow the steps in the
+[GCP](https://cloud.google.com/compute/docs/instances/create-start-instance#publicimage)
+or [Azure](https://learn.microsoft.com/en-us/azure/virtual-machines/windows/quick-create-portal)
+documentation to create a virtual machine (VM) on Google Cloud Platform (GCP) or Microsoft Azure respectively.
+Make sure the VM is built using an Ubuntu image, has at least 8 GB RAM, and some additional
+disk space (e.g., 50 GB should be sufficient).
+
+Building and publishing GATK-SV Docker images is time-consuming and can take around 1 hour.
+Therefore, we recommend using a terminal multiplexer
+(e.g., [tmux](https://github.com/tmux/tmux/wiki/Getting-Started);
+[tmux cheat sheet](https://tmuxcheatsheet.com))
+when running on a VM to ensure the process continues even if you are disconnected from the VM.
+
+### Docker {#docker}
+
+[Install](https://docs.docker.com/engine/install/) Docker Desktop
+and log in using `sudo docker login`. If utilizing GATK-SV Docker images
+from a private container registry or intending to publish the resulting
+images to a registry, ensure that you are logged in with credentials
+that grant you access to the registry.
+
+
+
+
+ You may follow
+ [this documentation](https://learn.microsoft.com/en-us/azure/container-registry/container-registry-authentication?tabs=azure-cli)
+ on setting up Docker authentication to an Azure container registry.
+
+
+
+ You may follow
+ [this documentation](https://cloud.google.com/artifact-registry/docs/docker/authentication)
+ on setting up Docker authentication to a Google container registry.
+
+
+
+
+### Checkout codebase {#checkout}
+
+Make sure you are on the `git` branch with the code you want to add
+to the GATK-SV Docker images you are building.
+
+```shell
+git fetch origin
+git checkout origin/
+```
+
+## Build and Publish Docker Images {#build}
+
+All the GATK-SV Dockerfiles are hosted under the directory
+[`gatk-sv/dockerfiles/`](https://github.com/broadinstitute/gatk-sv/tree/main/dockerfiles).
+While you can build the GATK-SV Docker images by following the standard
+[Docker image build procedures](https://docs.docker.com/engine/reference/commandline/image_build/),
+that can be challenging due to the nested hierarchy of GATK-SV Docker images.
+To simplify the process, we have developed a utility script that streamlines the
+Docker image build process
+([`scripts/docker/build_docker.py`](https://github.com/broadinstitute/gatk-sv/blob/main/scripts/docker/build_docker.py)).
+
+In the following, we will explain how to use the utility script for a simple use-case.
+For more advanced and additional functionalities, please refer to the script's documentation,
+which you may access as follows.
+
+```shell
+python scripts/docker/build_docker.py --help
+```
+
+
+In its basic setup, you can use the following command to **build and publish** a GATK-SV Docker image.
+
+```shell
+python scripts/docker/build_docker.py \
+ --targets \
+ --image-tag \
+ --docker-repo
+```
+
+The arguments used are explained in the following.
+
+### Determine which images need to be rebuilt {#targets}
+
+You may follow either of the following practices to determine which images to rebuild.
+
+- **Automatic:**
+ The script can automatically determine which Docker images need a rebuild.
+ It takes two git commit SHAs as input, uses `git diff` to extract
+ the list of changed files, and cross-references them with the
+ table in [this section](/docs/advanced/docker/images#list) to identify
+ the Docker images requiring rebuilding.
+ To use this feature, commit the changes first, identify `BASE_SHA` and `HEAD_SHA` using `git log` or GitHub
+ (details on [this page](/docs/advanced/docker/deploy/incremental.md)),
+ and then call the script as follows.
+
+ ```shell
+ python scripts/docker/build_docker.py \
+ --base-git-commit BASE_SHA \
+ --current-git-commit HEAD_SHA
+ ```
+
+- **Manual:**
+ You may refer to the table in [this section](/docs/advanced/docker/images#list)
+ to determine which Docker images to rebuild based on the changed files.
+ For instance, if you modified any of the files under the
+ [`gatk-sv/src/svtk/`](https://github.com/broadinstitute/gatk-sv/tree/main/src/svtk)
+ directory, you will need to rebuild the `sv-pipeline` Docker image.
+ You can set the list of images to rebuild using the `--targets` argument.
+ For instance:
+
+ ```shell
+ python scripts/docker/build_docker.py \
+ --targets sv-pipeline
+ ```
+
+ You may specify multiple images to rebuild by providing a list of their names.
+ For instance, the following command builds the `sv-pipeline` and the `str` Docker images.
+
+ ```shell
+ python scripts/docker/build_docker.py \
+ --targets sv-pipeline str
+ ```
+
+Please note that `--targets` and `--base-git-commit --current-git-commit`
+options are mutually exclusive. In other words, you can either manually specify
+images to rebuild, or let the script determine them.
+Specifying both, or neither, is not currently supported.
+
+:::info
+Following the steps above, the script builds the specified Docker images
+_and all the images derived from them_, ensuring proper propagation of changes through the pipeline.
+If you want to build only the specified images, you would need to add the `--skip-dependent-images` flag.
+:::
+
+
+### Image tag {#tag}
+
+[Docker image tags](https://docs.docker.com/engine/reference/commandline/tag/)
+are used to distinguish between different builds of the same image.
+You can use any naming convention for your tags.
+GATK-SV Docker images use the following template for tags,
+which you may want to adopt, in particular, if you plan to publish
+your images on the GATK-SV container registries.
+
+```
+[Date]-[Release Tag]-[Head SHA 8]
+```
+
+where `[Date]` is `YYYY-MM-DD` extracted from the timestamp of the last
+commit on the feature branch, `[Release Tag]` is extracted from the latest [pre-]release on GitHub,
+and `[Head SHA 8]` is the first eight characters of the SHA of the
+last commit on the feature branch.
+
+For example:
+
+```
+2023-07-28-v0.28.1-beta-e70dfbd7
+```
+
+For automatically composing image tags, you may follow the practices
+used in [GATK-SV CI/CD](https://github.com/broadinstitute/gatk-sv/blob/286a87f3bcfc0b8c811ff789776dd0b135f582e9/.github/workflows/sv_pipeline_docker.yml#L85-L109).
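
As a rough illustration of the template, the following Python sketch composes a tag from its three parts. The function name `compose_image_tag` is hypothetical, and the full SHA in the example is invented; only its first eight characters match the example tag above. It is not taken from the CI/CD workflow.

```python
import datetime


def compose_image_tag(commit_date, release_tag, head_sha):
    """Build a tag of the form [Date]-[Release Tag]-[Head SHA 8]."""
    return f"{commit_date:%Y-%m-%d}-{release_tag}-{head_sha[:8]}"


# The full SHA below is made up for illustration; only its first eight
# characters match the example tag shown above.
tag = compose_image_tag(
    datetime.date(2023, 7, 28),
    "v0.28.1-beta",
    "e70dfbd7a1b2c3d4e5f60718293a4b5c6d7e8f90",
)
print(tag)  # 2023-07-28-v0.28.1-beta-e70dfbd7
```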
+
+
+
+### Specify the container registry {#registry}
+The built images are stored on your computer. If you are only developing
+or testing locally, there is no need to push them to a container registry.
+In this case, you can omit `--docker-repo `.
+
+You need to push the images to a container registry if you want to:
+
+- Use the updated Docker images for WDL testing or development;
+- Store them on a container registry other than those maintained by the GATK-SV team.
+
+The script can automatically push the built Docker images to a container registry.
+To use this feature, you may follow these steps:
+
+1. Ensure you are logged into Docker with credentials granting
+push access to the container registry. Please refer to the
+[Docker](#docker) section for details.
+
+
+2. Provide the `--docker-repo ` argument,
+replacing `` with the name of your container registry.
+For Google Container Registry (GCR) and Azure Container Registry (ACR),
+the format is generally as follows.
+
+
+
+
+ Template:
+
+ ```shell
+ .azurecr.io//:
+ ```
+
+ Example:
+ ```shell
+ python scripts/docker/build_docker.py \
+ --targets sv-pipeline \
+ --image-tag v1 \
+ --docker-repo myregistry.azurecr.io/gatk-sv
+ ```
+
+ which results in creating the following image:
+
+ ```shell
+ myregistry.azurecr.io/gatk-sv/sv-pipeline:v1
+ ```
+
+
+
+
+ Template:
+
+ ```shell
+ //:
+ ```
+
+ Example:
+ ```shell
+ python scripts/docker/build_docker.py \
+ --targets sv-pipeline \
+ --image-tag v1 \
+ --docker-repo us.gcr.io/my-repository/gatk-sv
+ ```
+
+ which results in creating the following image:
+
+ ```shell
+ us.gcr.io/my-repository/gatk-sv/sv-pipeline:v1
+ ```
+
+
+
+
+Please note that we are currently using GCR, but it has been migrated to Google Artifact Registry.
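
The naming pattern in the two examples above can be summarized with a short Python sketch. The function `image_reference` is a hypothetical helper written for illustration; it only mirrors how the example image names are formed and is not part of `build_docker.py`.

```python
def image_reference(docker_repo, image_name, image_tag):
    """Join the registry path, image name, and tag into one reference."""
    return f"{docker_repo.rstrip('/')}/{image_name}:{image_tag}"


print(image_reference("myregistry.azurecr.io/gatk-sv", "sv-pipeline", "v1"))
# myregistry.azurecr.io/gatk-sv/sv-pipeline:v1
print(image_reference("us.gcr.io/my-repository/gatk-sv", "sv-pipeline", "v1"))
# us.gcr.io/my-repository/gatk-sv/sv-pipeline:v1
```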
+
+
+
+## Post-build
+
+- GATK-SV Docker images are mainly intended for use in WDLs.
+ Therefore, it's a good practice to test the newly updated
+ images in related WDLs. This ensures that the updated images function
+ as expected within specific workflows.
+
+- If you were using a Linux VM to build the Docker images,
+ ensure you either stop or delete the VM after building the images.
+ Stopping the VM won't delete the disk, and you'll continue to
+ incur disk usage charges. If you don't want to incur disk costs,
+ you can delete the VM along with all its associated resources.
+ Stopping is preferred over deleting if you intend to reuse the VM.
diff --git a/website/docs/advanced/docker/images.md b/website/docs/advanced/docker/images.md
index 9b19424db..1cda3bbef 100644
--- a/website/docs/advanced/docker/images.md
+++ b/website/docs/advanced/docker/images.md
@@ -4,8 +4,8 @@ description: Docker Image Dependencies
sidebar_position: 1
---
-import useBaseUrl from '@docusaurus/useBaseUrl';
-import ThemedImage from '@theme/ThemedImage';
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
:::info
This page provides a detailed explanation of Docker
@@ -27,13 +27,22 @@ usage and lower workflow execution costs.
The figure below illustrates the relationships between the GATK-SV Docker images.
-
+```mermaid
+flowchart TD
+ ubuntu22[Ubuntu 22.04] --> svbasemini[sv-base-mini] & samtoolsenv[samtools-cloud-virtual-env] & svbaseenv[sv-base-virtual-env]
+ svbasemini & samtoolsenv & svbaseenv --> svpipelineenv[sv-pipeline-virtual-env]
+ samtoolsenv --> samtoolscloud[samtools-cloud] & svutilsenv[sv-utils-env]
+ svbasemini --> samtoolscloud
+ svutilsenv --> svutils[sv-utils]
+ samtoolscloud --> svutils & svbase[sv-base]
+ svpipelineenv & svbase --> svpipeline[sv-pipeline]
+ svbaseenv --> cnmopsenv[cnmops-virtual-env]
+ svbase & cnmopsenv --> cnmops[cnmops]
+
+ ubuntu18[Ubuntu 18.04] --> manta[Manta] & melt[MELT] & wham[Wham]
+ samtoolscloud --> wham
+ ubuntu2210[Ubuntu 22.10] --> str[STR]
+```
The image depicts the hierarchical relationship among GATK-SV
Docker images. Arrows indicate the flow from a base image
@@ -49,6 +58,28 @@ are available in [`dockers.json`](https://github.com/broadinstitute/gatk-sv/blob
and [`dockers_azure.json`](https://github.com/broadinstitute/gatk-sv/blob/main/inputs/values/dockers_azure.json)
for images hosted on Google Container Registry (GCR) and Azure Container Registry (ACR), respectively.
+## Docker Images List {#list}
+
+The table below lists the GATK-SV Docker images and their dependencies.
+
+| Image | Code Dependencies | Docker Dependencies |
+|-------|-------------------|---------------------|
+| `manta` | | |
+| `melt` | | |
+| `wham` | | |
+| `str` | | |
+| `sv-base-mini` | <ul><li>`dockerfiles/sv-base-mini/*`</li></ul> | |
+| `samtools-cloud-virtual-env` | <ul><li>`dockerfiles/samtools-cloud-virtual-env/*`</li></ul> | |
+| `samtools-cloud` | <ul><li>`dockerfiles/samtools-cloud/*`</li></ul> | <ul><li>`sv-base-mini`</li><li>`samtools-cloud-virtual-env`</li></ul> |
+| `sv-base-virtual-env` | <ul><li>`dockerfiles/sv-base-virtual-env/*`</li></ul> | |
+| `sv-base` | | <ul><li>`samtools-cloud`</li><li>`sv-base-virtual-env`</li></ul> |
+| `cnmops-virtual-env` | <ul><li>`dockerfiles/cnmops-virtual-env/*`</li></ul> | |
+| `cnmops` | | <ul><li>`sv-base`</li><li>`cnmops-virtual-env`</li></ul> |
+| `sv-pipeline-virtual-env` | <ul><li>`dockerfiles/sv-pipeline-virtual-env/*`</li></ul> | <ul><li>`sv-base-mini`</li><li>`sv-base-virtual-env`</li><li>`samtools-cloud-virtual-env`</li></ul> |
+| `sv-pipeline` | <ul><li>`dockerfiles/sv-pipeline/*`</li><li>`src/RdTest/*`</li><li>`src/sv-pipeline/*`</li><li>`src/svqc/*`</li><li>`src/svtest/*`</li><li>`src/svtk/*`</li><li>`src/WGD/*`</li></ul> | <ul><li>`sv-base`</li><li>`sv-pipeline-virtual-env`</li></ul> |
+| `sv-utils-env` | <ul><li>`dockerfiles/sv-utils-env/*`</li></ul> | <ul><li>`samtools-cloud-virtual-env`</li></ul> |
+| `sv-utils` | <ul><li>`dockerfiles/sv-utils/*`</li><li>`src/sv_utils/src/*`</li><li>`src/sv_utils/setup.py`</li></ul> | <ul><li>`samtools-cloud`</li><li>`sv-utils-env`</li></ul> |
+
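One practical use of the Docker Dependencies column is deriving an order in which images can be built so that every base image exists before the images built on it. The following is a minimal sketch using Python's standard `graphlib` module (Python 3.9 or later); the dictionary is transcribed from the table above, and this is an illustration rather than the approach taken by the GATK-SV build script.

```python
from graphlib import TopologicalSorter

# Image -> images it is built on, copied from the "Docker Dependencies" column.
DOCKER_DEPENDENCIES = {
    "samtools-cloud": {"sv-base-mini", "samtools-cloud-virtual-env"},
    "sv-base": {"samtools-cloud", "sv-base-virtual-env"},
    "cnmops": {"sv-base", "cnmops-virtual-env"},
    "sv-pipeline-virtual-env": {
        "sv-base-mini",
        "sv-base-virtual-env",
        "samtools-cloud-virtual-env",
    },
    "sv-pipeline": {"sv-base", "sv-pipeline-virtual-env"},
    "sv-utils-env": {"samtools-cloud-virtual-env"},
    "sv-utils": {"samtools-cloud", "sv-utils-env"},
}

# static_order() yields each image only after all of the images it is built on.
build_order = list(TopologicalSorter(DOCKER_DEPENDENCIES).static_order())
print(build_order)
```

Several valid orders exist, so the exact output may differ between runs or Python versions; the only guarantee is that a base image appears before its dependents.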
## Advantages of Dividing Images by Functionality
@@ -58,42 +89,32 @@ the pipeline is organized into multiple smaller images, each focusing on a speci
This approach offers several benefits.
-By splitting the tools into separate Docker images, we achieve a modular
-and focused structure. Each image contains the tools required for a specific
-task within the GATK-SV pipeline. This enables users and developers to easily
-work with individual images, as they can identify the specific tools needed
-for their particular analysis.
+- **Modular and focused structure:**
+Each image includes task-specific tools, simplifying the use and maintenance of
+GATK-SV Docker images for users and developers, respectively.
-Moreover, using smaller, task-specific Docker images offers the advantage
-of reduced sizes, which is particularly beneficial in cloud environments.
-These smaller images require less storage space when stored in container
-registries like Google Cloud Container Registry (GCR) or Azure Container Registry (ACR).
-Additionally, when creating virtual machines for workflow task execution,
-the transfer of these smaller images is more efficient.
+- **Reduced Docker image size:**
+Using task-specific Docker images reduces image sizes, requiring less storage space
+in container registries. It also enables faster image transfer
+when creating virtual machines for task execution.
-Separate Docker images enhance maintenance and extensibility
-in the GATK-SV pipeline. Maintainers can easily modify or update
-specific tools or configurations within a single image without
-impacting others. This granularity improves maintainability
-and enables seamless expansion of the pipeline by adding or
-replacing tools as required.
+- **Enhanced maintenance and extensibility:**
+Maintainers can easily modify specific tools or configurations within
+a single image without affecting others, improving maintainability and
+facilitating seamless expansion by adding or replacing tools as required.
-Additionally, the Docker image hierarchy offers advantages in terms of
-consistency and efficiency. One image can be built upon another,
-leveraging existing setups and tools. This promotes code reuse and
-reduces duplication, resulting in consistent configurations across
-different stages of the pipeline. It also simplifies the management
-of common dependencies, as changes or updates can be applied at the
-appropriate level, cascading down to the dependent images.
+- **Consistency and efficiency:**
+Building images on top of existing setups and tools promotes code
+reuse and reduces duplication, ensuring consistent configurations
+across pipeline stages. It simplifies dependency management by
+allowing changes or updates at the appropriate level, cascading
+down to dependent images.
-In summary, by splitting the tools into smaller, task-specific images,
-the pipeline becomes more modular and manageable.
-This approach optimizes storage, execution, maintenance,
-and extensibility in cloud environments.
-Leveraging Docker's image hierarchy further enhances consistency,
-code reuse, and dependency management, ensuring efficient and
-scalable execution of the pipeline.
+In summary, splitting tools into smaller, task-specific
+Docker images optimizes storage, execution, maintenance, and extensibility.
+It enhances consistency, code reuse, and dependency management,
+ensuring efficient and scalable pipeline execution.
diff --git a/website/docs/advanced/docker/index.md b/website/docs/advanced/docker/index.md
index bd4868f07..6bee59397 100644
--- a/website/docs/advanced/docker/index.md
+++ b/website/docs/advanced/docker/index.md
@@ -4,9 +4,6 @@ description: Docker Concepts and Execution Overview
sidebar_position: 0
---
-import useBaseUrl from '@docusaurus/useBaseUrl';
-import ThemedImage from '@theme/ThemedImage';
-
To make the analysis process scalable, reproducible, and cost-efficient,
GATK-SV is designed as a cloud-native pipeline,
meaning it runs on virtual machines (VMs) hosted in the cloud.
@@ -34,13 +31,32 @@ The following figure is a high-level illustration depicting the relationship
between Dockerfiles, Docker images, Docker containers, and Cloud VMs.
-
+```mermaid
+
+flowchart LR
+ dockerfile[Dockerfile] -- Build --> acr_image[Docker Image] & gcp_image[Docker Image]
+
+ subgraph Microsoft Azure
+ subgraph ACR
+ acr_image
+ end
+
+ subgraph Azure VM
+ acr_image -- Run --> az_container[Container]
+ end
+ end
+
+ subgraph Google Cloud Platform
+ subgraph GCR
+ gcp_image
+ end
+
+ subgraph GCP VM
+ gcp_image -- Run --> gcp_container[Container]
+ end
+ end
+
+```
The GATK-SV Docker setup is organized as follows:
diff --git a/website/docusaurus.config.js b/website/docusaurus.config.js
index 90582d420..48f4674d5 100644
--- a/website/docusaurus.config.js
+++ b/website/docusaurus.config.js
@@ -98,10 +98,6 @@ const config = {
label: 'Github',
href: 'https://github.com/broadinstitute/gatk-sv/discussions',
},
- {
- label: 'Twitter',
- href: 'https://twitter.com/broadinstitute',
- },
],
},
{
@@ -126,7 +122,17 @@ const config = {
theme: lightCodeTheme,
darkTheme: darkCodeTheme,
},
+ docs: {
+ sidebar: {
+ hideable: true,
+ }
+ }
}),
+
+ themes: ['@docusaurus/theme-mermaid'],
+ markdown: {
+ mermaid: true,
+ }
};
module.exports = config;
diff --git a/website/package.json b/website/package.json
index 532ebad16..0e74541bf 100644
--- a/website/package.json
+++ b/website/package.json
@@ -14,8 +14,9 @@
"write-heading-ids": "docusaurus write-heading-ids"
},
"dependencies": {
- "@docusaurus/core": "2.2.0",
- "@docusaurus/preset-classic": "2.2.0",
+ "@docusaurus/core": "2.4.1",
+ "@docusaurus/preset-classic": "2.4.1",
+ "@docusaurus/theme-mermaid": "^2.4.1",
"@mdx-js/react": "^1.6.22",
"clsx": "^1.2.1",
"prism-react-renderer": "^1.3.5",
diff --git a/website/static/img/docker_hierarchy.png b/website/static/img/docker_hierarchy.png
deleted file mode 100644
index 5ef7f98e2..000000000
--- a/website/static/img/docker_hierarchy.png
+++ /dev/null
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:46143ad234a9932e6d7e9e3690a527309c7ac01e72e76920575e2b6c466469e3
-size 838559
diff --git a/website/static/img/docker_infra_diagram.png b/website/static/img/docker_infra_diagram.png
deleted file mode 100644
index 905709103..000000000
--- a/website/static/img/docker_infra_diagram.png
+++ /dev/null
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:917ccbfe2fc97a5d8adffc52ecdf77abd666b3273b28ab248bd23b117ef76ca6
-size 1126378