# Datamon sidecar design

This document describes the current design of datamon sidecars.

## Design

The current design favors the sidecar container approach, with bespoke signaling between containers, over the CSI driver approach. Therefore, datamon is not available as a Kubernetes persistent volume plugin.

This design choice stems from GKE's current inability to handle Kubernetes ephemeral volumes (a v1.16 feature): managing many short-lived Kubernetes volumes is not practical at the moment. When GKE eventually makes Kubernetes v1.16 available, we may revive our attempt to build a CSI driver for datamon.

The sidecar approach requires coordination between containers. Signaling is implemented with files on a shared volume.

## Sidecar signaling

We need some coordination between the different containers running in the pod.

In particular, when the main Argo workflow is done processing, we need to keep the pod running until the sidecars have finished uploading results.

Ensuring that data is ready for access (sidecar-to-main-container messaging), as well as notifying that the data-science program has produced output data to upload (main-container-to-sidecar messaging), is the responsibility of a few shell scripts shipped as part of the Docker images that constitute the sidecars.

The coordination signaling defines the following protocol:

### File system mount (fuse)

| main (wrap_application.sh) | sidecar (wrap_datamon.sh) | what happens |
|----------------------------|---------------------------|--------------|
|                            | `<= mountdone`            | application waits for input bundles to be mounted |
| (do some work...)          |                           |              |
| `initupload =>`            |                           | datamon starts running the upload commands |
|                            | `<= uploaddone`           | application waits until its output is archived |
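As a rough illustration (not the actual scripts), the file-based signaling could be implemented with marker files in the shared coordination directory. The event names below mirror the table; the helper names are assumptions:

```shell
#!/bin/sh
# Hypothetical sketch of file-based signaling between containers that
# share a coordination directory (helper names are assumptions).
COORD="${COORD:-/tmp/coord}"

signal() {                 # emit an event, e.g. signal mountdone
  touch "$COORD/$1"
}

wait_for() {               # block until the peer has emitted the event
  until [ -f "$COORD/$1" ]; do sleep 1; done
}

# sidecar side:      signal mountdone      after bundles are mounted
# application side:  wait_for mountdone    before reading input data
```

Because the signal is just a file on a shared volume, neither container needs network access to the other, and a restarted container can re-check past events by looking for existing marker files.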

### PostgreSQL

A similar process is iterated through all configured datamon-postgres sidecars.

| main (wrap_application.sh) | sidecar (wrap_datamon_pg.sh) | what happens |
|----------------------------|------------------------------|--------------|
|                            | `<= dbstarted`               | application waits for the DB instance to be ready |
| (do some work...)          |                              |              |
| `initdbupload =>`          |                              | datamon archives the database as a bundle |
|                            | `<= dbuploaddone`            | application waits until its output is archived |
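Since the handshake repeats for every configured datamon-postgres sidecar, the application wrapper presumably waits on each instance in turn. A minimal sketch, assuming one marker file per instance (the per-instance naming is an assumption, not the real protocol):

```shell
#!/bin/sh
# Hypothetical: block until every configured postgres sidecar has
# signaled readiness, one marker file per instance (naming assumed).
wait_all_dbstarted() {
  coord="$1"; shift           # coordination directory, then instance names
  for instance in "$@"; do
    until [ -f "$coord/dbstarted-$instance" ]; do sleep 1; done
  done
}

# e.g. wait_all_dbstarted /tmp/coord pg-0 pg-1
```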

## Sidecar usage

Users need only place the wrap_application.sh script in the root directory of the main container. This can be accomplished via an initContainer, without duplicating the version of the Datamon sidecar image in both the main application Dockerfile and the YAML. When using a block-storage GCS product, we might have specified a data-science application's Argo DAG node with something like

```yaml
command: ["app"]
args: ["param1", "param2"]
```

whereas with wrap_application.sh in place, this would be something to the effect of

```yaml
command: ["/path/to/wrap_application.sh"]
args: ["-c", "/path/to/coordination_directory", "-b", "fuse", "--", "app", "param1", "param2"]
```

That is, wrap_application.sh has the following usage

```shell
wrap_application.sh -c <coordination_directory> -b <sidecar_kind> -- <application_command>
```

where

- `<coordination_directory>` is an empty directory in a shared volume (an emptyDir using memory-backed storage suffices). Each coordination directory (not necessarily the volume) corresponds to a particular DAG node (i.e. Kubernetes pod) and vice versa.
- `<sidecar_kind>` is in correspondence with the containers specified in the YAML and may be one of:
  - `fuse`
  - `postgres`
- `<application_command>` is the data-science application command exactly as it would appear without the wrapper script. That is, the wrapper script relies on the conventional UNIX `--` syntax to state that options to the command are done being declared.
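The `--` convention mentioned above can be sketched as follows; this is an assumed illustration of the option splitting, not the actual wrap_application.sh source:

```shell
#!/bin/sh
# Assumed sketch: split the wrapper's own options from the wrapped
# application command at the conventional "--" delimiter.
parse_wrapper_args() {
  COORD="" KIND=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      -c) COORD="$2"; shift 2 ;;   # coordination directory
      -b) KIND="$2"; shift 2 ;;    # sidecar kind: fuse or postgres
      --) shift; break ;;          # everything after -- is the app command
      *)  echo "unknown option: $1" >&2; return 1 ;;
    esac
  done
  APP_CMD="$*"                     # the wrapper would eventually exec "$@"
}
```

With `parse_wrapper_args -c /tmp/coord -b fuse -- app param1 param2`, the wrapper's own options are consumed and the application command survives untouched, quoting and all.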

Meanwhile, each sidecar's datamon-specific batteries have their corresponding usages.

### gcr.io/onec-co/datamon-fuse-sidecar -- wrap_datamon.sh

Provides filesystem representations (i.e. a folder) of datamon bundles. Since bundles' filelists are serialized filesystem representations, the wrap_datamon.sh interface is tightly coupled to that of the self-documenting datamon binary itself.

```shell
./wrap_datamon.sh -c <coord_dir> -d <bin_cmd_I> -d <bin_cmd_J> ...
```

- `-c` the same coordination directory passed to wrap_application.sh
- `-d` all parameters, exactly as passed to the datamon binary, except as a single scalar (quoted) parameter, for one of the following commands:
  - `config` sets user information associated with any bundles created by the node
  - `bundle mount` provides sources for data-science applications
  - `bundle upload` provides sinks for data-science applications

Multiple (or no) `bundle mount` and `bundle upload` commands may be specified, and at most one `config` command is allowed, so an example wrap_datamon.sh YAML might be

```yaml
command: ["./wrap_datamon.sh"]
args: ["-c", "/tmp/coord", "-d", "config create", "-d", "bundle upload --path /tmp/upload --message \"result of container coordination demo\" --repo ransom-datamon-test-repo --label coordemo", "-d", "bundle mount --repo ransom-datamon-test-repo --label testlabel --mount /tmp/mount --stream"]
```

or from the shell

```shell
./wrap_datamon.sh -c /tmp/coord -d 'config create' -d 'bundle upload --path /tmp/upload --message "result of container coordination demo" --repo ransom-datamon-test-repo --label coordemo' -d 'bundle mount --repo ransom-datamon-test-repo --label testlabel --mount /tmp/mount --stream'
```

### Aside on serialization format

Each of these environment variables contains a serialized dictionary in the following format:

```
<entry_separator><key_value_separator><entry_1><entry_separator><entry_2>...
```

where `<entry_separator>` and `<key_value_separator>` are each a single character, anything other than a `.`, and each `<entry>` takes one of two forms: either `<option>` or `<option><key_value_separator><arg>`.

So for example

```
;:a;b:c
```

expresses something like the Python map

```python
{'a': True, 'b': 'c'}
```

or the shell option args

```
<argv0> -a -b c
```
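A small decoder for this format might look as follows (a sketch; the function name is an assumption). It reads the two separator characters from the first two positions, splits the remainder into entries, and prints one `key=value` line per entry, mapping valueless options to `true`:

```shell
#!/bin/sh
# Hypothetical decoder for the serialized dictionary format: the first
# two characters name the separators, the remainder is the entry list.
decode_dict() {
  str="$1"
  es=$(printf '%s\n' "$str" | cut -c1)     # entry separator
  kvs=$(printf '%s\n' "$str" | cut -c2)    # key/value separator
  printf '%s\n' "$str" | cut -c3- | tr "$es" '\n' |
  while IFS= read -r entry; do
    key=${entry%%"$kvs"*}
    if [ "$entry" = "$key" ]; then
      echo "$key=true"                     # bare <option>
    else
      echo "$key=${entry#*"$kvs"}"         # <option><kvs><arg>
    fi
  done
}

# decode_dict ';:a;b:c'  prints:  a=true  then  b=c
```

Letting the payload choose its own separators is what makes the `.`-exclusion rule above sufficient: any character that happens to appear in a value can be avoided by picking different separators for that variable.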