Feature/#30 build the data science sandbox as docker image (#40)
* First implementation of Docker creation
* Added cli.
* Remove containers and image after running integration test.
* Changed order of tests to run fast unit tests before
* Apply suggestions from code review
* Removed (scope="session") from @pytest.fixture for test Docker containers

---------

Co-authored-by: Christoph Pirkl <[email protected]>
ckunki and kaklakariada authored Nov 13, 2023
1 parent 58d5c86 commit f83662f
Showing 27 changed files with 397 additions and 156 deletions.
11 changes: 7 additions & 4 deletions .github/workflows/check_ci.yaml
@@ -18,8 +18,11 @@ jobs:
  uses: ./.github/actions/prepare_poetry_env

  - name: Run pytest
- run: poetry run pytest test/ci/test_install_dependencies.py
+ run: >
+ poetry run pytest
+ test/unit
+ test/integration/test_create_dss_docker_image.py
  env: # Set the secret as an env variable
- AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
- AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_ACCESS_KEY_SECRET }}
- AWS_DEFAULT_REGION: ${{ secrets.AWS_REGION }}
+ AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
+ AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_ACCESS_KEY_SECRET }}
+ AWS_DEFAULT_REGION: ${{ secrets.AWS_REGION }}
1 change: 1 addition & 0 deletions doc/changes/changes_0.1.0.md
@@ -14,6 +14,7 @@ Version: 0.1.0

  - #11: Created a notebook to show training with scikit-learn in the notebook
  - #15: Installed exasol-notebook-connector via ansible
+ - #30: Added script to build the Data Science Sandbox as Docker Image

  ## Bug Fixes

11 changes: 6 additions & 5 deletions doc/developer_guide/developer_guide.md
@@ -39,11 +39,11 @@ A CLI command has normally a respective function in the `lib` submodule. Hence,

  There are generally three types of commands:

- | Type | Explanation |
- | ----- | --------- |
- | Release Commands | used during the release |
- | Deployment Commands | used to deploy infrastructure onto AWS cloud |
- | Development Commands | used to identify problems or for testing |
+ | Type | Explanation |
+ |----------------------|----------------------------------------------|
+ | Release Commands | used during the release |
+ | Deployment Commands | used to deploy infrastructure onto AWS cloud |
+ | Development Commands | used to identify problems or for testing |

  ### Release commands

@@ -71,6 +71,7 @@ The following commands can be used to deploy the infrastructure onto a given AWS
  - `setup-vm-bucket` - deploys the AWS Bucket cloudformation stack which will be used to deploy the VM images
  - `setup-release-codebuild` - deploys the AWS Codebuild cloudformation stack which will be used for the release-build
  - `setup-vm-bucket-waf` - deploys the AWS Codebuild cloudformation stack which contains the WAF Acl configuration for the Cloudfront distribution of the VM Bucket
+ - `create-docker-image` - creates a Docker image for data-science-sandbox and deploys it to hub.docker.com/exasol/data-science-sandbox

  ## Flow

1 change: 1 addition & 0 deletions exasol/ds/sandbox/cli/commands/__init__.py
@@ -13,3 +13,4 @@
  from .update_release import update_release
  from .make_ami_public import make_ami_public
  from .setup_vm_bucket_waf import setup_vm_bucket_waf
+ from .create_docker_image import create_docker_image
44 changes: 44 additions & 0 deletions exasol/ds/sandbox/cli/commands/create_docker_image.py
@@ -0,0 +1,44 @@
import click

from exasol.ds.sandbox.cli.cli import cli
from exasol.ds.sandbox.cli.options.logging import logging_options
from exasol.ds.sandbox.cli.common import add_options
from exasol.ds.sandbox.lib.dss_docker import DssDockerImage
from exasol.ds.sandbox.lib.logging import SUPPORTED_LOG_LEVELS
from exasol.ds.sandbox.lib.logging import set_log_level


@cli.command()
@add_options([
click.option(
'--repository', type=str, metavar="ORG/REPO", show_default=True,
default="exasol/data-science-sandbox",
help="Organization and repository on hub.docker.com to publish the docker image to"),
click.option('--version', type=str, help="Docker image version tag"),
click.option(
'--publish', type=bool, is_flag=True,
help="Whether to publish the created Docker image"),
click.option(
'--keep-container', type=bool, is_flag=True,
help="""Keep the Docker Container running after creating the image.
Otherwise stop and remove the container."""),
])
@add_options(logging_options)
def create_docker_image(
repository: str,
version: str,
publish: bool,
keep_container: bool,
log_level: str,
):
"""
Create a Docker image for data-science-sandbox and deploy
it to a Docker repository.
"""
set_log_level(log_level)
DssDockerImage(
repository=repository,
version=version,
publish=publish,
keep_container=keep_container,
).create()
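
The `--version` option above has no default. According to `create_image.py` in this commit, a missing version falls back to the installed package version, and the image name is composed as `repository:version`. A minimal sketch of that fallback, where `DEFAULT_VERSION` and the helper `image_name` are hypothetical stand-ins (the commit reads the version from package metadata instead):

```python
from typing import Optional

# Hypothetical stand-in for the version read from package metadata.
DEFAULT_VERSION = "0.1.0"


def image_name(repository: str, version: Optional[str] = None) -> str:
    """Compose "<repository>:<version>", using the default when no version is given."""
    return f"{repository}:{version if version else DEFAULT_VERSION}"


print(image_name("exasol/data-science-sandbox"))           # exasol/data-science-sandbox:0.1.0
print(image_name("exasol/data-science-sandbox", "1.2.3"))  # exasol/data-science-sandbox:1.2.3
```

Because the fallback tests truthiness, an empty string passed as the version is treated like a missing one.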
9 changes: 6 additions & 3 deletions exasol/ds/sandbox/lib/ansible/ansible_access.py
@@ -1,24 +1,27 @@
from typing import Callable

import ansible_runner
import logging

from typing import Callable

from exasol.ds.sandbox.lib.ansible.ansible_run_context import AnsibleRunContext
from exasol.ds.sandbox.lib.logging import get_status_logger, LogType


class AnsibleException(RuntimeError):
pass


class AnsibleAccess:

"""
Provides access to ansible runner.
@raises: AnsibleException if ansible execution fails
"""
@staticmethod
def run(private_data_dir: str, run_ctx: AnsibleRunContext, printer: Callable[[str], None]):
quiet = not get_status_logger(LogType.ANSIBLE).isEnabledFor(logging.INFO)
r = ansible_runner.run(private_data_dir=private_data_dir,
playbook=run_ctx.playbook,
quiet=quiet,
extravars=run_ctx.extra_vars)
for e in r.events:
printer(e)
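
The `quiet` flag passed to `ansible_runner.run` is derived from the logger level: runner output is suppressed unless INFO is enabled for the Ansible logger. A minimal sketch of that check using only the standard `logging` module; the helper name `ansible_quiet` and the logger name are assumptions made for illustration:

```python
import logging


def ansible_quiet(logger: logging.Logger) -> bool:
    """Return True when the logger would not emit INFO records."""
    return not logger.isEnabledFor(logging.INFO)


log = logging.getLogger("ansible_quiet_demo")  # hypothetical logger name

log.setLevel(logging.WARNING)
print(ansible_quiet(log))  # True: INFO is disabled, so the runner stays quiet

log.setLevel(logging.DEBUG)
print(ansible_quiet(log))  # False: INFO is enabled, so runner output is shown
```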
2 changes: 1 addition & 1 deletion exasol/ds/sandbox/lib/ansible/ansible_runner.py
@@ -20,7 +20,7 @@ def __init__(self, ansible_access: AnsibleAccess, work_dir: Path):

  @staticmethod
  def printer(msg: str):
- LOG.info(msg)
+ LOG.debug(msg)

  def run(self, ansible_run_context: AnsibleRunContext, host_infos: Tuple[HostInfo]):
  inventory_content = render_template("inventory.jinja", host_infos=host_infos)
4 changes: 4 additions & 0 deletions exasol/ds/sandbox/lib/dss_docker/Dockerfile
@@ -0,0 +1,4 @@
FROM ubuntu:20.04
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update && apt-get install --no-install-recommends --assume-yes python3 python3-pexpect
EXPOSE 8888/tcp
1 change: 1 addition & 0 deletions exasol/ds/sandbox/lib/dss_docker/__init__.py
@@ -0,0 +1 @@
from .create_image import DssDockerImage
103 changes: 103 additions & 0 deletions exasol/ds/sandbox/lib/dss_docker/create_image.py
@@ -0,0 +1,103 @@
import docker
import humanfriendly
import importlib_resources

from datetime import datetime
from docker.types import Mount
from exasol.ds.sandbox.lib import pretty_print
from importlib_metadata import version
from pathlib import Path

from exasol.ds.sandbox.lib.config import ConfigObject, SLC_VERSION
from exasol.ds.sandbox.lib.logging import get_status_logger, LogType
from exasol.ds.sandbox.lib.ansible import ansible_repository
from exasol.ds.sandbox.lib.ansible.ansible_run_context import AnsibleRunContext
from exasol.ds.sandbox.lib.ansible.ansible_access import AnsibleAccess
from exasol.ds.sandbox.lib.setup_ec2.run_install_dependencies import run_install_dependencies


DSS_VERSION = version("exasol-data-science-sandbox")
_logger = get_status_logger(LogType.DOCKER_IMAGE)


class DssDockerImage:
@classmethod
def timestamp(cls) -> str:
return f'{datetime.now().timestamp():.0f}'

def __init__(
self,
repository: str,
version: str = None,
publish: bool = False,
keep_container: bool = False,
):
version = version if version else DSS_VERSION
self.container_name = f"ds-sandbox-{DssDockerImage.timestamp()}"
self.image_name = f"{repository}:{version}"
self.publish = publish
self.keep_container = keep_container

def _ansible_run_context(self) -> AnsibleRunContext:
extra_vars = {
"docker_container": self.container_name,
}
return AnsibleRunContext(
playbook="dss_docker_playbook.yml",
extra_vars=extra_vars,
)

def _ansible_config(self) -> ConfigObject:
return ConfigObject(
time_to_wait_for_polling=0.1,
slc_version=SLC_VERSION,
)

def _docker_file(self) -> importlib_resources.abc.Traversable:
return (
importlib_resources
.files("exasol.ds.sandbox.lib.dss_docker")
.joinpath("Dockerfile")
)

def create(self):
docker_file = self._docker_file()
try:
start = datetime.now()
docker_client = docker.from_env()
_logger.info(f"Creating docker image {self.image_name} from {docker_file}")
with docker_file.open("rb") as fileobj:
docker_client.images.build(fileobj=fileobj, tag=self.image_name)
container = docker_client.containers.create(
image=self.image_name,
name=self.container_name,
command="sleep infinity",
detach=True,
)
_logger.info("Starting container")
container.start()
_logger.info("Installing dependencies")
run_install_dependencies(
AnsibleAccess(),
configuration=self._ansible_config(),
host_infos=tuple(),
ansible_run_context=self._ansible_run_context(),
ansible_repositories=ansible_repository.default_repositories,
)
_logger.info("Committing changes to docker container")
image = container.commit(
repository=self.image_name,
)
except Exception as ex:
raise ex
finally:
if self.keep_container:
_logger.info("Keeping container running")
else:
_logger.info("Stopping container")
container.stop()
_logger.info("Removing container")
container.remove()
size = humanfriendly.format_size(image.attrs["Size"])
elapsed = pretty_print.elapsed(start)
_logger.info(f"Built Docker image {self.image_name} size {size} in {elapsed}.")
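
A note on the `finally` block in `create()`: if building the image or creating the container raises, the local name `container` is never bound, so the cleanup itself fails with `UnboundLocalError` and obscures the original error. The same applies to `image` in the final log lines. One possible guard is sketched below under stated assumptions: `FakeContainer`, `build`, and the callables passed to it are hypothetical stand-ins, not the Docker SDK or the code of this commit.

```python
from typing import Callable, List, Optional


class FakeContainer:
    """Hypothetical stand-in for a Docker container; records cleanup calls."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def stop(self) -> None:
        self.calls.append("stop")

    def remove(self) -> None:
        self.calls.append("remove")


def build(
    create_container: Callable[[], FakeContainer],
    provision: Callable[[FakeContainer], str],
    keep_container: bool = False,
) -> str:
    container: Optional[FakeContainer] = None  # bound before the try block
    try:
        container = create_container()
        return provision(container)
    finally:
        # Clean up only if the container was actually created.
        if container is not None and not keep_container:
            container.stop()
            container.remove()


container = FakeContainer()
print(build(lambda: container, lambda c: "image-id"))  # image-id
print(container.calls)                                 # ['stop', 'remove']
```

With this shape, a failure in `create_container` propagates unchanged, and no cleanup is attempted on a container that does not exist.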
1 change: 1 addition & 0 deletions exasol/ds/sandbox/lib/logging.py
@@ -13,6 +13,7 @@ class LogType(Enum):
  SETUP_CI_CODEBUILD = "setup_ci_codebuild"
  AWS_ACCESS = "aws_access"
  ANSIBLE = "ansible"
+ DOCKER_IMAGE = "docker_image"
  CREATE_VM = "create_vm"
  SETUP_RELEASE_BUILD = "setup_release_build"
  RELEASE_BUILD = "release_build"
8 changes: 8 additions & 0 deletions exasol/ds/sandbox/lib/pretty_print.py
@@ -0,0 +1,8 @@
from datetime import datetime, timedelta


def elapsed(start: datetime, round_to_seconds=True) -> str:
d = datetime.now() - start
if round_to_seconds:
d = d - timedelta(microseconds=d.microseconds)
return str(d)
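
Despite its name, `round_to_seconds` truncates: subtracting `d.microseconds` drops the fractional part rather than rounding to the nearest second. A small sketch of the resulting strings; the `now` parameter is a hypothetical addition that makes the output deterministic and is not part of the commit:

```python
from datetime import datetime, timedelta
from typing import Optional


def elapsed(
    start: datetime,
    round_to_seconds: bool = True,
    now: Optional[datetime] = None,  # hypothetical parameter, for deterministic output
) -> str:
    d = (now or datetime.now()) - start
    if round_to_seconds:
        # Drops the microseconds, i.e. truncates toward whole seconds.
        d = d - timedelta(microseconds=d.microseconds)
    return str(d)


start = datetime(2023, 11, 13, 12, 0, 0)
end = start + timedelta(minutes=2, seconds=5, microseconds=900_000)
print(elapsed(start, now=end))                          # 0:02:05
print(elapsed(start, round_to_seconds=False, now=end))  # 0:02:05.900000
```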
23 changes: 23 additions & 0 deletions exasol/ds/sandbox/runtime/ansible/dss_docker_playbook.yml
@@ -0,0 +1,23 @@
- name: Prepare environment
hosts: localhost
gather_facts: false
vars:
ansible_python_interpreter: python3
tasks:
- name: Add docker container to inventory
add_host:
name: "{{docker_container}}"
groups: docker_container_group
ansible_connection: docker

- name: Setup DSS Docker Container
hosts: docker_container_group
gather_facts: false
vars:
ansible_python_interpreter: python3
user_name: root
user_home: /root
need_sudo: false
docker_integration_test: true
tasks:
- import_tasks: general_setup_tasks.yml
6 changes: 6 additions & 0 deletions exasol/ds/sandbox/runtime/ansible/ec2_setup_tasks.yml
@@ -0,0 +1,6 @@
- name: Install Script_languages
include_role:
name: script_languages
- name: Update netplan
include_role:
name: netplan
@@ -5,17 +5,14 @@
  become: "{{need_sudo}}"
  - name: Install Poetry
  include_role:
- name: poetry
+ name: poetry
  - name: Install Jupyter
  include_role:
- name: jupyter
+ name: jupyter
+ - name: Clear pip cache
+ ansible.builtin.file:
+ path: /root/.cache/pip
+ state: absent
  - name: Install Docker
  include_role:
- name: docker
- - name: Install Script_languages
- include_role:
- name: script_languages
- - name: Update netplan
- include_role:
- name: netplan
-
+ name: docker
@@ -1,4 +1,4 @@
  uncertainties==3.1.7
  numpy==1.23.1
  pandas==1.4.3
- exasol-notebook-connector==0.1.0
+ exasol-notebook-connector==0.2.0
@@ -3,7 +3,7 @@

  - name: Setup Jupyter
  block:
- - name: Install dependant apt packages
+ - name: Install dependent apt packages
  apt:
  name: "{{apt_dependencies}}"
  state: present
@@ -1,6 +1,6 @@
  ---

- - name: Install dependant apt packages
+ - name: Install dependent apt packages
  apt:
  name: "{{apt_dependencies}}"
  state: present
@@ -1,6 +1,6 @@
  ---

- - name: Install dependant apt packages
+ - name: Install dependent apt packages
  apt:
  name: "{{apt_dependencies}}"
  state: present
3 changes: 2 additions & 1 deletion exasol/ds/sandbox/runtime/ansible/slc_setup.yml
@@ -7,4 +7,5 @@
  need_sudo: yes
  remote_user: ubuntu
  tasks:
- - import_tasks: slc_setup_tasks.yml
+ - import_tasks: general_setup_tasks.yml
+ - import_tasks: ec2_setup_tasks.yml