Feature/#30 build the data science sandbox as docker image #40

Merged
Changes from 15 commits
11 changes: 7 additions & 4 deletions .github/workflows/check_ci.yaml
Original file line number Diff line number Diff line change
@@ -18,8 +18,11 @@ jobs:
         uses: ./.github/actions/prepare_poetry_env

       - name: Run pytest
-        run: poetry run pytest test/ci/test_install_dependencies.py
+        run: >
+          poetry run pytest
+          test/unit
+          test/integration/test_create_dss_docker_image.py
         env: # Set the secret as an env variable
-          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
-          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_ACCESS_KEY_SECRET }}
-          AWS_DEFAULT_REGION: ${{ secrets.AWS_REGION }}
+          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
+          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_ACCESS_KEY_SECRET }}
+          AWS_DEFAULT_REGION: ${{ secrets.AWS_REGION }}
1 change: 1 addition & 0 deletions doc/changes/changes_0.1.0.md
Original file line number Diff line number Diff line change
@@ -14,6 +14,7 @@ Version: 0.1.0

 - #11: Created a notebook to show training with scikit-learn in the notebook
 - #15: Installed exasol-notebook-connector via ansible
+- #30: Added script to build the Data Science Sandbox as Docker Image

 ## Bug Fixes

11 changes: 6 additions & 5 deletions doc/developer_guide/developer_guide.md
Original file line number Diff line number Diff line change
@@ -39,11 +39,11 @@ A CLI command has normally a respective function in the `lib` submodule. Hence,

 There are generally three types of commands:

-| Type | Explanation |
-| ----- | --------- |
-| Release Commands | used during the release |
-| Deployment Commands | used to deploy infrastructure onto AWS cloud |
-| Development Commands | used to identify problems or for testing |
+| Type                 | Explanation                                  |
+|----------------------|----------------------------------------------|
+| Release Commands     | used during the release                      |
+| Deployment Commands  | used to deploy infrastructure onto AWS cloud |
+| Development Commands | used to identify problems or for testing     |

 ### Release commands

@@ -71,6 +71,7 @@ The following commands can be used to deploy the infrastructure onto a given AWS
 - `setup-vm-bucket` - deploys the AWS Bucket cloudformation stack which will be used to deploy the VM images
 - `setup-release-codebuild` - deploys the AWS Codebuild cloudformation stack which will be used for the release-build
 - `setup-vm-bucket-waf` - deploys the AWS Codebuild cloudformation stack which contains the WAF Acl configuration for the Cloudfront distribution of the VM Bucket
+- `create-docker-image` - creates a Docker image for data-science-sandbox and deploys it to hub.docker.com/exasol/data-science-sandbox

 ## Flow

1 change: 1 addition & 0 deletions exasol/ds/sandbox/cli/commands/__init__.py
Original file line number Diff line number Diff line change
@@ -13,3 +13,4 @@
 from .update_release import update_release
 from .make_ami_public import make_ami_public
 from .setup_vm_bucket_waf import setup_vm_bucket_waf
+from .create_docker_image import create_docker_image
44 changes: 44 additions & 0 deletions exasol/ds/sandbox/cli/commands/create_docker_image.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
import click

from exasol.ds.sandbox.cli.cli import cli
from exasol.ds.sandbox.cli.options.logging import logging_options
from exasol.ds.sandbox.cli.common import add_options
from exasol.ds.sandbox.lib.dss_docker import DssDockerImage
from exasol.ds.sandbox.lib.logging import SUPPORTED_LOG_LEVELS
from exasol.ds.sandbox.lib.logging import set_log_level


@cli.command()
@add_options([
    click.option(
        '--repository', type=str, metavar="ORG/REPO", show_default=True,
        default="exasol/data-science-sandbox",
        help="Organization and repository on hub.docker.com to publish the docker image to"),
    click.option('--version', type=str, help="Docker image version tag"),
    click.option(
        '--publish', type=bool, is_flag=True,
        help="Whether to publish the created Docker image"),
    click.option(
        '--keep-container', type=bool, is_flag=True,
        help="""Keep the Docker Container running after creating the image.
        Otherwise stop and remove the container."""),
])
@add_options(logging_options)
def create_docker_image(
        repository: str,
        version: str,
        publish: bool,
        keep_container: bool,
        log_level: str,
):
    """
    Create a Docker image for data-science-sandbox and deploy
    it to a Docker repository.
    """
    set_log_level(log_level)
    DssDockerImage(
        repository=repository,
        version=version,
        publish=publish,
        keep_container=keep_container,
    ).create()
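Note: the way `--repository` and `--version` end up in the image reference can be sketched in isolation. This is a minimal sketch that mirrors the f-string used in `DssDockerImage.__init__`; the helper name `image_reference` and its `default_version` parameter are hypothetical and only stand in for the `DSS_VERSION` fallback.

```python
from typing import Optional


def image_reference(repository: str, version: Optional[str], default_version: str) -> str:
    """Return "<repository>:<tag>", falling back to default_version.

    Hypothetical helper: the PR computes this inline in DssDockerImage.__init__.
    """
    # An empty or missing version falls back to the default, as in the PR.
    tag = version if version else default_version
    return f"{repository}:{tag}"


if __name__ == "__main__":
    # Without --version, the package version (here a made-up "0.1.0") becomes the tag.
    print(image_reference("exasol/data-science-sandbox", None, "0.1.0"))
    # An explicit --version takes precedence over the default.
    print(image_reference("exasol/data-science-sandbox", "1.2.3", "0.1.0"))
```

Because the fallback uses truthiness, an empty string passed as `--version` is treated like a missing one.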
9 changes: 6 additions & 3 deletions exasol/ds/sandbox/lib/ansible/ansible_access.py
Original file line number Diff line number Diff line change
@@ -1,24 +1,27 @@
from typing import Callable

import ansible_runner
import logging

from typing import Callable

from exasol.ds.sandbox.lib.ansible.ansible_run_context import AnsibleRunContext
from exasol.ds.sandbox.lib.logging import get_status_logger, LogType


class AnsibleException(RuntimeError):
    pass


class AnsibleAccess:

    """
    Provides access to ansible runner.
    @raises: AnsibleException if ansible execution fails
    """
    @staticmethod
    def run(private_data_dir: str, run_ctx: AnsibleRunContext, printer: Callable[[str], None]):
        quiet = not get_status_logger(LogType.ANSIBLE).isEnabledFor(logging.INFO)
        r = ansible_runner.run(private_data_dir=private_data_dir,
                               playbook=run_ctx.playbook,
                               quiet=quiet,
                               extravars=run_ctx.extra_vars)
        for e in r.events:
            printer(e)
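Note: the `quiet` flag above is derived from the logger's effective level. A minimal sketch of that rule using only the standard `logging` module; the function name `ansible_quiet` and the logger name are hypothetical, and a plain `logging.Logger` stands in for the project's `get_status_logger(LogType.ANSIBLE)`.

```python
import logging


def ansible_quiet(logger: logging.Logger) -> bool:
    """Return True when INFO is not enabled, i.e. runner output should be suppressed."""
    return not logger.isEnabledFor(logging.INFO)


if __name__ == "__main__":
    log = logging.getLogger("sketch.ansible")  # hypothetical logger name
    log.setLevel(logging.WARNING)
    print(ansible_quiet(log))  # WARNING excludes INFO, so the runner stays quiet
    log.setLevel(logging.DEBUG)
    print(ansible_quiet(log))  # DEBUG includes INFO, so runner output is shown
```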
2 changes: 1 addition & 1 deletion exasol/ds/sandbox/lib/ansible/ansible_runner.py
Original file line number Diff line number Diff line change
@@ -20,7 +20,7 @@ def __init__(self, ansible_access: AnsibleAccess, work_dir: Path):

     @staticmethod
     def printer(msg: str):
-        LOG.info(msg)
+        LOG.debug(msg)

     def run(self, ansible_run_context: AnsibleRunContext, host_infos: Tuple[HostInfo]):
         inventory_content = render_template("inventory.jinja", host_infos=host_infos)
4 changes: 4 additions & 0 deletions exasol/ds/sandbox/lib/dss_docker/Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
FROM ubuntu:20.04
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update && apt-get install --no-install-recommends --assume-yes python3 python3-pexpect
EXPOSE 8888/tcp
1 change: 1 addition & 0 deletions exasol/ds/sandbox/lib/dss_docker/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
from .create_image import DssDockerImage
102 changes: 102 additions & 0 deletions exasol/ds/sandbox/lib/dss_docker/create_image.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,102 @@
import docker
import humanfriendly
import importlib_resources

from datetime import datetime
from docker.types import Mount
from exasol.ds.sandbox.lib import pretty_print
from importlib_metadata import version
from pathlib import Path

from exasol.ds.sandbox.lib.config import ConfigObject, SLC_VERSION
from exasol.ds.sandbox.lib.logging import get_status_logger, LogType
from exasol.ds.sandbox.lib.ansible import ansible_repository
from exasol.ds.sandbox.lib.ansible.ansible_run_context import AnsibleRunContext
from exasol.ds.sandbox.lib.ansible.ansible_access import AnsibleAccess
from exasol.ds.sandbox.lib.setup_ec2.run_install_dependencies import run_install_dependencies


DSS_VERSION = version("exasol-data-science-sandbox")


class DssDockerImage:
    @classmethod
    def timestamp(cls) -> str:
        return f'{datetime.now().timestamp():.0f}'

    def __init__(
            self,
            repository: str,
            version: str = None,
            publish: bool = False,
            keep_container: bool = False,
    ):
        version = version if version else DSS_VERSION
        self.container_name = f"ds-sandbox-{DssDockerImage.timestamp()}"
        self.image_name = f"{repository}:{version}"
        self.publish = publish
        self.keep_container = keep_container

    def _ansible_run_context(self) -> AnsibleRunContext:
        extra_vars = {
            "docker_container": self.container_name,
        }
        return AnsibleRunContext(
            playbook="dss_docker_playbook.yml",
            extra_vars=extra_vars,
        )

    def _ansible_config(self) -> ConfigObject:
        return ConfigObject(
            time_to_wait_for_polling=0.1,
            slc_version=SLC_VERSION,
        )

    def _docker_file(self) -> Path:
        return (
            importlib_resources
            .files("exasol.ds.sandbox.lib.dss_docker")
            .joinpath("Dockerfile")
        )

    def create(self):
        logger = get_status_logger(LogType.DOCKER_IMAGE)
        docker_file = self._docker_file()
        try:
            start = datetime.now()
            docker_client = docker.from_env()
            logger.info(f"Creating docker image {self.image_name} from {docker_file}")
            docker_client.images.build(path=str(docker_file.parent), tag=self.image_name)
            container = docker_client.containers.create(
                image=self.image_name,
                name=self.container_name,
                command="sleep infinity",
                detach=True,
            )
            logger.info("Starting container")
            container.start()
            logger.info("Installing dependencies")
            run_install_dependencies(
                AnsibleAccess(),
                configuration=self._ansible_config(),
                host_infos=tuple(),
                ansible_run_context=self._ansible_run_context(),
                ansible_repositories=ansible_repository.default_repositories,
            )
            logger.info("Committing changes to docker container")
            image = container.commit(
                repository=self.image_name,
            )
        except Exception as ex:
            raise ex
        finally:
            if self.keep_container:
                logger.info("Keeping container running")
            else:
                logger.info("Stopping container")
                container.stop()
                logger.info("Removing container")
                container.remove()
        size = humanfriendly.format_size(image.attrs["Size"])
        elapsed = pretty_print.elapsed(start)
        logger.info(f"Built Docker image {self.image_name} size {size} in {elapsed}.")
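Note: in `create()`, `container` is first assigned inside the `try`. If building the image or creating the container raises and `keep_container` is false, the `finally` block would reference an unbound name and mask the original error. One possible guard is sketched below, using a stand-in container instead of the Docker SDK. `FakeContainer` and `build_with_cleanup` are hypothetical names for illustration only, not part of this PR.

```python
from typing import Callable, List, Optional


class FakeContainer:
    """Stand-in for a Docker SDK container; records the calls made on it."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def stop(self) -> None:
        self.calls.append("stop")

    def remove(self) -> None:
        self.calls.append("remove")


def build_with_cleanup(
    create: Callable[[], FakeContainer],
    provision: Callable[[FakeContainer], None],
    keep_container: bool = False,
) -> Optional[FakeContainer]:
    # Bind the name before the try, so the finally block can test it safely.
    container: Optional[FakeContainer] = None
    try:
        container = create()
        provision(container)
        return container
    finally:
        # Runs on success and on failure; cleanup is skipped if creation itself failed.
        if container is not None and not keep_container:
            container.stop()
            container.remove()


if __name__ == "__main__":
    result = build_with_cleanup(FakeContainer, lambda _c: None)
    print(result.calls)
```

With this shape, a failure in `create` propagates unchanged, and a failure in `provision` still stops and removes the container.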
1 change: 1 addition & 0 deletions exasol/ds/sandbox/lib/logging.py
Original file line number Diff line number Diff line change
@@ -13,6 +13,7 @@ class LogType(Enum):
     SETUP_CI_CODEBUILD = "setup_ci_codebuild"
     AWS_ACCESS = "aws_access"
     ANSIBLE = "ansible"
+    DOCKER_IMAGE = "docker_image"
     CREATE_VM = "create_vm"
     SETUP_RELEASE_BUILD = "setup_release_build"
     RELEASE_BUILD = "release_build"
8 changes: 8 additions & 0 deletions exasol/ds/sandbox/lib/pretty_print.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
from datetime import datetime, timedelta


def elapsed(start: datetime, round_to_seconds=True) -> str:
    d = datetime.now() - start
    if round_to_seconds:
        d = d - timedelta(microseconds=d.microseconds)
    return str(d)
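Note: `elapsed()` drops the microseconds, so it truncates rather than rounds to the nearest second. A minimal sketch that makes this visible; the extra `now` parameter is an assumption added here so the result is deterministic, and is not part of the PR.

```python
from datetime import datetime, timedelta
from typing import Optional


def elapsed(start: datetime, now: Optional[datetime] = None, round_to_seconds: bool = True) -> str:
    """Format the time since start; `now` is injectable (assumption) for testability."""
    d = (now or datetime.now()) - start
    if round_to_seconds:
        # Subtracting the microsecond part truncates; it never rounds up.
        d = d - timedelta(microseconds=d.microseconds)
    return str(d)


if __name__ == "__main__":
    start = datetime(2023, 1, 1, 12, 0, 0)
    later = start + timedelta(minutes=1, seconds=15, microseconds=900_000)
    print(elapsed(start, now=later))                          # 0:01:15
    print(elapsed(start, now=later, round_to_seconds=False))  # 0:01:15.900000
```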
23 changes: 23 additions & 0 deletions exasol/ds/sandbox/runtime/ansible/dss_docker_playbook.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
- name: Prepare environment
  hosts: localhost
  gather_facts: false
  vars:
    ansible_python_interpreter: python3
  tasks:
    - name: Add docker container to inventory
      add_host:
        name: "{{docker_container}}"
        groups: docker_container_group
        ansible_connection: docker

- name: Setup DSS Docker Container
  hosts: docker_container_group
  gather_facts: false
  vars:
    ansible_python_interpreter: python3
    user_name: root
    user_home: /root
    need_sudo: false
    docker_integration_test: true
  tasks:
    - import_tasks: general_setup_tasks.yml
6 changes: 6 additions & 0 deletions exasol/ds/sandbox/runtime/ansible/ec2_setup_tasks.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
- name: Install Script_languages
  include_role:
    name: script_languages
- name: Update netplan
  include_role:
    name: netplan
Original file line number Diff line number Diff line change
Expand Up @@ -5,17 +5,14 @@
become: "{{need_sudo}}"
- name: Install Poetry
include_role:
name: poetry
name: poetry
- name: Install Jupyter
include_role:
name: jupyter
name: jupyter
- name: Clear pip cache
ansible.builtin.file:
path: /root/.cache/pip
state: absent
- name: Install Docker
include_role:
name: docker
- name: Install Script_languages
include_role:
name: script_languages
- name: Update netplan
include_role:
name: netplan

name: docker
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
 uncertainties==3.1.7
 numpy==1.23.1
 pandas==1.4.3
-exasol-notebook-connector==0.1.0
+exasol-notebook-connector==0.2.0
Original file line number Diff line number Diff line change
@@ -3,7 +3,7 @@

 - name: Setup Jupyter
   block:
-    - name: Install dependant apt packages
+    - name: Install dependent apt packages
       apt:
         name: "{{apt_dependencies}}"
         state: present
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
 ---

-- name: Install dependant apt packages
+- name: Install dependent apt packages
   apt:
     name: "{{apt_dependencies}}"
     state: present
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
 ---

-- name: Install dependant apt packages
+- name: Install dependent apt packages
   apt:
     name: "{{apt_dependencies}}"
     state: present
3 changes: 2 additions & 1 deletion exasol/ds/sandbox/runtime/ansible/slc_setup.yml
Original file line number Diff line number Diff line change
@@ -7,4 +7,5 @@
     need_sudo: yes
     remote_user: ubuntu
   tasks:
-    - import_tasks: slc_setup_tasks.yml
+    - import_tasks: general_setup_tasks.yml
+    - import_tasks: ec2_setup_tasks.yml