This repository has been archived by the owner on Jul 5, 2022. It is now read-only.

Automated testing for Katacoda scenarios with Docker #49

Closed
8 tasks done
iesahin opened this issue Mar 18, 2021 · 6 comments · Fixed by #54

Comments

@iesahin
Contributor

iesahin commented Mar 18, 2021

It's possible to use Docker images for each scenario to test the scenario commands.

  • We can create a dvc-base image like in https://hub.docker.com/repository/docker/emresult/dvc-base

  • For each of the scenarios, create an image, e.g., dvc-gs-versioning that will have all the artifacts / requirements to run the scenario. These can be linked in GS / UC / Tutorial pages for users to try the commands on their own machine.

  • We can have a script to extract shell commands from .md files. Rundoc may be suitable, or we can use another tool, such as a Markdown parser, to send these commands to the Docker container. (v2: It can also fill the parts in .md files with the output of the commands.)
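The extraction step could look roughly like the sketch below. It is only an illustration, not Rundoc or any existing tool: the function name `extract_shell_commands`, the accepted fence languages, and the `$ ` prompt convention are all assumptions.

```python
import re

# Fenced blocks whose info string names a shell-like language.
# The accepted language names are an assumption for this sketch.
FENCE_RE = re.compile(
    r"^```(?:bash|sh|shell|console|dvc)[^\n]*\n(.*?)^```",
    re.MULTILINE | re.DOTALL,
)


def extract_shell_commands(markdown_text):
    """Return commands found in shell code fences, in document order."""
    commands = []
    for block in FENCE_RE.findall(markdown_text):
        lines = [line.strip() for line in block.splitlines()]
        # If the block uses "$ " prompts, only prompted lines are commands;
        # the remaining lines are treated as sample output.
        prompted = [line[2:] for line in lines if line.startswith("$ ")]
        if prompted:
            commands.extend(prompted)
        else:
            commands.extend(
                line for line in lines if line and not line.startswith("#")
            )
    return commands
```

Multi-line commands joined with a trailing backslash are not handled here; a real tool would need to merge those lines before sending them to a container.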

Katacoda scenarios, tutorials, and the GS pages will have the same datasets, scripts, commands, etc.

Update - 2021-04-06

I have finished writing a tool that runs the commands in Markdown documents. It can be found here

It runs each scenario's commands inside that scenario's Docker container.

This is related to iterative/dvc.org#2318 and iterative/dvc.org#2354

This issue can be closed after merging the dockerized versions.

TODO

  • Create a dvc-base container that contains common requirements for the documentation
  • Create Dockerfile & scripts for [initialization][init]
  • Create Dockerfile for [versioning]
  • Create Dockerfile for [accessing]
  • Create Dockerfile for [stages]
  • Create Dockerfile for [params]
  • Create Dockerfile for [experiments]
  • Write a script to send {{execute}} blocks to containers via docker exec and get the results
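As a rough sketch of the last item: the code below collects commands marked with {{execute}} and builds a `docker exec` invocation for each. It assumes the inline form `` `command`{{execute}} `` and a container that is already running; the function names and the container name are hypothetical, and this is not the final script.

```python
import re
import subprocess

# Inline Katacoda form assumed here: `command`{{execute}}
EXECUTE_RE = re.compile(r"`([^`\n]+)`\{\{execute\}\}")


def find_execute_commands(markdown_text):
    """Return the commands marked with {{execute}}, in document order."""
    return EXECUTE_RE.findall(markdown_text)


def docker_exec_argv(container, command):
    """Build the argument list that runs `command` in a running container."""
    # Going through `sh -c` keeps pipes and `&&` working inside the container.
    return ["docker", "exec", container, "sh", "-c", command]


def run_in_container(container, command):
    """Run one command via docker exec and return (exit code, stdout, stderr)."""
    result = subprocess.run(
        docker_exec_argv(container, command),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout, result.stderr
```

A scenario test would then call `run_in_container("gs-versioning", cmd)` for each extracted command (the container name is a placeholder) and fail on the first non-zero exit code.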
@iesahin
Contributor Author

iesahin commented Mar 22, 2021

@jorgeorpinel I'm currently building the Docker-based scenarios under my account: https://katacoda.com/iex/courses/get-started-dockerized/versioning

Currently these Docker images are stored on Docker Hub under my name: https://hub.docker.com/repository/docker/emresult/dvc-gs-versioning

And their Dockerfiles are in https://github.com/iesahin/dvc-examples-docker

This repository will also contain all assets and project files for each of the scenarios.

  1. I want to transfer ownership of the Dockerfile repository.
  2. I need permissions to push to https://hub.docker.com/u/dvcorg
  3. I plan to use tags like dvcorg/gs-versioning for GS and scenario containers, dvcorg/tut-mnist for tutorials, and dvcorg/example-shared-server for examples. I can also use URL tails from dvc.org, e.g., doc/start/data-versioning becomes dvcorg/doc-start-data-versioning, but that looks too verbose. WDYT?
  4. I plan to update all scenarios to the Docker-based ones, in a single PR with a squashed commit. I can keep older versions around for reference in a different folder, but as I'll make no content changes, I don't think it's necessary.
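For reference, the URL-tail naming option from item 3 amounts to a mapping like the sketch below. The function name is hypothetical, and lowercasing is an added assumption based on Docker requiring lowercase repository names.

```python
def image_name_for_doc_path(doc_path, org="dvcorg"):
    """Map a dvc.org URL tail to an image name, e.g. doc/start/x -> dvcorg/doc-start-x."""
    # Docker repository names must be lowercase, so normalize the slug.
    slug = doc_path.strip("/").replace("/", "-").lower()
    return f"{org}/{slug}"


print(image_name_for_doc_path("doc/start/data-versioning"))
# prints: dvcorg/doc-start-data-versioning
```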

@shcheklein

@shcheklein
Member

I'm currently building the Docker based scenarios under my account: https://katacoda.com/iex/courses/get-started-dockerized/versioning

Looks great! Quick ask - let's add something at the very beginning, like please wait for a minute while we are initializing the environment. Otherwise it's not even clear for people not familiar with Docker what's going on there. Also, is there a way to cache something on the Katacoda end? or even on our end (S3) to make download faster?

This repository will also contain all assets and project files for each of the scenarios.

can we reuse/pull from the existing projects (like the example-get-started one?) - it can become hard to maintain, update multiple repos with code/assets

And their Dockerfiles are in https://github.com/iesahin/dvc-examples-docker

we should move them in katacoda repo? and probably enable CI/CD to upload the new image as we change these Dockerfiles, or when DVC version is being updated, etc - how are we going to detect such changes? how and when are we going to update/rebuild images?

I need permissions to push to https://hub.docker.com/u/dvcorg

Yep, I'll invite you to the org.

@iesahin
Contributor Author

iesahin commented Mar 23, 2021

Thank you.

Quick ask - let's add something at the very beginning, like please wait for a minute while we are initializing the environment.

Yes, I'll add that. Overall, I think it's faster than manually preparing the environment, and users will have the option to run the commands in their own container after a docker run -it dvcorg/gs-versioning. I'll put these instructions on the start pages.

I don't think we can make the downloads faster without a paid subscription to Katacoda; the speed is most likely determined by Katacoda's download speed.

While writing this, I had the idea of basing the containers on the ubuntu:20.04 environment Katacoda is using. That might help, but I prefer easier maintenance on our part to faster downloads. Currently, all GS containers are derived from a common dvc-base, which in turn is derived from the python:3.7 image, which is based on Debian 10.8.

When the minimum requirements change, we'll only update the base container and test all the scenarios and documentation against this.

can we reuse/pull from the existing projects (like the example-get-started one?) - it can become hard to maintain, update multiple repos with code/assets

As we'll have common projects across all the documentation, all containers will use the same code base by cloning it from GitHub. The current setup is just meant to convert the scenarios to Dockerized versions. We need custom code for Katacoda for now, but I agree that it becomes a burden to have multiple example projects.

we should move them in katacoda repo? and probably enable CI/CD to upload the new image as we change these Dockerfiles, or when DVC version is being updated, etc - how are we going to detect such changes? how and when are we going to update/rebuild images?

I think we can have a Dockerfile repository that contains all the container definitions. This repository can have a similar structure to dvc.org repo. Any UC/UG/GS/cmdref page and Katacoda scenario can have a Dockerfile in this repository and we can test the commands on pages with these and push the containers to the hub. I don't think we'll need separate Katacoda containers, these should be identical to GS/UC containers but for the time being, I had to start somewhere.

I envision an automated setup, like when you update the project in example-get-started repository with some new code, all the containers that use the repo will be updated and all the documentation pages will be tested against the newer version of the project. So if someone made a typo somewhere, we'll notice it before anyone else.

This can be integrated into CI/CD as well. It will probably take a long time to test all the commands in the documentation, so a weekly run may be preferable, but automation is a major goal.

We can also store the Dockerfiles in the dvc.org repo and add some automated testing (similar to the broken-links check) with these, i.e., if a folder has a Dockerfile, build it and test the changed file's commands in this container. In that case, we need a basic setup (dvc-base) somewhere, and all other containers will live among the documentation pages. Mixing docs and Docker this way doesn't feel right to me, but you may think it's all right.
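A minimal sketch of the lookup this would need: given a changed documentation file, find the Dockerfile to build. Walking up through parent folders is an extra assumption beyond "if a folder has a Dockerfile", and the function name is hypothetical.

```python
from pathlib import Path


def find_dockerfile(changed_file, repo_root):
    """Return the closest Dockerfile at or above the changed file's folder, or None."""
    root = Path(repo_root).resolve()
    folder = Path(changed_file).resolve().parent
    while True:
        candidate = folder / "Dockerfile"
        if candidate.is_file():
            return candidate
        # Stop at the repository root (or the filesystem root as a safeguard).
        if folder == root or folder == folder.parent:
            return None
        folder = folder.parent
```

A CI job could call this for every changed .md file, build each distinct Dockerfile once, and run only the commands of the changed pages in the resulting container.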

Thank you @shcheklein

@iesahin
Contributor Author

iesahin commented Mar 27, 2021

I changed the way the example-get-started project is used in Katacoda. The Dockerfile now clones the repository with

RUN git clone \
    https://github.com/iterative/example-get-started \
    --branch 7-ml-pipeline \
    && git -C /root/example-get-started \
    checkout -b katacoda-project

and modifies the data with

RUN dvc pull \
    && head -n 12000 data/data.xml > data/data.xml.1  \
    && mv data/data.xml.1 data/data.xml \
    && dvc add data/data.xml \
    && git add . \
    && git commit -m "Modified for Katacoda Params Scenario" \
    && dvc gc -f --workspace

For the general example-get-started containers, I'll just remove the second part (or reduce it to a plain dvc pull). This way, any changes in the example-get-started repo will be reflected in the containers without modifying the Katacoda scenarios.

@shcheklein @jorgeorpinel

@iesahin
Contributor Author

iesahin commented Mar 27, 2021

The experiments scenario also works on Docker locally, but there is a bug in DVC 2.0.6, the version installed by default using apt, and RAM limitations seem to hit harder. It's a bit random: some experiments run sometimes and fail other times, so I gave up trying to fix it. Reducing the dataset size further may work, but newer Python packages may be the culprit too.

You can play with the current containers like

docker run -it -p 8000:80 emresult/katacoda-gs-stages
or
docker run -it -p 8000:80 emresult/katacoda-gs-params

A web server (python3 -m http.server) is running in the background at port 80 of the container to show plots, images, etc.

You can compare dockerized versions of scenarios and former scenarios. There are no content changes.

The Dockerfile repository is here: https://github.com/iesahin/dvc-examples-docker

All Docker images are built and pushed using https://github.com/iesahin/dvc-examples-docker/blob/master/build-all.zsh

I'm beginning to write an automated testing script for the scenarios.

@shcheklein @jorgeorpinel

@iesahin
Contributor Author

iesahin commented Apr 6, 2021

I have completed the initial version of Markdown Code Runner here: https://github.com/iterative/markdown-code-runner

If you clone the repository, run pip install -r requirements.txt, and then run test/run-katacoda-docs.zsh, it will clone the Katacoda scenarios to a temporary directory and execute all commands in the Markdown documents within their containers (provided you have Docker installed).

@shcheklein @jorgeorpinel
