- How to Contribute
- Architecture of the solution
- Local development environment
- Continuous integration environment
- Running system tests
We'd love to accept your patches and contributions to this project. There are just a few small guidelines you need to follow.
Contributions to this project must be accompanied by a Contributor License Agreement. You (or your employer) retain the copyright to your contribution; this simply gives us permission to use and redistribute your contributions as part of the project. Head over to https://cla.developers.google.com/ to see your current agreements on file or to sign a new one.
You generally only need to submit a CLA once, so if you've already submitted one (even if it was for a different project), you probably don't need to do it again.
All submissions, including submissions by project members, require review. We use GitHub pull requests for this purpose. Consult GitHub Help for more information on using pull requests.
See the Code of Conduct.
The latest documentation about the design of the Oozie to Airflow converter can be found in the Design Document. Please take a look at it to understand how the conversion process works.
You can easily set up your local environment to modify the code and run tests and conversions. The unit tests and conversions can all be run locally; they require neither an Oozie-enabled cluster nor a running Apache Airflow instance.
The environment can be set up in a virtualenv. You can easily create such a virtualenv using virtualenvwrapper.
An example of such a local environment setup (with virtualenvwrapper):
mkvirtualenv -p python3.8 oozie-to-airflow
pip install -e .
Later, you can switch back to this virtualenv by running:
workon oozie-to-airflow
After installing o2a with pip install -e ., the o2a converter is added to your path and your local sources are installed via symbolic links. In other words, the project is installed in editable mode (i.e. setuptools "develop mode") from the local project path.
While in your virtualenv, you can re-install all the requirements via pip install -r requirements.txt, or run pip install -e . again to repeat the "develop mode" installation.
You can also add the bin subdirectory to your PATH; then all the scripts described later in the documentation can be run without the ./bin prefix.
This can be done, for example, by adding a line similar to the following to your .bash_profile or to bin/postactivate of your virtual environment:
export PATH=${PATH}:<INSERT_PATH_TO_YOUR_OOZIE_PROJECT>/bin
Otherwise, you need to run all the scripts with the bin subdirectory prefix, for example:
./bin/o2a --help
In all the example commands below it is assumed that the bin directory is in your PATH.
We use a number of code quality checks. They are verified during the Travis CI build, but you can also install them locally:
Pre-commit hook by running:
pre-commit install
Pre-push hook by running:
pre-commit install --hook-type pre-push
You can also run all the checks manually by running:
pre-commit run --all-files
You might need to install xmllint and docker if you do not have them locally. The former can be installed with apt install libxml2-utils on Linux or brew install xmlstarlet on macOS. The latter can be installed by following the official Docker installation instructions.
You can always skip running the checks by passing the --no-verify flag to the git commit command.
You can find all the commands of the pre-commit framework at https://pre-commit.com/
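A couple of handy invocations; the pylint hook id below is hypothetical, so check .pre-commit-config.yaml for the hook ids actually configured in this project:

```bash
# Run a single hook instead of all checks (hook id is hypothetical)
pre-commit run pylint --all-files

# Commit without running any of the hooks
git commit --no-verify -m "WIP: skip hooks for this commit"
```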
While you are in your local virtualenv, you can run the unit tests. Currently, the test directory is set up in such a way that the folders in the tests directory mirror the structure of the o2a directory.
Unit tests are run automatically in Travis CI and when you have pre-commit hooks installed. You can also run all unit tests using the o2a-run-all-unit-tests script.
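You can also run a subset of the tests directly; a minimal sketch, assuming pytest is installed in your virtualenv (the test path is only an illustration of the mirrored layout):

```bash
# Run the full suite via the helper script
o2a-run-all-unit-tests

# Or run a single, illustrative part of the test tree with pytest
python -m pytest tests/converter -v
```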
All example conversions can be run via the o2a-run-all-conversions script. It is also executed during automated tests.
You can generate dependency graphs automatically from the code via the o2a-generate-dependency-graph script, but you need graphviz installed locally.
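Graphviz can typically be installed from your system package manager, for example:

```bash
# Debian/Ubuntu
apt install graphviz

# macOS with Homebrew
brew install graphviz
```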
You can also see dependency cycles, in case there are any, in o2a-dependency-cycles.png.
The project integrates with Travis CI. To enable saving of the build artifacts, you must configure authorization for Google Cloud Storage. For this purpose, it is necessary to set two environment variables: GCP_SERVICE_ACCOUNT and GCP_BUCKET_NAME.
To do this, follow these steps:
- To simplify the instructions, set the following environment variables:
export PROJECT_ID="$(gcloud config get-value project)"
export ACCOUNT_NAME=o2a-build-artifacts-travis-ci
export ACCOUNT_EMAIL="${ACCOUNT_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"
export BUCKET_NAME=o2a-build-artifacts
- Create the service account that will be used by Travis
gcloud iam service-accounts create "${ACCOUNT_NAME}"
- Create a new private key for the service account, and save a copy of it in the
o2a-build-artifacts-sa.json
file.
gcloud iam service-accounts keys create --iam-account "${ACCOUNT_EMAIL}" o2a-build-artifacts-sa.json
- Create the bucket
gsutil mb "gs://${BUCKET_NAME}"
- Enable the Bucket Policy Only feature on the Cloud Storage bucket:
gsutil bucketpolicyonly set on "gs://${BUCKET_NAME}"
- Grant permission to make a bucket's objects publicly readable:
gsutil iam ch allUsers:objectViewer "gs://${BUCKET_NAME}"
- Grant the service account permission to create and overwrite the bucket's objects:
gsutil iam ch "serviceAccount:${ACCOUNT_EMAIL}:objectAdmin" "gs://${BUCKET_NAME}"
- Set the environment variables on Travis CI:
travis env set GCP_SERVICE_ACCOUNT "$(cat o2a-build-artifacts-sa.json)" --private
travis env set GCP_BUCKET_NAME "${BUCKET_NAME}" --public
- Remove the service account key from the local disk:
rm o2a-build-artifacts-sa.json
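The two variables can then be consumed during the Travis CI build to authenticate and upload artifacts to the bucket. The snippet below is only a sketch of such a step, not the project's actual configuration; the temporary key path, the local artifact directory and the use of TRAVIS_BUILD_NUMBER are assumptions:

```bash
# Recreate the service account key from the Travis environment variable (assumed temporary path)
echo "${GCP_SERVICE_ACCOUNT}" > /tmp/o2a-build-artifacts-sa.json

# Authenticate gcloud and gsutil with the service account
gcloud auth activate-service-account --key-file=/tmp/o2a-build-artifacts-sa.json

# Upload build artifacts to the public bucket (local artifact directory is an assumption)
gsutil -m cp -r output-artifacts "gs://${GCP_BUCKET_NAME}/${TRAVIS_BUILD_NUMBER}/"
```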
Oozie to Airflow has a set of system tests that test the end-to-end functionality of converting and executing workflows in a cloud environment with Cloud Dataproc and Cloud Composer, as described in the README.md.
The examples defined in the examples folder can be run as system tests. The system tests use an existing Composer instance, a Dataproc cluster, and Oozie running on the Dataproc cluster to prepare the HDFS application folder structure and trigger the tests automatically.
You can run the tests using this command:
o2a-run-sys-test --application <APPLICATION> --phase <PHASE>
The default phase is convert - it only converts the Oozie workflow to an Airflow DAG without running the tests on either Oozie or Composer.
When you run the script with --help, you can see all the options. You can set up autocomplete with the -A option - this way you do not have to remember all the options.
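For example:

```bash
# Show all available options
o2a-run-sys-test --help

# Set up shell autocomplete for o2a-run-sys-test
o2a-run-sys-test -A
```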
Current options:
Usage: o2a-run-sys-test [FLAGS] [-A|-S|-K|-W]
Executes prepare or run phase for integration testing of O2A converter.
Flags:
-h, --help
Shows this help message.
-a, --application <APPLICATION>
Application (from examples dir) to run the tests on. Must be specified unless -S or -A are specified.
One of [childwf decision demo el fs git mapreduce pig shell spark ssh subwf]
-p, --phase <PHASE>
Phase of the test to run. One of [prepare-configuration convert prepare-dataproc test-composer test-oozie test-compare-artifacts]. Defaults to convert.
-C, --composer-name <COMPOSER_NAME>
Composer instance used to run the operations on. Defaults to o2a-integration
-L, --composer-location <COMPOSER_LOCATION>
Composer locations. Defaults to europe-west1
-c, --cluster <CLUSTER>
Cluster used to run the operations on. Defaults to oozie-51
-b, --bucket <BUCKET>
Airflow Composer DAG bucket used. Defaults to bucket that is used by Composer.
-r, --region <REGION>
GCP Region where the cluster is located. Defaults to europe-west3
-v, --verbose
Add even more verbosity when running the script.
-d, --dot
Creates files in the DOT representation.
If you have the graphviz program in PATH, the files will also be converted to the PNG format.
If you have the graphviz program and the imgcat programs in PATH, the files will also be displayed in the console
Optional commands to execute:
-K, --ssh-to-composer-worker
Open shell access to Airflow's worker. This allows you to test commands in the context of the Airflow instance.
It is worth noting that it is possible to access the database.
The kubectl exec command is used internally, so not all SSH features are available.
-S, --ssh-to-dataproc-master
SSH to Dataproc's cluster master. All SSH features are available with this option.
Arguments after -- are passed to gcloud compute ssh command as extra args.
-W, --open-oozie-web-ui
Creates a SOCKS5 proxy server that redirects traffic through Dataproc's cluster master and
opens Google Chrome with a proxy configuration and a tab with the Oozie web interface.
-A, --setup-autocomplete
Sets up autocomplete for o2a-run-sys-tests
Once you have run the script with your chosen flags, you do not need to specify the parameters again. The latest parameters used are stored and cached locally in .ENVIRONMENT_NAME files in the .o2a-run-sys-test-cache-dir directory and reused the next time you run the script.
In case you want to clean up the cache, simply remove all the files from that directory.
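For example, one way to clear the cache:

```bash
# Delete all cached .ENVIRONMENT_NAME parameter files
find .o2a-run-sys-test-cache-dir -type f -delete
```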
The following phases are defined for the system tests:
- prepare-configuration - prepares the configuration based on the passed Dataproc/Composer parameters
- convert - converts the example application workflow to a DAG and stores it in the output/<APPLICATION> directory
- prepare-dataproc - prepares the Dataproc cluster to execute both Composer and Oozie jobs. The preparation includes:
  - Local filesystem: the ${HOME}/o2a/<APPLICATION> directory contains the application to be uploaded to HDFS
  - Local filesystem: the ${HOME}/o2a/<APPLICATION>.properties property file used to run the Oozie job
  - HDFS: /user/${user.name}/examples/apps/ - the application is stored in this HDFS directory
- test-composer - runs tests on the Composer instance. Artifacts are downloaded to the output-artifacts/<APPLICATION>/composer directory.
- test-oozie - runs tests on Oozie in the Hadoop cluster. Artifacts are downloaded to the output-artifacts/<APPLICATION>/oozie directory.
- test-compare-artifacts - runs tests on both Oozie and the Composer instance and displays a comparison of artifact differences.
The typical scenarios to run the tests are:
Running an application via Oozie:
o2a-run-sys-test --phase prepare-dataproc --application <APP> --cluster <CLUSTER>
o2a-run-sys-test --phase test-oozie
Running an application via Composer:
o2a-run-sys-test --phase prepare-dataproc --application <APP> --cluster <CLUSTER>
o2a-run-sys-test --phase test-composer
In order to run system tests with sub-workflows, you need to have the sub-workflow application already present in HDFS; therefore you need to run at least:
o2a-run-sys-test --phase prepare-dataproc --application <SUBWORKFLOW_APP>
For example, in the case of the demo application, you need to run at least once:
o2a-run-sys-test --phase prepare-dataproc --application childwf
because childwf is used as a sub-workflow in the demo application.
In order to upload a new version to PyPI, you need to have the appropriate credentials. There are scripts that package the application and upload it to the test or the production PyPI instance:
- o2a-package-upload-test - prepares and uploads the package to the test PyPI
- o2a-package-upload - prepares and uploads the package to the production PyPI
Make sure to update the version of the package in setup.py before packaging and uploading.
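A typical release sketch, assuming the version in setup.py has already been bumped (the grep is only a quick sanity check):

```bash
# Sanity-check the version declared in setup.py
grep "version" setup.py

# Upload to the test PyPI instance first, then to production
o2a-package-upload-test
o2a-package-upload
```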