feat!: Use poetry over pipenv #337

Merged · 8 commits · Apr 11, 2022
Changes from all commits
10 changes: 5 additions & 5 deletions .github/workflows/unit-tests-airflow1.yaml
@@ -29,13 +29,13 @@ jobs:
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
- name: Install pipenv
run: pip install pipenv
- name: Install poetry
run: curl -sSL https://install.python-poetry.org | python3 - --preview
- name: Install dependencies
run: pipenv install --ignore-pipfile --dev
run: $HOME/.local/bin/poetry install --only pipelines
- name: Initialize Airflow
run: pipenv run airflow db init
run: poetry run airflow db init
- name: Setup Airflow 1.10 pipeline YAML config
run: cp samples/pipeline.airflow1.yaml samples/pipeline.yaml
- name: Run tests
run: pipenv run python -m pytest -v
run: poetry run python -m pytest -v tests
10 changes: 5 additions & 5 deletions .github/workflows/unit-tests.yaml
@@ -29,11 +29,11 @@ jobs:
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
- name: Install pipenv
run: pip install pipenv
- name: Install poetry
run: curl -sSL https://install.python-poetry.org | python3 - --preview
- name: Install dependencies
run: pipenv install --ignore-pipfile --dev
run: poetry install --only pipelines
- name: Initialize Airflow
run: pipenv run airflow db init
run: poetry run airflow db init
- name: Run tests
run: pipenv run python -m pytest -v
run: poetry run python -m pytest -v tests
2 changes: 0 additions & 2 deletions CONTRIBUTORS

This file was deleted.

32 changes: 0 additions & 32 deletions Pipfile

This file was deleted.

3,130 changes: 0 additions & 3,130 deletions Pipfile.lock

This file was deleted.

29 changes: 15 additions & 14 deletions README.md
@@ -7,34 +7,35 @@ Cloud-native, data pipeline architecture for onboarding public datasets to [Data
![public-datasets-pipelines](images/architecture.png)

# Requirements
- Python `>=3.6.10,<3.9`. We currently use `3.8`. For more info, see the [Cloud Composer version list](https://cloud.google.com/composer/docs/concepts/versioning/composer-versions).
- Familiarity with [Apache Airflow](https://airflow.apache.org/docs/apache-airflow/stable/concepts/index.html) (`>=v2.1.0`)
- [pipenv](https://pipenv-fork.readthedocs.io/en/latest/install.html#installing-pipenv) for creating similar Python environments via `Pipfile.lock`
- Python `>=3.8,<3.10`. We currently use `3.8`. For more info, see the [Cloud Composer version list](https://cloud.google.com/composer/docs/concepts/versioning/composer-versions).
- Familiarity with [Apache Airflow](https://airflow.apache.org/docs/apache-airflow/stable/concepts/index.html) (`>=v2.1.4`)
- [poetry](https://github.com/python-poetry/poetry) for installing and managing dependencies
- [gcloud](https://cloud.google.com/sdk/gcloud) command-line tool with Google Cloud Platform credentials configured. Instructions can be found [here](https://cloud.google.com/sdk/docs/initializing).
- [Terraform](https://learn.hashicorp.com/tutorials/terraform/install-cli) `>=v0.15.1`
- [Google Cloud Composer](https://cloud.google.com/composer/docs/concepts/overview) environment running [Apache Airflow](https://airflow.apache.org/docs/apache-airflow/stable/concepts.html) `>=2.1.0` and Cloud Composer `>=2.0.0`. To create a new Cloud Composer environment, see [this guide](https://cloud.google.com/composer/docs/how-to/managing/creating).
- [Google Cloud Composer](https://cloud.google.com/composer/docs/concepts/overview) environment running [Apache Airflow](https://airflow.apache.org/docs/apache-airflow/stable/concepts.html) `>=2.1.0` and Cloud Composer `>=2.0`. To create a new Cloud Composer environment, see [this guide](https://cloud.google.com/composer/docs/how-to/managing/creating).

# Environment Setup

We use Pipenv to make environment setup more deterministic and uniform across different machines. If you haven't done so, install Pipenv using these [instructions](https://pipenv-fork.readthedocs.io/en/latest/install.html#installing-pipenv).
We use [Poetry](https://github.com/python-poetry/poetry) to make environment setup more deterministic and uniform across different machines. If you haven't done so, install Poetry using these [instructions](https://python-poetry.org/docs/master/#installation). We recommend using Poetry's official installer.
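For a local setup, the installer invocation below mirrors what this PR adds to the CI workflows. The `PATH` export is an illustrative assumption based on the installer's default location (`$HOME/.local/bin`), which the airflow1 workflow also references explicitly.

```bash
# Same installer command this PR adds to the CI workflows; --preview opts in
# to the Poetry 1.2 pre-release, which supports dependency groups (--only).
curl -sSL https://install.python-poetry.org | python3 - --preview

# The installer puts the poetry binary in ~/.local/bin by default, hence the
# explicit $HOME/.local/bin/poetry call in the airflow1 workflow.
export PATH="$HOME/.local/bin:$PATH"
poetry --version
```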

With Pipenv installed, run the following command to install the dependencies:
Once Poetry is installed, run one of the following commands depending on your use case:

For data pipeline development:
```bash
pipenv install --ignore-pipfile --dev
poetry install --only pipelines
```

This installs dependencies using the specific versions in the `Pipfile.lock` file (instead of the `Pipfile` file which is ignored via `--ignore-pipfile`).
This installs dependencies using the specific versions in the `poetry.lock` file.
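The `--only pipelines` flag restricts the install to the `pipelines` dependency group declared in `pyproject.toml`, skipping all other groups. As an illustrative sanity check (not part of the documented setup), you can confirm the environment resolved correctly:

```bash
# Print the virtualenv Poetry created for this project
poetry env info --path

# Confirm Airflow is importable from that environment
poetry run python -c "import airflow; print(airflow.__version__)"
```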

Finally, initialize the Airflow database:

```bash
pipenv run airflow db init
poetry run airflow db init
```
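By default, `airflow db init` writes `airflow.cfg` and a SQLite metadata database under `~/airflow`. If you prefer to keep that state inside the working tree, a minimal sketch (the `.airflow` path is an arbitrary choice, not a project convention):

```bash
# AIRFLOW_HOME is standard Airflow configuration, not specific to this repo.
export AIRFLOW_HOME="$(pwd)/.airflow"
poetry run airflow db init  # creates airflow.cfg and airflow.db under $AIRFLOW_HOME
```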

To ensure you have a proper setup, run the tests:
```
pipenv run python -m pytest -v
poetry run python -m pytest -v tests
```

# Building Data Pipelines
@@ -84,7 +85,7 @@ Every YAML file supports a `resources` block. To use this, identify what Google

Run the following command from the project root:
```bash
pipenv run python scripts/generate_terraform.py \
poetry run python scripts/generate_terraform.py \
--dataset $DATASET \
--gcp-project-id $GCP_PROJECT_ID \
--region $REGION \
@@ -116,7 +117,7 @@ As a concrete example, the unit tests use a temporary `.test` directory as their
Run the following command from the project root:

```bash
pipenv run python scripts/generate_dag.py \
poetry run python scripts/generate_dag.py \
--dataset $DATASET \
--pipeline $PIPELINE \
[--all-pipelines] \
@@ -224,7 +225,7 @@ This step requires a Cloud Composer environment up and running in your Google Cl
To deploy the DAG and the variables to your Cloud Composer environment, use the command

```
pipenv run python scripts/deploy_dag.py \
poetry run python scripts/deploy_dag.py \
--dataset DATASET \
[--pipeline PIPELINE] \
--composer-env CLOUD_COMPOSER_ENVIRONMENT_NAME \
@@ -240,7 +241,7 @@ Specifying an argument to `--pipeline` is optional. By default, the script deplo
Run the unit tests from the project root as follows:

```
pipenv run python -m pytest -v
poetry run python -m pytest -v
```

# YAML Config Reference
58 changes: 0 additions & 58 deletions cloudbuild.yaml

This file was deleted.
