feat(ingest): feast - add support for Feast 0.18, deprecate older integration (#4094)
danilopeixoto authored Apr 26, 2022
1 parent 4b913f6 commit d2a6bc0
Showing 23 changed files with 1,187 additions and 233 deletions.
3 changes: 2 additions & 1 deletion docs/cli.md
@@ -71,7 +71,8 @@ We use a plugin architecture so that you can install only the dependencies you a
| [datahub-business-glossary](../metadata-ingestion/source_docs/business_glossary.md) | _no additional dependencies_ | Business Glossary File source |
| [dbt](../metadata-ingestion/source_docs/dbt.md) | _no additional dependencies_ | dbt source |
| [druid](../metadata-ingestion/source_docs/druid.md) | `pip install 'acryl-datahub[druid]'` | Druid Source |
| [feast](../metadata-ingestion/source_docs/feast.md) | `pip install 'acryl-datahub[feast]'` | Feast source |
| [feast-legacy](../metadata-ingestion/source_docs/feast_legacy.md) | `pip install 'acryl-datahub[feast-legacy]'` | Feast source (legacy) |
| [feast](../metadata-ingestion/source_docs/feast.md) | `pip install 'acryl-datahub[feast]'` | Feast source (0.18.0) |
| [glue](../metadata-ingestion/source_docs/glue.md) | `pip install 'acryl-datahub[glue]'` | AWS Glue source |
| [hive](../metadata-ingestion/source_docs/hive.md) | `pip install 'acryl-datahub[hive]'` | Hive source |
| [kafka](../metadata-ingestion/source_docs/kafka.md) | `pip install 'acryl-datahub[kafka]'` | Kafka source |
@@ -0,0 +1,9 @@
source:
type: "feast-repository"
config:
path: "/path/to/repository/"
environment: "PROD"
sink:
type: "datahub-rest"
config:
server: "http://localhost:8080"
49 changes: 41 additions & 8 deletions metadata-ingestion/setup.py
@@ -172,7 +172,8 @@ def get_long_description():
# https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/release-notes.html#rn-7-14-0
# https://github.com/elastic/elasticsearch-py/issues/1639#issuecomment-883587433
"elasticsearch": {"elasticsearch==7.13.4"},
"feast": {"docker"},
"feast-legacy": {"docker"},
"feast": {"feast==0.18.0", "flask-openid>=1.3.0"},
"glue": aws_common,
"hive": sql_common
| {
@@ -322,10 +323,30 @@ def get_long_description():
),
}

base_dev_requirements_airflow_1 = base_dev_requirements.copy()

if is_py37_or_newer:
# The lookml plugin only works on Python 3.7 or newer.
# These plugins only work on Python 3.7 or newer.
base_dev_requirements = base_dev_requirements.union(
{dependency for plugin in ["lookml"] for dependency in plugins[plugin]}
{
dependency
for plugin in [
"feast",
"lookml",
]
for dependency in plugins[plugin]
}
)

# These plugins are compatible with Airflow 1.
base_dev_requirements_airflow_1 = base_dev_requirements_airflow_1.union(
{
dependency
for plugin in [
"lookml",
]
for dependency in plugins[plugin]
}
)

dev_requirements = {
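The Python-version gating in this setup.py hunk reduces to set unions over the `plugins` extras. The minimal sketch below assumes a toy `plugins` dict (only the `feast` and `feast-legacy` entries come from this commit; the `lookml` dependency is a hypothetical placeholder) and a hypothetical `dev_requirement_sets` helper. It shows how Feast 0.18 lands in the default dev requirements but not in the Airflow 1 variant.

```python
import sys
from typing import Iterable, Set, Tuple

# Stand-in for the `plugins` extras mapping in setup.py. The feast entries are
# copied from this commit; the lookml dependency is a hypothetical placeholder.
plugins = {
    "feast": {"feast==0.18.0", "flask-openid>=1.3.0"},
    "feast-legacy": {"docker"},
    "lookml": {"lookml-placeholder"},
}


def deps_for(names: Iterable[str]) -> Set[str]:
    """Union the dependency sets of the named plugins."""
    return {dependency for plugin in names for dependency in plugins[plugin]}


def dev_requirement_sets(base: Set[str], py37_or_newer: bool) -> Tuple[Set[str], Set[str]]:
    """Return (default, airflow_1) dev requirement sets."""
    # Copy the base set first, as setup.py does, so both variants start equal.
    default = set(base)
    airflow_1 = set(base)
    if py37_or_newer:
        default |= deps_for(["feast", "lookml"])
        # Only lookml is added to the Airflow 1 variant; feast is left out.
        airflow_1 |= deps_for(["lookml"])
    return default, airflow_1


default, airflow_1 = dev_requirement_sets({"pytest"}, sys.version_info >= (3, 7))
print(sorted(default - airflow_1))  # ['feast==0.18.0', 'flask-openid>=1.3.0'] on Python 3.7+
```

Copying before the conditional union keeps the two sets independent, so adding `feast` to one never leaks into the other.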
@@ -340,19 +361,17 @@ def get_long_description():
"WTForms==2.3.3", # make constraint consistent with extras
}
dev_requirements_airflow_1 = {
*base_dev_requirements,
*base_dev_requirements_airflow_1,
*dev_requirements_airflow_1_base,
}

full_test_dev_requirements = {
*list(
dependency
for plugin in [
# Only include Athena for Python 3.7 or newer.
*(["athena"] if is_py37_or_newer else []),
"clickhouse",
"druid",
"feast",
"feast-legacy",
"hive",
"ldap",
"mongodb",
@@ -367,6 +386,19 @@ def get_long_description():
),
}

if is_py37_or_newer:
# These plugins only work on Python 3.7 or newer.
full_test_dev_requirements = full_test_dev_requirements.union(
{
dependency
for plugin in [
"athena",
"feast",
]
for dependency in plugins[plugin]
}
)

entry_points = {
"console_scripts": ["datahub = datahub.entrypoints:main"],
"datahub.ingestion.source.plugins": [
@@ -383,7 +415,8 @@ def get_long_description():
"dbt = datahub.ingestion.source.dbt:DBTSource",
"druid = datahub.ingestion.source.sql.druid:DruidSource",
"elasticsearch = datahub.ingestion.source.elastic_search:ElasticsearchSource",
"feast = datahub.ingestion.source.feast:FeastSource",
"feast-legacy = datahub.ingestion.source.feast_legacy:FeastSource",
"feast = datahub.ingestion.source.feast:FeastRepositorySource",
"glue = datahub.ingestion.source.aws.glue:GlueSource",
"sagemaker = datahub.ingestion.source.aws.sagemaker:SagemakerSource",
"hive = datahub.ingestion.source.sql.hive:HiveSource",
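Each `datahub.ingestion.source.plugins` entry follows the `name = module:attribute` syntax. Assuming a recipe's `source.type` is matched against these names, `type: feast` now selects `FeastRepositorySource`, and the previous class is reachable only as `feast-legacy`. The sketch below uses a hypothetical `parse_entry_points` helper to make that mapping explicit; real resolution goes through the packaging entry-point machinery, not this function.

```python
from typing import Dict, Iterable, Tuple

# Feast source entry points after this commit, copied from the hunk above.
FEAST_ENTRY_POINTS = [
    "feast-legacy = datahub.ingestion.source.feast_legacy:FeastSource",
    "feast = datahub.ingestion.source.feast:FeastRepositorySource",
]


def parse_entry_points(specs: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map each plugin name to its (module, attribute) pair."""
    registry: Dict[str, Tuple[str, str]] = {}
    for spec in specs:
        # Entry-point syntax is "name = module:attribute".
        name, _, target = spec.partition("=")
        module, _, attribute = target.strip().partition(":")
        registry[name.strip()] = (module, attribute)
    return registry


registry = parse_entry_points(FEAST_ENTRY_POINTS)
print(registry["feast"])  # ('datahub.ingestion.source.feast', 'FeastRepositorySource')
```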
46 changes: 27 additions & 19 deletions metadata-ingestion/source_docs/feast.md
@@ -2,36 +2,45 @@

For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).

## Setup
This source is designed for Feast 0.10+ repositories.

As of version 0.10, Feast has changed its architecture from a stack of services to an SDK/CLI-centric application. Please refer to [Feast 0.9 vs Feast 0.10+](https://docs.feast.dev/project/feast-0.9-vs-feast-0.10+) for further details.

For pre-0.10 Feast, see the [Feast Legacy](feast_legacy.md) source.

:::note

**Note: Feast ingestion requires Docker to be installed.**
This source is only compatible with Feast 0.18.0
:::

## Setup

To install this plugin, run `pip install 'acryl-datahub[feast]'`.

## Capabilities

This plugin extracts the following:

- List of feature tables (modeled as [`MLFeatureTable`](https://github.com/datahub-project/datahub/blob/master/metadata-models/src/main/pegasus/com/linkedin/ml/metadata/MLFeatureTableProperties.pdl)s),
features ([`MLFeature`](https://github.com/datahub-project/datahub/blob/master/metadata-models/src/main/pegasus/com/linkedin/ml/metadata/MLFeatureProperties.pdl)s),
and entities ([`MLPrimaryKey`](https://github.com/datahub-project/datahub/blob/master/metadata-models/src/main/pegasus/com/linkedin/ml/metadata/MLPrimaryKeyProperties.pdl)s)
- Column types associated with each feature and entity
This plugin extracts:

Note: this uses a separate Docker container to extract Feast's metadata into a JSON file, which is then
parsed to DataHub's native objects. This separation was performed because of a dependency conflict in the `feast` module.
- Entities as [`MLPrimaryKey`](https://datahubproject.io/docs/graphql/objects#mlprimarykey)
- Features as [`MLFeature`](https://datahubproject.io/docs/graphql/objects#mlfeature)
- Feature views and on-demand feature views as [`MLFeatureTable`](https://datahubproject.io/docs/graphql/objects#mlfeaturetable)
- Batch and stream source details as [`Dataset`](https://datahubproject.io/docs/graphql/objects#dataset)
- Column types associated with each entity and feature
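The mapping in this list can be pictured as URN construction. The sketch below assumes DataHub's usual URN layouts: a feature view becomes an `mlFeatureTable` under a `feast` data platform, features and entities are keyed by the view name, and batch or stream sources become datasets that carry the configured environment. All helper names are hypothetical re-implementations rather than DataHub imports, and the example names are made up.

```python
# Hypothetical helpers reproducing assumed DataHub URN layouts; they are not
# imported from DataHub and may differ from what the source actually emits.


def ml_primary_key_urn(feature_view: str, entity: str) -> str:
    return f"urn:li:mlPrimaryKey:({feature_view},{entity})"


def ml_feature_urn(feature_view: str, feature: str) -> str:
    return f"urn:li:mlFeature:({feature_view},{feature})"


def ml_feature_table_urn(feature_view: str, platform: str = "feast") -> str:
    return f"urn:li:mlFeatureTable:(urn:li:dataPlatform:{platform},{feature_view})"


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    # In this sketch, only dataset URNs embed the environment.
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# Hypothetical example names for one feature view and its batch source.
print(ml_feature_table_urn("driver_hourly_stats"))
print(ml_feature_urn("driver_hourly_stats", "conv_rate"))
print(ml_primary_key_urn("driver_hourly_stats", "driver_id"))
print(dataset_urn("file", "driver_stats.parquet", env="DEV"))
```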

## Quickstart recipe

Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.

For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).

```yml
```yaml
source:
type: feast
type: "feast"
config:
# Coordinates
core_url: "localhost:6565"
path: "/path/to/repository/"
# Options
environment: "PROD"

sink:
# sink configs
@@ -41,15 +50,14 @@ sink:

Note that a `.` is used to denote nested fields in the YAML recipe.

| Field | Required | Default | Description |
| ----------------- | -------- | ------------------ | ------------------------------------------------------- |
| `core_url` | | `"localhost:6565"` | URL of Feast Core instance. |
| `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |
| `use_local_build` | | `False` | Whether to build Feast ingestion Docker image locally. |
| Field | Required | Default | Description |
| ------------- | -------- | ------- | ------------------------------------------ |
| `path` | ✅ | | Path to Feast repository. |
| `environment` | | `PROD` | Environment to use when constructing URNs. |
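The new config table has one required field and one default. A minimal sketch of those semantics follows, using a hypothetical `FeastRepositoryConfigSketch` dataclass in place of DataHub's real config class. In this sketch, omitting `path` fails, `environment` falls back to `PROD`, and a leftover legacy key such as `core_url` is rejected rather than silently ignored.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class FeastRepositoryConfigSketch:
    """Hypothetical stand-in mirroring the config table; not DataHub's class."""

    path: str  # required: no default, so omitting it raises TypeError
    environment: str = "PROD"

    @classmethod
    def from_recipe(cls, recipe: Mapping[str, Any]) -> "FeastRepositoryConfigSketch":
        source = recipe["source"]
        if source.get("type") != "feast":
            raise ValueError(f"expected source type 'feast', got {source.get('type')!r}")
        # Unknown keys (e.g. the legacy `core_url`) raise TypeError here.
        return cls(**source.get("config", {}))


recipe = {"source": {"type": "feast", "config": {"path": "/path/to/repository/"}}}
config = FeastRepositoryConfigSketch.from_recipe(recipe)
print(config.environment)  # PROD
```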

## Compatibility

Coming soon!
This source is compatible with [Feast (==0.18.0)](https://github.com/feast-dev/feast/releases/tag/v0.18.0).

## Questions

@@ -0,0 +1,63 @@
# Feast (Legacy)

For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).

This source is designed for Feast 0.9 core services.

As of version 0.10, Feast has changed its architecture from a stack of services to an SDK/CLI-centric application. Please refer to [Feast 0.9 vs Feast 0.10+](https://docs.feast.dev/project/feast-0.9-vs-feast-0.10+) for further details.

For newer Feast versions, see the [Feast](feast.md) source.

## Setup

**Note: Feast ingestion requires Docker to be installed.**

To install this plugin, run `pip install 'acryl-datahub[feast-legacy]'`.

## Capabilities

This plugin extracts the following:

- Entities as [`MLPrimaryKey`](https://datahubproject.io/docs/graphql/objects#mlprimarykey)
- Features as [`MLFeature`](https://datahubproject.io/docs/graphql/objects#mlfeature)
- Feature tables as [`MLFeatureTable`](https://datahubproject.io/docs/graphql/objects#mlfeaturetable)
- Batch and stream source details as [`Dataset`](https://datahubproject.io/docs/graphql/objects#dataset)
- Column types associated with each entity and feature

Note: this uses a separate Docker container to extract Feast's metadata into a JSON file, which is then parsed into DataHub's native objects. The extraction runs in its own container to work around a dependency conflict in the `feast` module.

## Quickstart recipe

Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.

For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).

```yml
source:
type: feast-legacy
config:
# Coordinates
core_url: "localhost:6565"

sink:
# sink configs
```

## Config details

Note that a `.` is used to denote nested fields in the YAML recipe.

| Field | Required | Default | Description |
| ----------------- | -------- | ------------------ | ------------------------------------------------------- |
| `core_url` | | `"localhost:6565"` | URL of Feast Core instance. |
| `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |
| `use_local_build` | | `False` | Whether to build Feast ingestion Docker image locally. |
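The legacy table's defaults can be mirrored the same way. The sketch below uses a hypothetical `FeastLegacyConfigSketch` class rather than DataHub's real config. It assumes `core_url` is a bare `host:port` address, as the `localhost:6565` default suggests, and rejects values that don't match that shape.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FeastLegacyConfigSketch:
    """Hypothetical stand-in mirroring the legacy config table; not DataHub's class."""

    core_url: str = "localhost:6565"
    env: str = "PROD"
    use_local_build: bool = False

    def core_address(self) -> Tuple[str, int]:
        """Split `core_url` into (host, port), assuming a plain host:port value."""
        host, sep, port = self.core_url.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"core_url must look like host:port, got {self.core_url!r}")
        return host, int(port)


print(FeastLegacyConfigSketch().core_address())  # ('localhost', 6565)
```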

## Compatibility

This source is compatible with [Feast (0.10.5)](https://github.com/feast-dev/feast/releases/tag/v0.10.5) and earlier versions.

## Questions

If you've got any questions on configuring this source, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!