# feat(integration/fivetran): Fivetran connector integration #9018
## Integration Details

This source extracts the following:

- Connectors in Fivetran as Data Pipelines and Data Jobs, to represent data lineage information between sources and destinations.
- Connector sources - DataJob input Datasets.
- Connector destinations - DataJob output Datasets.
- Connector runs - DataProcessInstances as DataJob runs.
## Configuration Notes

1. Fivetran provides the Fivetran Platform Connector, which dumps log events and metadata about connectors, destinations, users, and roles into your destination.
2. You need to set up and start the initial sync of the Fivetran Platform Connector before using this source. Refer to the [setup guide](https://fivetran.com/docs/logs/fivetran-platform/setup-guide).
3. Once the initial sync of your Fivetran Platform Connector is done, provide the connector's destination platform and its configuration in the recipe.
## Concept mapping

| Fivetran        | DataHub                                                                                                |
|-----------------|--------------------------------------------------------------------------------------------------------|
| `Connector`     | [DataJob](https://datahubproject.io/docs/generated/metamodel/entities/datajob/)                        |
| `Source`        | [Dataset](https://datahubproject.io/docs/generated/metamodel/entities/dataset/)                        |
| `Destination`   | [Dataset](https://datahubproject.io/docs/generated/metamodel/entities/dataset/)                        |
| `Connector Run` | [DataProcessInstance](https://datahubproject.io/docs/generated/metamodel/entities/dataprocessinstance) |

The source and destination are mapped to Datasets, which become the inputs and outputs of the Connector's DataJob.
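As a rough illustration of this mapping (a sketch, not the connector's actual implementation), a Fivetran connector becomes a DataJob whose URN embeds a DataFlow, with source and destination tables as input/output dataset URNs. The helper names and the `calendar_elected` connector id below are hypothetical:

```python
# Sketch: how a Fivetran connector could map onto DataHub URN strings.
# Hypothetical helpers; the real connector builds URNs via DataHub's SDK.

def make_dataflow_urn(orchestrator: str, flow_id: str, cluster: str = "PROD") -> str:
    return f"urn:li:dataFlow:({orchestrator},{flow_id},{cluster})"

def make_datajob_urn(flow_urn: str, job_id: str) -> str:
    return f"urn:li:dataJob:({flow_urn},{job_id})"

def make_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"

# A hypothetical connector "calendar_elected" becomes a DataJob...
flow_urn = make_dataflow_urn("fivetran", "calendar_elected")
job_urn = make_datajob_urn(flow_urn, "calendar_elected")

# ...its source tables become input datasets, its destination tables output datasets.
inputs = [make_dataset_urn("postgres", "mydb.public.employee")]
outputs = [make_dataset_urn("snowflake", "mydb.public.employee")]

print(job_urn)
```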
## Current limitations

Only the Snowflake destination is supported for now.
## Snowflake destination Configuration Guide

1. If your Fivetran Platform Connector destination is Snowflake, you need to provide user credentials and a role with the correct privileges in order to fetch metadata.
2. A Snowflake system admin can follow this guide to create a `fivetran_datahub` role, assign it the required privileges, and grant it to a user by executing the following Snowflake commands from a user with the ACCOUNTADMIN role or MANAGE GRANTS privilege.

```sql
create or replace role fivetran_datahub;

// Grant access to a warehouse to run queries to view metadata
grant operate, usage on warehouse "<your-warehouse>" to role fivetran_datahub;

// Grant access to view the database and schema in which your log and metadata tables exist
grant usage on DATABASE "<fivetran-log-database>" to role fivetran_datahub;
grant usage on SCHEMA "<fivetran-log-database>"."<fivetran-log-schema>" to role fivetran_datahub;

// Grant access to run select queries on the schema in which your log and metadata tables exist
grant select on all tables in SCHEMA "<fivetran-log-database>"."<fivetran-log-schema>" to role fivetran_datahub;

// Grant the fivetran_datahub role to the Snowflake user.
grant role fivetran_datahub to user snowflake_user;
```
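If you prefer to script these grants for your own identifiers, a small templating helper can render them; this is just a convenience sketch, not part of the connector, and the output still has to be executed in Snowflake yourself:

```python
# Sketch: render the grant statements above for concrete identifiers.
# Purely illustrative; review the output before running it in Snowflake.

GRANTS_TEMPLATE = """\
create or replace role {role};
grant operate, usage on warehouse "{warehouse}" to role {role};
grant usage on DATABASE "{database}" to role {role};
grant usage on SCHEMA "{database}"."{schema}" to role {role};
grant select on all tables in SCHEMA "{database}"."{schema}" to role {role};
grant role {role} to user {user};"""

def render_grants(warehouse: str, database: str, schema: str, user: str,
                  role: str = "fivetran_datahub") -> str:
    return GRANTS_TEMPLATE.format(role=role, warehouse=warehouse,
                                  database=database, schema=schema, user=user)

print(render_grants("COMPUTE_WH", "FIVETRAN_LOG_DB", "FIVETRAN_LOG", "snowflake_user"))
```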
## Advanced Configurations

### Working with Platform Instances
If multiple instances of source or destination systems are referenced in your Fivetran setup, you need to configure platform instances for those systems in the `fivetran` recipe to generate correct lineage edges. See [Working with Platform Instances](https://datahubproject.io/docs/platform-instances) for more detail.

When configuring a platform instance for a source system, use the connector id as the key; for a destination system, use the destination id as the key.
#### Example - Multiple Postgres source connectors, each reading from a different Postgres instance
```yml
# Map of connector source to platform instance
sources_to_platform_instance:
  postgres_connector_id1:
    platform_instance: cloud_postgres_instance
    env: PROD

  postgres_connector_id2:
    platform_instance: local_postgres_instance
    env: DEV
```
#### Example - Multiple Snowflake destinations, each writing to a different Snowflake instance
```yml
# Map of destination to platform instance
destination_to_platform_instance:
  snowflake_destination_id1:
    platform_instance: prod_snowflake_instance
    env: PROD

  snowflake_destination_id2:
    platform_instance: dev_snowflake_instance
    env: PROD
```
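Internally, resolving a platform instance from such a mapping amounts to a keyed lookup with a default. A minimal sketch of that logic (hypothetical helper, not the connector's actual code):

```python
# Sketch: resolve (platform_instance, env) for a connector id from a
# sources_to_platform_instance-style mapping, falling back to defaults.

from typing import Dict, Optional, Tuple

def resolve_platform_instance(
    mapping: Dict[str, Dict[str, str]],
    connector_id: str,
    default_env: str = "PROD",
) -> Tuple[Optional[str], str]:
    entry = mapping.get(connector_id, {})
    return entry.get("platform_instance"), entry.get("env", default_env)

sources_to_platform_instance = {
    "postgres_connector_id1": {"platform_instance": "cloud_postgres_instance", "env": "PROD"},
    "postgres_connector_id2": {"platform_instance": "local_postgres_instance", "env": "DEV"},
}

# A connector id with no entry falls back to (None, default_env).
print(resolve_platform_instance(sources_to_platform_instance, "postgres_connector_id2"))
```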
## Sample recipe

```yml
source:
  type: fivetran
  config:
    # Fivetran log connector destination server configurations
    fivetran_log_config:
      destination_platform: snowflake
      destination_config:
        # Coordinates
        account_id: "abc48144"
        warehouse: "COMPUTE_WH"
        database: "MY_SNOWFLAKE_DB"
        log_schema: "FIVETRAN_LOG"

        # Credentials
        username: "${SNOWFLAKE_USER}"
        password: "${SNOWFLAKE_PASS}"
        role: "snowflake_role"

    # Optional - filter for certain connector names instead of ingesting everything.
    # connector_patterns:
    #   allow:
    #     - connector_name

    # Optional -- A mapping of each connector's sources to its database.
    # sources_to_database:
    #   connector_id: source_db

    # Optional -- Only required to configure a platform instance for sources.
    # A mapping of Fivetran connector id to data platform instance
    # sources_to_platform_instance:
    #   connector_id:
    #     platform_instance: cloud_instance
    #     env: DEV

    # Optional -- Only required to configure a platform instance for the destination.
    # A mapping of Fivetran destination id to data platform instance
    # destination_to_platform_instance:
    #   destination_id:
    #     platform_instance: cloud_instance
    #     env: DEV

sink:
  # sink configs
```
## Code change: skip entities that don't support the status aspect

The change below teaches `auto_status_aspect` to skip entities whose type does not support the `status` aspect:

```diff
@@ -17,6 +17,7 @@
 from datahub.configuration.time_window_config import BaseTimeWindowConfig
 from datahub.emitter.mce_builder import make_dataplatform_instance_urn
 from datahub.emitter.mcp import MetadataChangeProposalWrapper
+from datahub.emitter.mcp_builder import entity_supports_aspect
 from datahub.ingestion.api.workunit import MetadataWorkUnit
 from datahub.metadata.schema_classes import (
     BrowsePathEntryClass,
@@ -64,9 +65,9 @@ def auto_status_aspect(
     """
     For all entities that don't have a status aspect, add one with removed set to false.
     """
     all_urns: Set[str] = set()
     status_urns: Set[str] = set()
+    skip_urns: Set[str] = set()
     for wu in stream:
         urn = wu.get_urn()
         all_urns.add(urn)
@@ -89,9 +90,17 @@ def auto_status_aspect(
         else:
             raise ValueError(f"Unexpected type {type(wu.metadata)}")

+        if not isinstance(
+            wu.metadata, MetadataChangeEventClass
+        ) and not entity_supports_aspect(wu.metadata.entityType, StatusClass):
+            # If an entity does not support the 'status' aspect, skip adding one.
+            # For example, dataProcessInstance doesn't support the status aspect.
+            # If not skipped, ingestion fails with:
+            # java.lang.RuntimeException: Unknown aspect status for entity dataProcessInstance
+            skip_urns.add(urn)

         yield wu

-    for urn in sorted(all_urns - status_urns):
+    for urn in sorted(all_urns - status_urns - skip_urns):
         yield MetadataChangeProposalWrapper(
             entityUrn=urn,
             aspect=StatusClass(removed=False),
```

Review thread on the `entity_supports_aspect` check:

> we have a map of exactly what entity types support what aspects - can we look this information up there instead?

> I can provide some pointers on how to do this

> Pls provide some pointers on this

> Use the helper method from here #9120
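The skip logic above hinges on knowing which entity types support which aspects. A toy version of such a lookup is sketched below; the `ENTITY_ASPECTS` dict is invented for illustration, whereas the real `entity_supports_aspect` helper consults DataHub's generated entity registry:

```python
# Toy sketch of an aspect-support lookup plus the skip logic from the diff.
# ENTITY_ASPECTS is an invented stand-in for DataHub's generated registry.

from typing import Dict, Set

ENTITY_ASPECTS: Dict[str, Set[str]] = {
    "dataset": {"status", "datasetProperties"},
    "dataJob": {"status", "dataJobInfo"},
    "dataProcessInstance": {"dataProcessInstanceProperties"},  # no "status"
}

def entity_supports_aspect(entity_type: str, aspect_name: str) -> bool:
    return aspect_name in ENTITY_ASPECTS.get(entity_type, set())

# Mirror of the patch: only emit a default status aspect for entities
# whose type actually supports it.
urns_by_type = {
    "urn:li:dataJob:(...)": "dataJob",
    "urn:li:dataProcessInstance:abc": "dataProcessInstance",
}
status_urns = {
    urn for urn, etype in urns_by_type.items()
    if entity_supports_aspect(etype, "status")
}
print(sorted(status_urns))
```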
Review comment on the docs:

> explicitly call out that this only works with snowflake for now