Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(ingest): move Airflow into datahub_provider module #2521

Merged
merged 11 commits into from
May 12, 2021

Conversation

hsheth2
Copy link
Collaborator

@hsheth2 hsheth2 commented May 10, 2021

Airflow users expect stuff to be in datahub_provider.{hooks,operators,lineage}.
Maintains backwards compatibility by re-exporting from the datahub.integrations.airflow files.

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable)

@hsheth2 hsheth2 changed the title [WIP] refactor(ingest): move Airflow into datahub_provider package [WIP] refactor(ingest): move Airflow into datahub_provider module May 10, 2021
@sunkickr
Copy link
Contributor

How do you feel about renaming DataHubAirflowLineageBackend to DataHubLineageBackend this would fall more inline with how other airflow modules are named

@hsheth2
Copy link
Collaborator Author

hsheth2 commented May 11, 2021

@sunkickr makes sense - just made the change

Copy link
Contributor

@sunkickr sunkickr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just like my comment on tests, example airflow dags can also be included in the datahub_provider like in the sample_repo here: https://github.com/astronomer/airflow-provider-sample/tree/main/sample_provider/example_dags. This way all airflow stuff can live in the provider and the registry will be able to build a page for each example dag in the example_dags directory like here: https://registry.astronomer-stage.io/providers/google/example-dags/example-automl-nl-text-classification

@@ -19,9 +19,9 @@

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to store tests for operators, hooks, and the lineagebackend in datahub_provider like done in the sample repo: https://github.com/astronomer/airflow-provider-sample/tree/main/tests. This way modules from datahub_provider won't need to be imported into datahub.

Copy link
Collaborator Author

@hsheth2 hsheth2 May 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I'm following the suggestion - the tests in both the sample repo and in this one are stored separate from the main source directories.

The datahub imports of datahub_provider are primarily for backwards compatibility, so I'm pretty ok with leaving those in the code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense was just suggesting because module tests are usually packaged with modules.

@hsheth2
Copy link
Collaborator Author

hsheth2 commented May 11, 2021

@sunkickr - I'd like to keep the examples/airflow directory outside of src, since it will be better for discoverability and everything under the src directory gets bundled and packaged with the pip package, which I don't want for the examples.

I have gone through and updated all the imports to use datahub_provider.

@sunkickr
Copy link
Contributor

@hsheth2 - Maybe we could just duplicate the example dags repo into the provider package so they are easily discoverable in both the datahub repo and the registry. Example dags located in the datahub_provider/example_dags/ directory will be searchable in the registry. Also we are planning lots of features for example dags so provider packages that have them will get more exposure by having more searchable/interactive content. Right now if a user is looking at a datahub module in the registry, example dags that use that module will appear with it if they are included in the package.

@hsheth2 hsheth2 marked this pull request as ready for review May 12, 2021 20:51
@hsheth2 hsheth2 changed the title [WIP] refactor(ingest): move Airflow into datahub_provider module refactor(ingest): move Airflow into datahub_provider module May 12, 2021
@sunkickr
Copy link
Contributor

sunkickr commented May 12, 2021

Everything looks good to me structurally

Copy link
Contributor

@shirshanka shirshanka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@shirshanka shirshanka merged commit a671001 into datahub-project:master May 12, 2021
@hsheth2 hsheth2 deleted the astro branch November 10, 2022 02:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants