-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(airflow): fix AthenaOperator extraction #11857
Conversation
a01869b
to
b9e369e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks reasonable - can we add a test for this?
@hsheth2 Thank for looking at this! I don't see related tests in this module. Could you point me at an example? I don't have much experience with this code base. |
@steffengr we have an integration test (https://github.com/datahub-project/datahub/blob/master/metadata-ingestion-modules/airflow-plugin/tests/integration/test_plugin.py) that runs a number of DAGs (e.g. https://github.com/datahub-project/datahub/blob/master/metadata-ingestion-modules/airflow-plugin/tests/integration/dags/snowflake_operator.py) That'd probably be the easiest way to do it. Otherwise, we could also do a more targeted unit test in https://github.com/datahub-project/datahub/blob/master/metadata-ingestion-modules/airflow-plugin/tests/unit/test_airflow.py |
b9e369e
to
3052120
Compare
@hsheth2 I added a test. I had to set |
3052120
to
3a2cd8d
Compare
I'm not sure what causes these test failures. I'm also getting them on master. |
3a2cd8d
to
fb5c382
Compare
Looks like CI is still failing. I see this in the logs (e.g. here https://github.com/datahub-project/datahub/actions/runs/12052943048/job/33607437487?pr=11857)
I also created #11981 to help make this a bit easier to debug, in case that's useful. |
fb5c382
to
32ded0d
Compare
99ee2c7
to
cd6ede2
Compare
@hsheth2 I'm lost here. I suspected there might be a race condition between triggering the dag and loading it as I get this error randomly on any of the test dags on my local machine. However, calling the rest API to wait for the dag to be available doesn't seem to help on some of the Airflow versions. Any idea what could cause the dag to not load on some of the Airflow versions? |
cd6ede2
to
a098d60
Compare
ba9d958
to
d16ec0b
Compare
@hsheth2 Looks like changing my REST call to check if the dag exists by removing the query parameter that didn't exist in old versions of Airflow did the trick but now I'm getting another strange error regarding the golden file not existing on only one particular version of Airflow (2.4.3). Does this version require anything special?
|
@steffengr with airflow 2.4, we support a limited form of the plugin, since certain features weren't available until later versions. We run tests for that as well, which generates the "no_dag_listener" variants of the golden files. That file golden can only be generated on airflow 2.4. I believe something like this would generate that missing file.
|
The GenericSqlExtractor which is currently by the DataHub Airflow plugin to extract lineage information does not properly support the AthenaOperator and crashes with "AttributeError: 'AthenaOperator' object has no attribute 'sql'". This patch introduces a AthenaOperatorExtractor following the BigQueryInsertJobOperatorExtractor example to fix support for the AthenaOperator.
d16ec0b
to
252e1c5
Compare
@hsheth2 Thanks! That did the trick. Is there any chance to get this into a 0.14 patch release? |
@steffengr this will likely go into 0.15.0. We will definitely at least cut an rc this week, and might also cut the full release |
The GenericSqlExtractor which is currently by the DataHub Airflow plugin to extract lineage information does not properly support the AthenaOperator and crashes with "AttributeError: 'AthenaOperator' object has no attribute 'sql'". This patch introduces a AthenaOperatorExtractor following the BigQueryInsertJobOperatorExtractor example to fix support for the AthenaOperator.
Fixes #11160
Checklist