Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure teardown tasks are executed when DAG run is set to failed #45530

Conversation

jscheffl
Copy link
Contributor

@jscheffl jscheffl commented Jan 9, 2025

Related to Slack topic: https://apache-airflow.slack.com/archives/CCR6P6JRL/p1736440079894049

We noticed that if a DAG run is set to failed, all tasks are either set to failed or skipped. But if Teardown Tasks are used in a DAG, they are not executed. This could lead to infrastructure or external dependencies not properly cleaned-up.

This PR changes the behavior and does NOT fail/skip teardown tasks if a DAG is set to failed.
A side effect as consequence might be that the DAG is after the call NOT failed, else if it would set it to failed, then any teardown task (even if not skipped/failed) will not scheduled anymore.

@jscheffl jscheffl added the AIP-52 Automatic setup and teardown tasks label Jan 9, 2025
@jscheffl jscheffl added this to the Airflow 2.10.5 milestone Jan 9, 2025
@boring-cyborg boring-cyborg bot added the area:API Airflow's REST/HTTP API label Jan 9, 2025
@jscheffl jscheffl added the backport-to-v2-10-test Mark PR with this label to backport to v2-10-test branch label Jan 9, 2025
@jscheffl jscheffl requested a review from dstandish January 9, 2025 22:28
@jscheffl jscheffl force-pushed the bugfix/ensure-teardown-is-executed-when-dag-set-failed branch from b9d8425 to 196b89c Compare January 9, 2025 23:08
@jscheffl jscheffl marked this pull request as ready for review January 9, 2025 23:11
Copy link
Contributor

@shahar1 shahar1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!
One edge case that I thought about though* - what if the corresponding setup task hasn't finished running yet? (if such exists ofc)
For example:
image
If you set the DAG run to failed before the cluster was created, the delete_cluster should be skipped.

* Maybe the current architecture already handles, but it's worth checking before merging this PR.

@jscheffl jscheffl force-pushed the bugfix/ensure-teardown-is-executed-when-dag-set-failed branch from 196b89c to 031d7af Compare January 11, 2025 17:07
@jscheffl jscheffl requested a review from potiuk as a code owner January 11, 2025 17:07
@jscheffl
Copy link
Contributor Author

Nice catch! One edge case that I thought about though* - what if the corresponding setup task hasn't finished running yet? (if such exists ofc) For example: image If you set the DAG run to failed before the cluster was created, the delete_cluster should be skipped.

  • Maybe the current architecture already handles, but it's worth checking before merging this PR.

Oh, yeah. Tested this explicitly. And actually the Scheduler had this feature before (added this to docs as well, was also new to me): If the setup task is skipped (which is the state when you mark as failed and it was not running) then the teardown is also skipped,

@jscheffl
Copy link
Contributor Author

Added some docs as I thought about that slightly the behavior changes - to ensure it is properly documented.

@jscheffl jscheffl added the type:bug-fix Changelog: Bug Fixes label Jan 11, 2025
@jscheffl jscheffl merged commit 1e8977a into apache:main Jan 11, 2025
62 checks passed
github-actions bot pushed a commit that referenced this pull request Jan 11, 2025
…o failed (#45530)

* Ensure teardown tasks are executed when DAG run is set to failed

* Also handle the case of setting DAG to success

* Add some documentation to behavior changes

* Add some documentation to behavior changes
(cherry picked from commit 1e8977a)

Co-authored-by: Jens Scheffler <[email protected]>
Copy link

Backport successfully created: v2-10-test

Status Branch Result
v2-10-test PR Link

github-actions bot pushed a commit to aws-mwaa/upstream-to-airflow that referenced this pull request Jan 11, 2025
…o failed (apache#45530)

* Ensure teardown tasks are executed when DAG run is set to failed

* Also handle the case of setting DAG to success

* Add some documentation to behavior changes

* Add some documentation to behavior changes
(cherry picked from commit 1e8977a)

Co-authored-by: Jens Scheffler <[email protected]>
jscheffl added a commit that referenced this pull request Jan 11, 2025
…o failed (#45530) (#45581)

* [v2-10-test] Ensure teardown tasks are executed when DAG run is set to failed (#45530)

* Ensure teardown tasks are executed when DAG run is set to failed

* Also handle the case of setting DAG to success

* Add some documentation to behavior changes

* Add some documentation to behavior changes
(cherry picked from commit 1e8977a)

Co-authored-by: Jens Scheffler <[email protected]>

* Remove type hints only working in Airflow 3

---------

Co-authored-by: Jens Scheffler <[email protected]>
Co-authored-by: Jens Scheffler <[email protected]>
agupta01 pushed a commit to agupta01/airflow that referenced this pull request Jan 13, 2025
…che#45530)

* Ensure teardown tasks are executed when DAG run is set to failed

* Also handle the case of setting DAG to success

* Add some documentation to behavior changes

* Add some documentation to behavior changes
karenbraganz pushed a commit to karenbraganz/airflow that referenced this pull request Jan 13, 2025
…che#45530)

* Ensure teardown tasks are executed when DAG run is set to failed

* Also handle the case of setting DAG to success

* Add some documentation to behavior changes

* Add some documentation to behavior changes
@Lee-W
Copy link
Member

Lee-W commented Jan 13, 2025

just notice this could potentially be a breaking change, but doesn't look like something we need a migration rule 🤔 would like to confirm whether we might want to have such a rule?

@potiuk
Copy link
Member

potiuk commented Jan 14, 2025

just notice this could potentially be a breaking change, but doesn't look like something we need a migration rule 🤔 would like to confirm whether we might want to have such a rule?

I'd say it's a bugfix. Not all "change behaviour" is "breaking change". Every bugfix is a "behavioural change" in essence, the important thing is what is the feature intention and whether the change is "fixing" things or "changing intention on how things should work". SemVer is all about intentions and not about whether behaviour changes or not - see https://semver.org/

In this case it seems reasonable to assume that this behaviour was intention of the "teardown" behaviour - so technically it's a bugfix (and it's been already backported to 2.10 it seems.

It might require a newsfragment, but IMHO - that's more than enough.

@jscheffl
Copy link
Contributor Author

just notice this could potentially be a breaking change, but doesn't look like something we need a migration rule 🤔 would like to confirm whether we might want to have such a rule?

For sure it does not need migration rules. No code change needed.

Yeah and thought also a bit about if it is breaking, asked in #random / Slack for feedback and nobody except @dstandish was responding. Then after passing... I think it is really in the intend for why we have Setup/Teardown. Behaviour change is "just" because of the side effect that the DAG is not immediately failed if you set it to failed as Teardown need to be scheduled.

HariGS-DB pushed a commit to HariGS-DB/airflow that referenced this pull request Jan 16, 2025
…che#45530)

* Ensure teardown tasks are executed when DAG run is set to failed

* Also handle the case of setting DAG to success

* Add some documentation to behavior changes

* Add some documentation to behavior changes
dauinh pushed a commit to dauinh/airflow that referenced this pull request Jan 24, 2025
…che#45530)

* Ensure teardown tasks are executed when DAG run is set to failed

* Also handle the case of setting DAG to success

* Add some documentation to behavior changes

* Add some documentation to behavior changes
got686-yandex pushed a commit to got686-yandex/airflow that referenced this pull request Jan 30, 2025
…che#45530)

* Ensure teardown tasks are executed when DAG run is set to failed

* Also handle the case of setting DAG to success

* Add some documentation to behavior changes

* Add some documentation to behavior changes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AIP-52 Automatic setup and teardown tasks area:API Airflow's REST/HTTP API backport-to-v2-10-test Mark PR with this label to backport to v2-10-test branch type:bug-fix Changelog: Bug Fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants