Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix] #4198 : Introduce PhaseInfoFailureWithCleanup for Handling Non-Recoverable Pod States #4297

Closed
wants to merge 2 commits into from

Conversation

adarsh-jha-dev
Copy link
Contributor

Tracking issue

Closes #4198

Describe your changes

  • Added a new method PhaseInfoFailureWithCleanup in the PodSetup package to handle non-recoverable pod states.
  • Modified the existing DemystifyFailure function to utilize the new PhaseInfoFailureWithCleanup method when non-recoverable pod states are encountered. This ensures that such pods are marked as failed and cleaned up properly.

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Screenshots

N/A

Note to reviewers

Please review this PR and provide feedback on whether it effectively addresses the issue mentioned in the context.

Signed-off-by: Adarsh Jha <[email protected]>
Copy link
Member

@Future-Outlier Future-Outlier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, just a reminder, if the function is added, the test should be added too, thanks a lot!

Copy link
Member

@Future-Outlier Future-Outlier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

@Future-Outlier
Copy link
Member

Can you also provide a screenshot or some test for this case?
I mean you can mock the scenario and ensure the pod is cleaned up correctly

Signed-off-by: Adarsh Jha <[email protected]>
@adarsh-jha-dev
Copy link
Contributor Author

Hi, just a reminder, if the function is added, the test should be added too, thanks a lot!

Thanks for the reminder, and sorry for missing that , I have added the required test.

@adarsh-jha-dev
Copy link
Contributor Author

Hey @Future-Outlier , i have implemented the changes, could you please review it and let me know if the changes are relevant?

@Future-Outlier
Copy link
Member

@adarsh-jha-dev I will try to spend time on this issue in 2 days, thank you very much.
Love your attitude.

@adarsh-jha-dev
Copy link
Contributor Author

@adarsh-jha-dev I will try to spend time on this issue in 2 days, thank you very much. Love your attitude.

Thanks a lot

@hamersaw
Copy link
Contributor

hamersaw commented Oct 30, 2023

I think this is a little more than the issue needs. This PR introduced a cleanupOnFailure flag to the PhaseInfo and then a new function for PhaseInfoRetryableFailureWithCleanup to enable it. The result is that these failures will mark the Flyte task as failed, but also delete the Pod in question.

The goal here was to add a similar PhaseInfoFailureWithCleanup that can be used here. For example, if there is a Pod created with an invalid name, the Flyte task should be a failure, but Flyte should also cleanup the Pod by deleting it..

@adarsh-jha-dev
Copy link
Contributor Author

I think this is a little more than the issue needs. This PR introduced a cleanupOnFailure flag to the PhaseInfo and then a new function for PhaseInfoRetryableFailureWithCleanup to enable it. The result is that these failures will mark the Flyte task as failed, but also delete the Pod in question.

The goal here was to add a similar PhaseInfoFailureWithCleanup that can be used here. For example, if there is a Pod created with an invalid name, the Flyte task should be a failure, but Flyte should also cleanup the Pod by deleting it..

Then should i do any changes or it will be fine with this?

@hamersaw
Copy link
Contributor

Then should i do any changes or it will be fine with this?

If you would be willing to update this, it would be very much appreciated. I think a PhaseFailureWithCleanup function very similar to this one and then replacing this and testing that Pods that are created with invalid names are cleaned up would be awesome!

@adarsh-jha-dev
Copy link
Contributor Author

Then should i do any changes or it will be fine with this?

If you would be willing to update this, it would be very much appreciated. I think a PhaseFailureWithCleanup function very similar to this one and then replacing this and testing that Pods that are created with invalid names are cleaned up would be awesome!

Hey @hamersaw , I tried but was unable to implement this , could you please let me know that if the current changes are equivalent to the desired ones or not? Like will these also work?

@hamersaw
Copy link
Contributor

hamersaw commented Nov 2, 2023

Then should i do any changes or it will be fine with this?

If you would be willing to update this, it would be very much appreciated. I think a PhaseFailureWithCleanup function very similar to this one and then replacing this and testing that Pods that are created with invalid names are cleaned up would be awesome!

Hey @hamersaw , I tried but was unable to implement this , could you please let me know that if the current changes are equivalent to the desired ones or not? Like will these also work?

@adarsh-jha-dev unfortunately I think this PR will introduce unnecessary complexity. Also, without setting the cleanupOnFailure flag it is not clear to me that this will actually clean up resources.

I tried but was unable to implement this

Can you elaborate on the difficulties? I would be happy to help!

@adarsh-jha-dev adarsh-jha-dev closed this by deleting the head repository Jan 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Core feature] Introduce PhaseInfoFailureWithCleanup for non-recoverable pod states
3 participants