-
Notifications
You must be signed in to change notification settings - Fork 301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve error handling in ShellTask #1732
Conversation
Thank you for opening this pull request! 🙌 These tips will help get your PR across the finish line:
|
Signed-off-by: Pradithya Aria <[email protected]>
Signed-off-by: Pradithya Aria <[email protected]>
6b97035
to
d71a6bf
Compare
Signed-off-by: Pradithya Aria <[email protected]>
raise | ||
logger.error(error) | ||
# raise FlyteRecoverableException so that it's classified as user error and will be retried | ||
raise FlyteRecoverableException(error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think not every user error should be a recoverable one 🤔
Take a look at how PythonFunctionTask
separates user vs system error scopes:
return exception_scopes.user_entry_point(self._task_function)(**kwargs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think not every user error should be a recoverable one
I agree, however, since it's a shell execution I am not sure how can we differentiate recoverable vs non-recoverable user error. I am defaulting to recoverable error so that the users of ShellTask can still leverage the task retry feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, of course, makes sense 👍
This change is not really backwards compatible though, I wonder whether this is a dealbreaker 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there is undoubtedly a different behavior change when a shell task throws an error.
Before: Error in shell task is considered system error thus retried N number of times, where N is 3 by default.
After: Error in shell task is considered user recoverable error. By default, it's not retried and will only be retried if users specify retry in the task's metadata.
However, the shell task didn't respect the retry strategy set by users, and I think it can be considered as bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wild-endeavor your opinion will be very helpful here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, I think shell task should throw user recoverable error
""" | ||
process = subprocess.Popen(script, stdout=subprocess.PIPE, stderr=subprocess.PIPE, bufsize=0, shell=True, text=True) | ||
|
||
# print stdout so that long-running subprocess will not appear unresponsive |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
qq: Currently, the pod's log didn't show any stdout and stderr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently it does. However this PR capture the stdout thus need to emit it manually.
Congrats on merging your first pull request! 🎉 |
* Improve error handling in ShellTask Signed-off-by: Pradithya Aria <[email protected]> * Add new line Signed-off-by: Pradithya Aria <[email protected]> * Capture stdout Signed-off-by: Pradithya Aria <[email protected]> --------- Signed-off-by: Pradithya Aria <[email protected]> Co-authored-by: Pradithya Aria <[email protected]>
* Improve error handling in ShellTask * Add new line * Capture stdout --------- Signed-off-by: Pradithya Aria <[email protected]> Co-authored-by: Pradithya Aria <[email protected]>
* Improve error handling in ShellTask Signed-off-by: Pradithya Aria <[email protected]> * Add new line Signed-off-by: Pradithya Aria <[email protected]> * Capture stdout Signed-off-by: Pradithya Aria <[email protected]> --------- Signed-off-by: Pradithya Aria <[email protected]> Co-authored-by: Pradithya Aria <[email protected]>
TL;DR
Improve the error returned by
ShellTask
. As of now,ShellTask
error message is not helpful since it doesn't capture thestderr
from the subprocess. Additionally, the error thrown byShellTask
is currently classified as system error instead of user error.Type
Are all requirements met?
Complete description
FlyteRecoverableError
fromShellTask
so that it's classified as user's error and the retry policy specified in the task is respected.ShellTask
. As of now the error message generated byShellTask
in flyteconsole looks as follows. Notice that it doesn't contain any details of the failed command.With this change, users would be able to see
stdout
andstderr
of the subprocess as part of the error message.Tracking Issue
flyteorg/flyte#3559
Follow-up issue
N.A.