-
Notifications
You must be signed in to change notification settings - Fork 345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Properly report error when a plugin fails to run #720
Conversation
@@ -150,14 +152,22 @@ func Run(client kubernetes.Interface, plugins []plugin.Interface, cfg plugin.Agg | |||
}() | |||
|
|||
// 4. Launch each plugin, to dispatch workers which submit the results back | |||
certs := map[string]*tls.Certificate{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a weird diff, but the point was that the loop over plugins did 2 things:
- create cert
- run plugin
If we can't create certs I think that is a completely different type of error condition and I'm OK with maintaining the existing behavior where the server does a hard return there. I pulled it into its own loop to more clearly separate the two operations so that we wouldn't have some plugins already started, fail to generate the next cert, then return from that function (without completing the function to ingest results, update status, etc)
@@ -64,7 +64,7 @@ func NewPlugin(dfn plugin.Definition, namespace, sonobuoyImage, imagePullPolicy, | |||
// a Job only launches one pod, only one result type is expected. | |||
func (p *Plugin) ExpectedResults(nodes []v1.Node) []plugin.ExpectedResult { | |||
return []plugin.ExpectedResult{ | |||
plugin.ExpectedResult{ResultType: p.GetResultType()}, | |||
{ResultType: p.GetResultType()}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL that this is a gofmt simplification. I thought you had to put the type there.
err = errors.Wrapf(err, "error running plugin %v", p.GetName()) | ||
logrus.Error(err) | ||
monitorCh <- utils.MakeErrorResult(p.GetResultType(), map[string]interface{}{"error": err.Error()}, "") | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These 4 lines are the real change:
- form the error with wrap
- log it
- send the error result (taken from the
Monitor
code) - continue with next plugin and the rest of aggregator operations.
Codecov Report
@@ Coverage Diff @@
## master #720 +/- ##
=========================================
- Coverage 39.36% 39.16% -0.2%
=========================================
Files 68 68
Lines 3821 3827 +6
=========================================
- Hits 1504 1499 -5
- Misses 2220 2229 +9
- Partials 97 99 +2
Continue to review full report at Codecov.
|
Currently the logic of the aggregator short circuits when a plugin fails to `Run` without error. It exits abruptly and does not launch other plugins or report the failure on the plugin status. This commit changes the following behavior: - ensures we can create all the auth certs needed before launching any plugins, ensuring we don't fail to generate some certs after other plugins are already running. - when a plugin fails to Run without error, we send a failing result so that the aggregator properly reports the status of the plugin as an error and the results tarball shows the error as expected. Fixes #559 Signed-off-by: John Schnake <[email protected]>
What this PR does / why we need it:
Currently the logic of the aggregator short circuits when a
plugin fails to
Run
without error. It exits abruptly and doesnot launch other plugins or report the failure on the plugin
status.
This commit changes the following behavior:
any plugins, ensuring we don't fail to generate some certs after
other plugins are already running.
result so that the aggregator properly reports the status of
the plugin as an error and the results tarball shows the error
as expected.
Which issue(s) this PR fixes
Fixes #559
Special notes for your reviewer:
No good test for the entire logic here, I can attach some screenshots of when I modified the code to force an error though.
Release note: