
Properly report error when a plugin fails to run #720

Merged: 1 commit merged into vmware-tanzu:master on May 21, 2019
Conversation

johnSchnake
Contributor

What this PR does / why we need it:
Currently, the aggregator's logic short-circuits when a
plugin fails to Run. It exits abruptly and neither launches
the remaining plugins nor reports the failure on the plugin's
status.

This commit changes the behavior as follows:

  • all the auth certs needed are created before launching
    any plugins, so we can't fail to generate a cert after
    other plugins are already running.
  • when a plugin fails to Run, we send a failing result so
    that the aggregator properly reports the status of the
    plugin as an error and the results tarball shows the error
    as expected.

Which issue(s) this PR fixes:
Fixes #559

Special notes for your reviewer:
There is no good test for the entire logic here, but I can attach screenshots from when I modified the code to force an error.

Release note:

The aggregator server will now report a plugin as failed if it can't properly be started and will continue starting other plugins.

@johnSchnake johnSchnake requested a review from stevesloka May 15, 2019 18:00
@@ -150,14 +152,22 @@ func Run(client kubernetes.Interface, plugins []plugin.Interface, cfg plugin.Agg
}()

// 4. Launch each plugin, to dispatch workers which submit the results back
certs := map[string]*tls.Certificate{}
johnSchnake (Contributor, Author):

This is a weird diff, but the point was that the loop over plugins did 2 things:

  • create cert
  • run plugin

If we can't create certs, I think that's a completely different type of error condition, and I'm OK with maintaining the existing behavior where the server does a hard return there. I pulled cert creation into its own loop to more clearly separate the two operations, so that we can't end up with some plugins already started, fail to generate the next cert, and then return from the function without completing the work to ingest results, update status, etc.

@@ -64,7 +64,7 @@ func NewPlugin(dfn plugin.Definition, namespace, sonobuoyImage, imagePullPolicy,
// a Job only launches one pod, only one result type is expected.
func (p *Plugin) ExpectedResults(nodes []v1.Node) []plugin.ExpectedResult {
return []plugin.ExpectedResult{
plugin.ExpectedResult{ResultType: p.GetResultType()},
{ResultType: p.GetResultType()},
johnSchnake (Contributor, Author):

TIL that this is a gofmt simplification. I thought you had to put the type there.

err = errors.Wrapf(err, "error running plugin %v", p.GetName())
logrus.Error(err)
monitorCh <- utils.MakeErrorResult(p.GetResultType(), map[string]interface{}{"error": err.Error()}, "")
continue
johnSchnake (Contributor, Author):

These 4 lines are the real change:

  • wrap the error with context
  • log it
  • send the error result (taken from the Monitor code)
  • continue with the next plugin and the rest of the aggregator operations.

@codecov-io

Codecov Report

Merging #720 into master will decrease coverage by 0.19%.
The diff coverage is 0%.


@@            Coverage Diff            @@
##           master     #720     +/-   ##
=========================================
- Coverage   39.36%   39.16%   -0.2%     
=========================================
  Files          68       68             
  Lines        3821     3827      +6     
=========================================
- Hits         1504     1499      -5     
- Misses       2220     2229      +9     
- Partials       97       99      +2
Impacted Files Coverage Δ
pkg/plugin/aggregation/run.go 0% <0%> (ø) ⬆️
pkg/plugin/driver/job/job.go 21.51% <0%> (ø) ⬆️
pkg/plugin/aggregation/aggregator.go 67.01% <0%> (-5.16%) ⬇️


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9a19511...c2553ab. Read the comment docs.

@johnSchnake
Contributor Author

(Screenshots attached: Screen Shot 2019-05-15 at 1:39:43 PM, 1:40:26 PM, and 1:41:44 PM, showing the forced-error behavior.)

Signed-off-by: John Schnake <[email protected]>
@johnSchnake johnSchnake merged commit e9aba66 into vmware-tanzu:master May 21, 2019
@johnSchnake johnSchnake deleted the failedPluginRun branch May 21, 2019 20:14
Successfully merging this pull request may close these issues.

Failing to launch a plugin shouldn't exit the aggregator server