Add an "all results" query to scanner/fixer workflows #5470

Merged 6 commits on Dec 8, 2023


@Groxx Groxx commented Dec 6, 2023

What

This PR adds an all_results query to both scanner and fixer workflows, to retrieve all (non-empty) results in one operation. This makes it easier to find all failures and all output filenames, without having to repeatedly query in varying ways.

Why

Currently, getting all output filenames from these workflows is an exercise in frustration.

You can:

  • query shard_corrupt_keys to get all shards with corruptions (no data on failures, etc.)
  • query shard_report to get a single shard's corruptions, errors, skips, and control-flow failures
  • browse activity results by hand to discover the same information in bulk

But unfortunately:

  • metrics do not contain per-shard info, so finding the relevant activity or shard is hard
  • there are essentially no logs in this entire system (!?!)
  • there is currently no query to get both failures and corruptions/fixes in bulk
  • if one invariant reports "fixed" and the next returns "fail" because the fix removed data,
    the end result goes into "failures". This is true for scans too: corrupt + fail == fail.

Many small bits of friction make trying to bulk-analyze this system incredibly painful.

While we do need to just rewrite the whole thing to be less... like it is... we can at least expose this bulk info quite easily in a new query.

Comment on lines +336 to +338
if v.Result.Empty() {
	continue
}

@Groxx Groxx Dec 8, 2023


all shards have reports, so in the normal / healthy case this was producing a pile of

"1": {
  "ShardFixKeys": {
    "Fixed": null,
    "Skipped": null,
    "Failed": null
  },
  "ControlFlowFailure": null
}

which is a lot of unnecessary output when duplicated potentially thousands of times.
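
To make the effect of the `Empty()` check concrete, here is a minimal, self-contained sketch of the filtering idea. The `ShardReport` and `ShardFixKeys` types, their pointer fields, and the `nonEmptyResults` helper are hypothetical stand-ins, not the actual types or handler code in this PR; only the "skip empty results" pattern is taken from the diff.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ShardFixKeys is a hypothetical stand-in for the per-shard output keys.
type ShardFixKeys struct {
	Fixed   *string
	Skipped *string
	Failed  *string
}

// ShardReport is a hypothetical stand-in for a single shard's report.
type ShardReport struct {
	ShardFixKeys       *ShardFixKeys
	ControlFlowFailure *string
}

// Empty reports whether the report carries no information at all.
func (r ShardReport) Empty() bool {
	if r.ControlFlowFailure != nil {
		return false
	}
	k := r.ShardFixKeys
	return k == nil || (k.Fixed == nil && k.Skipped == nil && k.Failed == nil)
}

// nonEmptyResults copies only the reports that have something to say,
// mirroring the "if empty, continue" check in the diff.
func nonEmptyResults(all map[int]ShardReport) map[int]ShardReport {
	out := make(map[int]ShardReport)
	for shard, report := range all {
		if report.Empty() {
			continue
		}
		out[shard] = report
	}
	return out
}

func main() {
	failed := "example-failed-output-key" // hypothetical value
	all := map[int]ShardReport{
		1: {ShardFixKeys: &ShardFixKeys{}},                // healthy shard, nothing to report
		2: {ShardFixKeys: &ShardFixKeys{Failed: &failed}}, // shard with a failure
	}
	b, err := json.Marshal(nonEmptyResults(all))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b)) // only shard 2 appears in the output
}
```

With thousands of healthy shards, the filtered map contains only the shards that actually need attention.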

a possibly-nicer way to do this would be to use json tags and mark these as omitempty, but I'm not sure what things might be depending on the exact serialized structure at the moment, so I'm avoiding that.
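
For reference, a small sketch of why the `omitempty` route changes the serialized structure. The `plainKeys` and `compactKeys` types are hypothetical and assume pointer-typed fields; the real field types may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// plainKeys has no JSON tags, so nil fields serialize as explicit nulls.
// Hypothetical type, for illustration only.
type plainKeys struct {
	Fixed   *string
	Skipped *string
	Failed  *string
}

// compactKeys marks every field omitempty, so nil fields disappear entirely.
// Hypothetical type, for illustration only.
type compactKeys struct {
	Fixed   *string `json:"Fixed,omitempty"`
	Skipped *string `json:"Skipped,omitempty"`
	Failed  *string `json:"Failed,omitempty"`
}

// marshal is a small helper that panics on encoding errors.
func marshal(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(marshal(plainKeys{}))   // {"Fixed":null,"Skipped":null,"Failed":null}
	fmt.Println(marshal(compactKeys{})) // {}
}
```

Any consumer that expects the keys to always be present would see a different shape after such a change, which is the compatibility risk mentioned above. Note also that `omitempty` never omits a nested struct value, only nil pointers, nil interfaces, zero numbers, false, and empty strings, slices, maps, and arrays, so the outer report object would still be emitted unless its fields are pointers.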

@Groxx Groxx merged commit b635358 into cadence-workflow:master Dec 8, 2023
16 checks passed
@Groxx Groxx deleted the fixer-all-results branch December 8, 2023 20:49