Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle timeouts possible in Khepri minority in rabbit_db_vhost #11706

Merged
merged 6 commits into from
Jul 25, 2024

Conversation

the-mikedavis
Copy link
Member

@the-mikedavis the-mikedavis commented Jul 12, 2024

Like #11685 this handles {error, timeout} tuples for a database module, this time rabbit_db_vhost. rabbit_db_vhost mostly handles the errors well but needs a few small tweaks to specs and some updates for callers like the management UI.

The only problem function in rabbit_db_vhost is update/2 which is called by the rabbit_vhost_limit runtime parameter module in its notify/5 and notify_clear/4 callbacks. Those callbacks have no way of passing the error back to the user. Ideally the runtime parameter and vhost records would be updated at the same time transactionally but I think that would be a large refactor. Since you have to modify the runtime parameter in order for the notifications to trigger, it should be a very rare error that we time out just for the vhost update, so I think we can leave a large refactor to future work.

Marking this as a draft since it shares 9a71519 with #11685.

`create_or_get_in_khepri/2` throws errors like the
`rabbit_khepri:timeout_error()`. Callers of `create_or_get/3` like
`rabbit_vhost:do_add/3` and its callers handle the throw with a `try`/
`catch` block and return the error tuple, which is then handled by
their callers.
`rabbit_definitions:concurrent_for_all/4` doesn't pay any attention to
the return value of the `Fun`, only counting an error when it catches
`{error, E}`. So we need to `throw/1` the error from
`rabbit_vhost:put_vhost/6`.

The other callers of `rabbit_vhost:put_vhost/6` - the management UI and
the CLI (indirectly through `rabbit_vhost:add/2,3`) already handle this
error return.
This error is already handled by the callers of
`rabbit_vhost:update_metadata/3` (the CLI) and `rabbit_vhost:put_vhost/6`
(see the parent commit) but was just missing from the spec.
`set_tags/2` throws for database errors. This is benign since it's
caught by the CLI (the only caller) and turned into a Khepri-specific
error.
This function throws if the database fails to apply the transaction.
This function is only called by the `rabbit_vhost_limit` runtime
parameter module in its `notify/5` and `notify_clear/4` callbacks. These
callers have no way of handling this error but it should be very
difficult for them to face this crash: setting the runtime parameter
would need to succeed first which needs Khepri to be in majority. Khepri
would need to enter a minority between inserting/updating/deleting the
runtime parameter and updating the vhost. It's possible but unlikely.

In the future we could consider refactoring vhost limits to update the
vhost as the runtime parameter is changed, transactionally. I figure
that to be a very large change though so we leave this to the future.
We need to bubble up the error through the caller
`rabbit_vhost:delete/2`. The CLI calls `rabbit_vhost:delete/2` and
already handles the `{error, timeout}` but the management UI needs an
update so that an HTTP DELETE returns an error code when the deletion
times out.
@michaelklishin michaelklishin merged commit 29251a0 into main Jul 25, 2024
232 checks passed
@michaelklishin michaelklishin deleted the md/khepri-minority-errors/rabbit_db_vhost branch July 25, 2024 19:40
michaelklishin added a commit that referenced this pull request Jul 25, 2024
Handle timeouts possible in Khepri minority in `rabbit_db_vhost` (backport #11706)
@michaelklishin michaelklishin added this to the 4.0.0 milestone Jul 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants