-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle timeouts possible in Khepri minority in rabbit_db_vhost
#11706
Merged
michaelklishin
merged 6 commits into
main
from
md/khepri-minority-errors/rabbit_db_vhost
Jul 25, 2024
Merged
Handle timeouts possible in Khepri minority in rabbit_db_vhost
#11706
michaelklishin
merged 6 commits into
main
from
md/khepri-minority-errors/rabbit_db_vhost
Jul 25, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
`create_or_get_in_khepri/2` throws errors like the `rabbit_khepri:timeout_error()`. Callers of `create_or_get/3` like `rabbit_vhost:do_add/3` and its callers handle the throw with a `try`/ `catch` block and return the error tuple, which is then handled by their callers.
`rabbit_definitions:concurrent_for_all/4` doesn't pay any attention to the return value of the `Fun`, only counting an error when it catches `{error, E}`. So we need to `throw/1` the error from `rabbit_vhost:put_vhost/6`. The other callers of `rabbit_vhost:put_vhost/6` - the management UI and the CLI (indirectly through `rabbit_vhost:add/2,3`) already handle this error return.
This error is already handled by the callers of `rabbit_vhost:update_metadata/3` (the CLI) and `rabbit_vhost:put_vhost/6` (see the parent commit) but was just missing from the spec.
`set_tags/2` throws for database errors. This is benign since it's caught by the CLI (the only caller) and turned into a Khepri-specific error.
This function throws if the database fails to apply the transaction. This function is only called by the `rabbit_vhost_limit` runtime parameter module in its `notify/5` and `notify_clear/4` callbacks. These callers have no way of handling this error but it should be very difficult for them to face this crash: setting the runtime parameter would need to succeed first which needs Khepri to be in majority. Khepri would need to enter a minority between inserting/updating/deleting the runtime parameter and updating the vhost. It's possible but unlikely. In the future we could consider refactoring vhost limits to update the vhost as the runtime parameter is changed, transactionally. I figure that to be a very large change though so we leave this to the future.
We need to bubble up the error through the caller `rabbit_vhost:delete/2`. The CLI calls `rabbit_vhost:delete/2` and already handles the `{error, timeout}` but the management UI needs an update so that an HTTP DELETE returns an error code when the deletion times out.
the-mikedavis
force-pushed
the
md/khepri-minority-errors/rabbit_db_vhost
branch
from
July 22, 2024 19:56
8090de3
to
8399450
Compare
michaelklishin
approved these changes
Jul 25, 2024
michaelklishin
added a commit
that referenced
this pull request
Jul 25, 2024
Handle timeouts possible in Khepri minority in `rabbit_db_vhost` (backport #11706)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Like #11685 this handles
{error, timeout}
tuples for a database module, this timerabbit_db_vhost
.rabbit_db_vhost
mostly handles the errors well but needs a few small tweaks to specs and some updates for callers like the management UI.The only problem function in
rabbit_db_vhost
isupdate/2
which is called by therabbit_vhost_limit
runtime parameter module in itsnotify/5
andnotify_clear/4
callbacks. Those callbacks have no way of passing the error back to the user. Ideally the runtime parameter and vhost records would be updated at the same time transactionally but I think that would be a large refactor. Since you have to modify the runtime parameter in order for the notifications to trigger, it should be a very rare error that we time out just for the vhost update, so I think we can leave a large refactor to future work.Marking this as a draft since it shares 9a71519 with #11685.