New health checks for metadata store initialization #13167
Merged
This introduces two new health checks:
rabbitmq-diagnostics check_if_metadata_store_is_initialized
rabbitmq-diagnostics check_if_metadata_store_is_initialized_with_data
and their HTTP API counterparts:
GET {prefix}/health/checks/metadata-store/initialized
GET {prefix}/health/checks/metadata-store/initialized/with-data
The first one relies on rabbit_db:is_init_completed/0, which is already used by a few code paths,
namely peer discovery, to detect when metadata store initialization has completed.
The second check is more opinionated: it assumes that a cluster will always have
at least one virtual host. Technically, you can delete the only virtual host in the system,
but then the cluster would not be practically useful.
So let's use this clue, the presence of at least one virtual host in the metadata store,
as a good enough indication that the metadata store has synced "just enough" for client
connections to have a chance of succeeding.
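As a rough illustration of how a deployment script might use the second check before letting clients connect, here is a minimal Python sketch that polls the HTTP endpoint. Several things in it are assumptions and not part of this PR: the management API listening on localhost:15672, "/api" as the value of {prefix}, the default guest credentials, and the convention that HTTP 200 means the check passed while any other status (or no response) means it has not passed yet. The helper names fetch_status and wait_until_ready are hypothetical.

```python
import base64
import time
import urllib.error
import urllib.request

# Assumptions (not stated in this PR): management plugin on localhost:15672,
# "/api" as {prefix}, and the default guest/guest credentials.
BASE_URL = "http://localhost:15672/api"
CHECK_PATH = "/health/checks/metadata-store/initialized/with-data"


def fetch_status(url, user="guest", password="guest", timeout=5.0):
    """Return the HTTP status code of a GET request, or None if unreachable."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered with a non-2xx status (assumed: check failed).
        return err.code
    except (urllib.error.URLError, OSError):
        # The node is not accepting HTTP connections yet.
        return None


def wait_until_ready(url, attempts=30, delay=1.0,
                     fetch=fetch_status, sleep=time.sleep):
    """Poll the check until it returns 200 or the attempts run out."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)  # back off before the next poll
    return False
```

Calling wait_until_ready(BASE_URL + CHECK_PATH) returns True as soon as the check passes and False if it never does within the attempt budget. The fetch and sleep parameters are injectable only so the loop can be exercised without a running node.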
Note that these checks cannot be 100% accurate in the case of Mnesia because some data may still be in flight,
but they should be pretty accurate in the case of Khepri, which is the future of RabbitMQ.
In any case, we currently do not provide a health check like this, which makes the problem
outlined in #13153 harder to spot and comprehend.
References #13153.