[Merged by Bors] - Optimise HTTP validator lookups #3559
Conversation
Here's some performance info from some of our Goerli nodes with thousands of inactive validator keys. This is the time required for the VC to update all its validator indices from the BN, which drops from 110s to 83s on one VC (-25%) and from 29s to 14s on the other (-51%). The underlying time the BN spends on each request is reduced even more dramatically: before this PR it was in the 2.5-6ms range, and afterwards it is in the 50-250us range.

Before: [metrics screenshot]

After: [metrics screenshot]

That's a 10-50x reduction (at least -90%).
@deauna, this review looks a lot like engagement farming to me and I will ban your account from this repository if it persists. I apologise sincerely if I've read this wrong.
Nice one! I've also been giving this issue the side-eye when poking around Grafana.
I have one comment, but it's not a blocker.
```rust
// Diff context: build the committee caches on the snapshot's state.
snapshot
    .beacon_state
    .build_all_committee_caches(&self.spec)
```
It may be a little heavy-handed to build the exit cache here as well, but I don't feel too strongly about that. IIRC we don't always need the exit cache to process a block.
I figure it's quasi-equivalent to building it on-demand when it is eventually required. If we process a block and it becomes head, chances are its descendants will remain canonical and the exit cache will be required eventually, at which point we've paid once to build it (when the first ancestor was processed). The only case where I think it'll hurt is if there are lots of re-orgs that miss the snapshot cache and load a fresh state without any caches. We hope that these are rare. These re-orgs and these sorts of caching issues are also mitigated by tree-states, where all the states share common CoW caches rooted at the finalized state.
My (potentially shoddy) understanding is that the exit cache is only required when we're processing exits, which happens relatively infrequently. However, given that:
- The cache we're building is on the head state.
- The head state usually maintains its caches (i.e., they aren't dropped and rebuilt often).
- Trying to build an already-built exit cache is cheap (see the sketch below).
I'm happy with this. I'll slap another approval!
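For what it's worth, the "cheap when already built" property comes from guarding the build behind an initialisation check, so repeat calls return immediately. A minimal sketch of that pattern, with a hypothetical `ExitCache` type (not Lighthouse's actual implementation):

```rust
use std::collections::HashMap;

/// Hypothetical exit cache mapping validator index -> exit epoch.
#[derive(Default)]
struct ExitCache {
    initialized: bool,
    exit_epochs: HashMap<u64, u64>,
}

impl ExitCache {
    /// Build the cache from the registry, but only once: calling this on an
    /// already-built cache returns immediately without re-scanning.
    fn build_if_uninitialized(&mut self, validators: &[(u64, u64)]) {
        if self.initialized {
            return; // cheap no-op when the head state keeps its caches
        }
        for &(index, exit_epoch) in validators {
            self.exit_epochs.insert(index, exit_epoch);
        }
        self.initialized = true;
    }
}

fn main() {
    let registry = vec![(0, 1234), (1, u64::MAX)];
    let mut cache = ExitCache::default();
    cache.build_if_uninitialized(&registry); // full build: O(n)
    cache.build_if_uninitialized(&registry); // no-op: O(1)
    assert_eq!(cache.exit_epochs.get(&0), Some(&1234));
}
```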
Approved! I understand this PR went back to "ready-for-review" since you were waiting for my response to a comment.
Sweet! bors r+
## Issue Addressed

While digging around in some logs I noticed that queries for validators by pubkey were taking 10ms+, which seemed too long. This was due to a loop through the entire validator registry for each lookup.

## Proposed Changes

Rather than looping through the registry, this PR utilises the pubkey cache, which is usually initialised at the head*. If the cache isn't built, we fall back to the previous loop logic. In the vast majority of cases I expect the cache will be built, as the validator client queries at the `head`, where all caches should be built.

## Additional Info

*I had to modify the cache build that runs after fork choice to build the pubkey cache. I think it had been optimised out, perhaps accidentally. I think it's preferable to have the exit cache and the pubkey cache built on the head state, as they are required for verifying exits and deposits respectively, and we may as well build them off the hot path of block processing. Previously they'd get built the first time a deposit or exit needed to be verified.

I've deleted the unused `map_state` function, which was obsoleted by `map_state_and_execution_optimistic`.
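As a rough illustration of the lookup strategy above, here's a minimal sketch; the `Validator` struct, `validator_index` function, and plain `HashMap` cache are hypothetical stand-ins, not Lighthouse's actual types:

```rust
use std::collections::HashMap;

type Pubkey = [u8; 48]; // BLS public keys are 48 bytes

struct Validator {
    pubkey: Pubkey,
}

/// Look up a validator's index by pubkey: O(1) via the cache when it's
/// built, falling back to the old O(n) registry scan when it isn't.
fn validator_index(
    pubkey_cache: Option<&HashMap<Pubkey, usize>>,
    validators: &[Validator],
    pubkey: &Pubkey,
) -> Option<usize> {
    if let Some(cache) = pubkey_cache {
        // Fast path: the head state's pubkey cache is usually initialised.
        cache.get(pubkey).copied()
    } else {
        // Fallback: linear scan of the whole registry (the old behaviour).
        validators.iter().position(|v| &v.pubkey == pubkey)
    }
}

fn main() {
    let validators = vec![Validator { pubkey: [1; 48] }];
    let mut cache = HashMap::new();
    cache.insert([1u8; 48], 0usize);
    assert_eq!(validator_index(Some(&cache), &validators, &[1; 48]), Some(0));
    assert_eq!(validator_index(None, &validators, &[1; 48]), Some(0));
}
```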
Pull request successfully merged into unstable. Build succeeded.
## Issue Addressed

This reverts commit ca9dc8e (PR #3559) with some modifications.

## Proposed Changes

Unfortunately that PR introduced a performance regression in fork choice. The optimisation _intended_ to build the exit and pubkey caches on the head state _only if_ they were not already built. However, due to the head state always being cloned without these caches, we ended up building them every time the head changed, leading to a ~70ms+ penalty on mainnet.

https://github.com/sigp/lighthouse/blob/fcfd02aeec435203269b03865e3ccc23e5f51e6d/beacon_node/beacon_chain/src/canonical_head.rs#L633-L636

I believe this is a severe enough regression to justify immediately releasing v3.2.1 with this change.

## Additional Info

I didn't fully revert #3559, because there were some unrelated deletions of dead code in that PR which I figured we may as well keep. An alternative would be to clone the extra caches, but this likely still imposes some cost, so in the interest of applying a conservative fix quickly, I think reversion is the best approach.

The optimisation from #3559 was not even optimising a particularly significant path; it was mostly for VCs running larger numbers of inactive keys. We can re-do it in the `tree-states` world, where cache clones are cheap.
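To make the failure mode concrete: because each new head is cloned without its extra caches, the "only build if missing" guard fires on every head change, since the cache is always missing on the freshly cloned state. A minimal sketch under those assumptions (all names hypothetical, not the actual `canonical_head.rs` code):

```rust
#[derive(Default)]
struct PubkeyCache {
    initialized: bool,
}

#[derive(Default)]
struct BeaconState {
    pubkey_cache: PubkeyCache,
}

impl BeaconState {
    /// Hypothetical clone that drops the pubkey cache, mirroring how the
    /// head state was cloned without its extra caches.
    fn clone_without_caches(&self) -> BeaconState {
        BeaconState {
            pubkey_cache: PubkeyCache::default(),
        }
    }

    /// Returns true when a (costly) full rebuild was performed.
    fn build_pubkey_cache_if_missing(&mut self) -> bool {
        if self.pubkey_cache.initialized {
            return false; // the intended cheap path
        }
        self.pubkey_cache.initialized = true;
        true
    }
}

fn main() {
    let mut head = BeaconState::default();
    for _ in 0..3 {
        // Every head update clones without caches, so the guard never helps:
        head = head.clone_without_caches();
        assert!(head.build_pubkey_cache_if_missing()); // rebuilt every time
    }
}
```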