
[Merged by Bors] - Optimise HTTP validator lookups #3559

Closed

Conversation

michaelsproul
Member

@michaelsproul michaelsproul commented Sep 9, 2022

Issue Addressed

While digging around in some logs I noticed that queries for validators by pubkey were taking 10ms+, which seemed too long. This was due to a loop through the entire validator registry for each lookup.

Proposed Changes

Rather than using a loop through the registry, this PR utilises the pubkey cache, which is usually initialised at the head*. In case the cache isn't built, we fall back to the previous loop logic. In the vast majority of cases I expect the cache will be built, as the validator client queries at the head where all caches should be built.
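
For illustration only, here's a minimal Rust sketch of the lookup strategy described above. The type and function names (`Validator`, `validator_index_by_pubkey`, the `HashMap`-backed cache) are hypothetical stand-ins, not the actual Lighthouse types:

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real registry types.
type PublicKeyBytes = [u8; 48];

struct Validator {
    pubkey: PublicKeyBytes,
    // ...other registry fields elided
}

/// Resolve a validator index from a pubkey, preferring an O(1) cache lookup
/// and only falling back to a linear scan when the cache isn't built.
fn validator_index_by_pubkey(
    pubkey_cache: Option<&HashMap<PublicKeyBytes, usize>>,
    validators: &[Validator],
    pubkey: &PublicKeyBytes,
) -> Option<usize> {
    match pubkey_cache {
        // Fast path: the cache is usually built for the head state.
        Some(cache) => cache.get(pubkey).copied(),
        // Slow path: previous behaviour, scan the whole registry.
        None => validators.iter().position(|v| &v.pubkey == pubkey),
    }
}
```

With thousands of inactive keys, the scan in the slow path is what produced the 10ms+ lookups mentioned above.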

Additional Info

*I had to modify the cache build that runs after fork choice to build the pubkey cache. I think it had been optimised out, perhaps accidentally. I think it's preferable to have the exit cache and the pubkey cache built on the head state, as they are required for verifying exits and deposits respectively, and we may as well build them off the hot path of block processing. Previously they'd get built the first time a deposit or exit needed to be verified.
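
As a rough, hedged sketch of what building these caches off the hot path might look like (the names `HeadState`, `build_exit_cache`, `update_pubkey_cache` and `prime_head_caches` are illustrative assumptions, not the actual Lighthouse API):

```rust
// Illustrative only: prime the caches on the new head state after fork choice,
// so later exit/deposit verification and HTTP lookups don't pay the build cost.
#[derive(Default)]
struct HeadState {
    exit_cache_built: bool,
    pubkey_cache_built: bool,
}

impl HeadState {
    fn build_exit_cache(&mut self) {
        if !self.exit_cache_built {
            // ...scan the registry and record exit churn per epoch
            self.exit_cache_built = true;
        }
    }

    fn update_pubkey_cache(&mut self) {
        if !self.pubkey_cache_built {
            // ...map each validator pubkey to its registry index
            self.pubkey_cache_built = true;
        }
    }
}

/// Called once per head update, after fork choice has run.
fn prime_head_caches(head: &mut HeadState) {
    head.build_exit_cache();
    head.update_pubkey_cache();
}
```

If the caches are already built this is effectively a no-op, which is why doing it eagerly on the head should be cheap in the common case.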

I've deleted the unused map_state function which was obsoleted by map_state_and_execution_optimistic.

@michaelsproul michaelsproul added work-in-progress PR is a work-in-progress optimization Something to make Lighthouse run more efficiently. HTTP-API labels Sep 9, 2022
@michaelsproul
Member Author

michaelsproul commented Sep 9, 2022

Here's some performance info from some of our Goerli nodes with thousands of inactive validator keys:

[image: speedup]

This is the time required for the VC to update all its validator indices from the BN, which drops from 110s to 83s on one VC (-25%) and from 29s to 14s on the other (-51%).

We can see that the underlying time the BN spends on each request is reduced by an even larger factor. Before this PR the times are in the 2.5-6ms range, and afterwards they are in the 50-250µs range. Before:

Sep 09 06:48:21.113 DEBG Processed HTTP API request method: GET, path: /eth/v1/beacon/states/head/validators/0xa9dfac8705467d09bdf4db8e68cf6ef54e45294c432b180796634777829704429444d0b28eeed4a66a770bf3f1a104d2, status: 404 Not Found, elapsed: 2.298924ms

After:

Sep 09 06:52:33.637 DEBG Processed HTTP API request method: GET, path: /eth/v1/beacon/states/head/validators/0xa9dfac8705467d09bdf4db8e68cf6ef54e45294c432b180796634777829704429444d0b28eeed4a66a770bf3f1a104d2, status: 404 Not Found, elapsed: 58.289µs

That's a 10-50x reduction (at least -90%).

@michaelsproul michaelsproul added ready-for-review The code is ready for review and removed work-in-progress PR is a work-in-progress labels Sep 9, 2022
@paulhauner
Member

cheers

@deauna, this review looks a lot like engagement farming to me and I will ban your account from this repository if it persists. I apologise sincerely if I've read this wrong.


@paulhauner paulhauner left a comment


Nice one, I've also been giving this issue the side-eye when poking around Grafana.

I have one comment, but it's not a blocker.

snapshot
    .beacon_state
    .build_all_committee_caches(&self.spec)
Member


It may be a little heavy-handed to build the exit cache here as well, but I don't feel too strongly about that. IIRC we don't always need the exit cache to process a block.

Member Author


I figure it's quasi-equivalent to building it on-demand when it is eventually required. If we process a block and it becomes head, chances are its descendants will remain canonical and the exit cache will be required eventually, at which point we've paid once to build it (when the first ancestor was processed). The only case where I think it'll hurt is if there are lots of re-orgs that miss the snapshot cache and load a fresh state without any caches. We hope that these are rare. These re-orgs and these sorts of caching issues are also mitigated by tree-states, where all the states share common CoW caches rooted at the finalized state.

Member


My (potentially shoddy) understanding is that the exit cache is only required when we're processing exits, which happens relatively infrequently. However, given that:

  • The cache we're building is on the head state.
  • The head state usually maintains its caches (i.e., they aren't dropped and rebuilt often).
  • Trying to build an already-built exit cache is cheap.

I'm happy with this. I'll slap another approval!

@paulhauner paulhauner added waiting-on-author The reviewer has suggested changes and awaits their implementation. and removed ready-for-review The code is ready for review labels Sep 28, 2022
@michaelsproul michaelsproul added ready-for-review The code is ready for review and removed waiting-on-author The reviewer has suggested changes and awaits their implementation. labels Oct 5, 2022

@paulhauner paulhauner left a comment


Approved! I understand this PR went back to "ready-for-review" since you were waiting for my response to a comment.

@paulhauner paulhauner added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels Oct 15, 2022
@michaelsproul
Member Author

Sweet!

bors r+

bors bot pushed a commit that referenced this pull request Oct 15, 2022
@bors bors bot changed the title Optimise HTTP validator lookups [Merged by Bors] - Optimise HTTP validator lookups Oct 16, 2022
@bors bors bot closed this Oct 16, 2022
@michaelsproul michaelsproul deleted the opt-validator-lookup branch October 16, 2022 01:05
bors bot pushed a commit that referenced this pull request Oct 26, 2022
## Issue Addressed

This reverts commit ca9dc8e (PR #3559) with some modifications.

## Proposed Changes

Unfortunately that PR introduced a performance regression in fork choice. The optimisation _intended_ to build the exit and pubkey caches on the head state _only if_ they were not already built. However, due to the head state always being cloned without these caches, we ended up building them every time the head changed, leading to a ~70ms+ penalty on mainnet.

https://github.com/sigp/lighthouse/blob/fcfd02aeec435203269b03865e3ccc23e5f51e6d/beacon_node/beacon_chain/src/canonical_head.rs#L633-L636

I believe this is a severe enough regression to justify immediately releasing v3.2.1 with this change.

## Additional Info

I didn't fully revert #3559, because there were some unrelated deletions of dead code in that PR which I figured we may as well keep.

An alternative would be to clone the extra caches, but this likely still imposes some cost, so in the interest of applying a conservative fix quickly, I think reversion is the best approach. The optimisation from #3559 was not even optimising a particularly significant path, it was mostly for VCs running larger numbers of inactive keys. We can re-do it in the `tree-states` world where cache clones are cheap.
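
To make the failure mode concrete, here is a hedged sketch (hypothetical names, not the actual Lighthouse code; the real code is at the permalink above). The "build only if missing" step never becomes a no-op because the cloned head state never carries these caches in the first place:

```rust
// Illustrative only. The clone keeps some caches (e.g. committee caches) but
// drops the exit and pubkey caches, so the "build if missing" step below runs
// in full on every head change instead of being a cheap no-op.
#[derive(Clone, Default)]
struct SnapshotState {
    exit_cache_built: bool,
    pubkey_cache_built: bool,
}

impl SnapshotState {
    // Stand-in for cloning the snapshot state without the extra caches.
    fn clone_without_extra_caches(&self) -> Self {
        SnapshotState::default()
    }

    fn build_missing_caches(&mut self) {
        if !self.exit_cache_built {
            // Full rebuild over the validator registry: the ~70ms+ penalty on mainnet.
            self.exit_cache_built = true;
        }
        if !self.pubkey_cache_built {
            self.pubkey_cache_built = true;
        }
    }
}

fn on_new_head(snapshot: &SnapshotState) {
    // Because the clone discards the caches, this rebuild cost is paid every time.
    let mut head_state = snapshot.clone_without_extra_caches();
    head_state.build_missing_caches();
}
```
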
macladson pushed a commit to macladson/lighthouse that referenced this pull request Jan 5, 2023
macladson pushed a commit to macladson/lighthouse that referenced this pull request Jan 5, 2023
Woodpile37 pushed a commit to Woodpile37/lighthouse that referenced this pull request Jan 6, 2024
Woodpile37 pushed a commit to Woodpile37/lighthouse that referenced this pull request Jan 6, 2024