Bump SSZ version for larger bitfield SmallVec #6915

Open · wants to merge 50 commits into unstable

Conversation

paulhauner (Member) commented Feb 5, 2025

Issue Addressed

NA

Proposed Changes

Bumps the ethereum_ssz version, along with other crates that share the dependency.

Primarily, this gives us bitfields that can store 128 bytes on the stack before allocating, rather than 32 bytes (sigp/ethereum_ssz#38). The validator count has increased massively since we set it at 32 bytes, so aggregation bitfields (among others) now require a heap allocation. The new value of 128 bytes should cover roughly 2M active validators.
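
To make the effect concrete, here is a minimal standalone sketch using the smallvec crate directly (not the actual ethereum_ssz bitfield internals); the committee arithmetic in the comments is my own back-of-the-envelope estimate:

```rust
use smallvec::{smallvec, SmallVec};

fn main() {
    // Rough estimate: ~2M active validators spread over 32 slots x 64 committees
    // gives committees of ~977 members, i.e. ~123 bytes of bitlist data
    // (including the length-delimiter bit), which fits the new 128-byte
    // inline buffer.
    let inline: SmallVec<[u8; 128]> = smallvec![0u8; 123];
    assert!(!inline.spilled()); // stays on the stack, no heap allocation

    // With the old 32-byte buffer, any mainnet-sized aggregation bitfield
    // spilled to the heap, just like this oversized example does.
    let spilled: SmallVec<[u8; 128]> = smallvec![0u8; 200];
    assert!(spilled.spilled()); // heap allocation

    println!("inline: {} bytes, spilled: {} bytes", inline.len(), spilled.len());
}
```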

Additional Info

  • Bitfields have been moved from ssz_types to ethereum_ssz, so there are some non-substantive changes to error handling.
  • The MetaData struct in lighthouse_network holds two bitfields and therefore now has a larger stack size. I had to Arc it to avoid some clippy lints about disparate enum sizes (see the sketch after this list).
  • An Attestation is now 96 bytes larger than it was before (the inline bitfield buffer grew from 32 to 128 bytes). You'll see some tests updated to expect the new sizes.
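
A sketch of the Arc change described above, using hypothetical stand-ins for the lighthouse_network types (the struct fields and the enum here are illustrative, not the real definitions):

```rust
use std::sync::Arc;

// Stand-in for the real MetaData: two bitfields whose inline buffers grew
// from 32 to 128 bytes each.
#[allow(dead_code)]
struct MetaData {
    seq_number: u64,
    attnets: [u8; 128],
    syncnets: [u8; 128],
}

#[allow(dead_code)]
enum Response {
    Pong(u64),
    // Holding MetaData by value would make this variant ~264 bytes while Pong
    // stays at 8, which is what clippy::large_enum_variant complains about.
    // Wrapping it in an Arc keeps the variant pointer-sized.
    MetaData(Arc<MetaData>),
}

fn main() {
    assert_eq!(
        std::mem::size_of::<Arc<MetaData>>(),
        std::mem::size_of::<usize>()
    );
    println!("Response is {} bytes", std::mem::size_of::<Response>());
}
```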

paulhauner added the work-in-progress label on Feb 5, 2025
paulhauner changed the base branch from stable to unstable on Feb 5, 2025
paulhauner added the optimization label on Feb 5, 2025
paulhauner added the ready-for-review label and removed the work-in-progress label on Feb 10, 2025
paulhauner marked this pull request as ready for review on Feb 10, 2025
paulhauner requested a review from jxs as a code owner on Feb 10, 2025
michaelsproul (Member) commented Feb 10, 2025

We have a node running this branch on infra, are you planning to post some charts with metrics once it's been running for a little while?

I guess we're looking for lower total allocations per second, and maybe (indirectly) lower memory usage due to fragmentation. I don't imagine we get a noticeable reduction in CPU usage?

paulhauner (Member, Author) commented

I'm still collecting metrics in an attempt to show this has some effect. I've added some new jemalloc stats metrics. Hopefully I'll have something in the next couple of days.
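
For context, here is a minimal sketch of reading jemalloc statistics from Rust via the tikv-jemalloc-ctl crate. This is illustrative only and not necessarily how the new Lighthouse metrics are wired up; the per-arena counters charted below (e.g. stats.arenas.<i>.large.nmalloc) come from jemalloc's name-based mallctl interface rather than these typed accessors:

```rust
use tikv_jemalloc_ctl::{epoch, stats};

#[global_allocator]
static ALLOC: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;

fn main() {
    // Most jemalloc statistics are cached; advancing the epoch refreshes them.
    epoch::advance().unwrap();

    let allocated = stats::allocated::read().unwrap(); // bytes allocated by the application
    let resident = stats::resident::read().unwrap();   // bytes physically resident
    println!("{} bytes allocated / {} bytes resident", allocated, resident);
}
```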

paulhauner (Member, Author) commented Feb 19, 2025

Ok, so I have some metrics. The good news is that we can see a drop in allocations associated with this PR. The less good news is that I can't find any other metric that is affected by this change.

Below are metrics of stats.arenas.<i>.large.ndalloc (top) and stats.arenas.<i>.large.nmalloc (bottom) (details on these metrics).

[Screenshot (2025-02-19): Grafana panels of the two metrics above]

What we can observe here is a reduction in the number of allocations. In particular, these are temporary allocations, as shown by the matching peaks in allocations and deallocations (I summed these in Grafana and they net out to a flat line). Notably, I did not see any change in:

  • Total memory allocated over time.
  • Memory footprint.
  • CPU usage.
  • Attestation verification times.

Therefore, we have managed to reduce the allocation count without being able to observe any tangible second-order benefit. Should we still merge this PR? I think yes, primarily because reducing the allocation count feels like good hygiene to me. However, I am certainly open to opposing arguments.
