[ARCHIVED] Lodestar Standup Meetings
⚠️ This section has been deprecated for https://github.com/ChainSafe/lodestar/wiki/Lodestar-Planning-&-Standup-Meetings
The Lodestar team hosts standup meetings weekly on Tuesdays at 9:45am Eastern Time. These meetings give the team an opportunity to sync on current issues, implementations and status updates on the Lodestar roadmap.
Previous notes of these meetings are on HackMD and archived here for maximum visibility to our community:
Previous 2021 Weekly Standups
Tuyen
- We lacked a lot of long lived subnets. This was the main issue for missing attestations.
- 4 or fewer is a problem. The new fix keeps 10+ up.
- Validator metrics CLI PR for comparing and monitoring validator
- Light client server should only import blocks if it's a new head
Gajinder Merge tracker issues:
- Prepare to send information from validator to beacon node with feeRecipient
- Issue advanced fcUs
- When there is a produceBlock call, it asks engine to build payload and when the call completes, there was no time to give the payload.
- Update the fee recipient info with engine info, then issue advanced fcUs.
- Kept logic separate and easy for review
- Few test cases to write and edge cases to work on. Working on builder API for MEV-boost.
- Started writing the types and coding the builder endpoint.
- There is some flux with the state of the APIs that need to be called.
- Engine & builder will be side by side. BN might concurrently run these two and if it gets good enough payload to maximize MEV, then it might pick the builder block or go with engine.
- Publishing endpoints.
- Mergemock can be used to be the receiving point of all this.
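The engine-versus-builder choice described in the notes above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Lodestar's implementation: `PayloadCandidate`, its `value` field, `selectPayload`, `producePayload`, and the two fetch callbacks are hypothetical names, and "prefer the builder only when its value is strictly higher" is an assumed stand-in for "good enough payload to maximize MEV".

```typescript
// Hypothetical shape of a payload candidate; not a Lodestar type.
interface PayloadCandidate {
  source: "engine" | "builder";
  // Assumed measure of how valuable the block is to the proposer.
  value: number;
}

// Assumed selection rule: take the builder payload only when it exists and is
// worth strictly more than the engine payload; otherwise use the engine.
function selectPayload(
  engine: PayloadCandidate | null,
  builder: PayloadCandidate | null
): PayloadCandidate {
  if (builder !== null && (engine === null || builder.value > engine.value)) {
    return builder;
  }
  if (engine !== null) {
    return engine;
  }
  throw new Error("no payload available from engine or builder");
}

// Query both sources concurrently. A failure on one side becomes null, so the
// other side can still be used.
async function producePayload(
  fetchEngine: () => Promise<PayloadCandidate>,
  fetchBuilder: () => Promise<PayloadCandidate>
): Promise<PayloadCandidate> {
  const [engine, builder] = await Promise.all([
    fetchEngine().catch(() => null),
    fetchBuilder().catch(() => null),
  ]);
  return selectPayload(engine, builder);
}
```

Keeping the selection rule in a pure function separate from the network calls mirrors the note about keeping logic separate and easy to review.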
Dadaepo
- NextProposer issue was blocked from SSZ, now proceeding.
- Changes made from feedback
- Revisiting for optimization
- Doppelganger missing scenario
- Remaining issue in gossipcache to check if particular validator gets synced. It gets pruned, will need to see impact.
- Filename syntax
- Conflict between code and documentation
- Stick with filename format or conform code to documentation with leniency.
Phil
- About 1 month away from code freeze. Reorganizing the board and tracking issues
- Be aware of what you're spending your time on.
- Likely will be another release in 2 weeks before the final code freeze.
Cayman
- Sync with js-libp2p
- Getting acquainted with their process + releases
- Cutting new release of libp2p after connectionGater feature is merged.
- p2p deny list: blacklist for bad peers that you won't dial or service any requests from
- Adding hooks into dialer (callback)
- Implementing this before TypeScript rewrite
- Lion: The problem we have is the nodes have no clue what their public address is, but know their internal address
- Should we turn off Identify completely?
- Not letting IPs get down to the dialer
- Another feature: a deny list applied not at the dialer step, but at the point when you add an address to the address book.
- Cayman: It's better to disallow at the earliest possible time to prevent doing extra work.
- Merge in Tuyen's PR and re-releasing gossipsub
- SSZ Rewrite:
- Important to focus on documentation here and it's likely that people will use and have used this library
- Potentially bump SSZ to 1.0 as a way to prep for Lodestar. Might be good to minor bump it first just to ensure everything is good, then move to 1.0
- When you're pre-1.0, every minor bump is a major
- The SSZ API is massive, with many methods. Everything needs to make sense, and the longevity of the information is important. It should be polished.
- Dade: Make some noise with 1.0 bump for Lodestar.
- Libp2p has an interesting release process that may warrant us looking back at our libraries for revamp & consistency
- For lower level libraries, after every PR it'll cut a new release candidate. Uses PR titles and commit messages to determine how to update the version. All automated.
- Automatic major bump if it's breaking, minor for feature, etc.
- At the higher level, we will gate the releases.
- Lion: Using conventional titles for PRs. We should start using it. Nice way to express intention and changelogs look better.
- Someone would need to develop the CI for this and populate the packages.
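As a rough illustration of the automated versioning discussed above (breaking change gives a major bump, a feature gives a minor bump, anything else a patch), the mapping from a conventional PR title to a version could look like the sketch below. The function names `bumpFromTitle` and `nextVersion` are hypothetical, the title format follows the Conventional Commits convention of `type(scope)!: description`, and treating unconventional titles as a patch is an assumption. It does not describe the actual libp2p CI.

```typescript
type Bump = "major" | "minor" | "patch";

// Classify a PR title written in Conventional Commits style,
// e.g. "feat(ssz): add tree view" or "fix!: drop legacy API".
function bumpFromTitle(title: string): Bump {
  const match = /^(\w+)(\([^)]*\))?(!)?:/.exec(title.trim());
  // Assumption: titles that do not follow the convention count as a patch.
  if (match === null) return "patch";
  const type = match[1];
  const breaking = match[3] === "!";
  if (breaking) return "major";
  if (type === "feat") return "minor";
  return "patch";
}

// Apply a bump to a plain "major.minor.patch" version string.
function nextVersion(current: string, bump: Bump): string {
  const [major, minor, patch] = current.split(".").map(Number);
  if (bump === "major") return `${major + 1}.0.0`;
  if (bump === "minor") return `${major}.${minor + 1}.0`;
  return `${major}.${minor}.${patch + 1}`;
}
```

For example, `nextVersion("1.4.2", bumpFromTitle("feat(ssz): add tree view"))` yields `"1.5.0"`.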
Lion
- SSZ PR looking stable
- Started the integration, way more changes than anticipated
- Large assumptions about structures
- About half way done
- Cannot run spec tests until fixed
Tuyen
- Discovered from the nightly group that a node with low peer count spends 8% of CPU on serialization due to backfill sync: https://github.com/ChainSafe/lodestar/issues/3657
- hashTreeRoot also takes 9% and will hopefully be better with new SSZ.
- Libp2p migration, getting through typing issue in master. New libp2p version releasing later today.
- https://github.com/ChainSafe/lodestar/pull/3534 is blocked by this
- Looking into new error adding gossip attestations to forkchoice: https://github.com/ChainSafe/lodestar/issues/3665
Gajinder
- Nimbus Insecura testing: https://github.com/ChainSafe/lodestar/issues/3658
- Able to sync on attacking/invalid chain from compromised bootnode.
- Node doesn't sync from wss because pyrmont is not finalizing
- Lodestar started syncing from blank db without wss
- Merged PR that fixes terminal block finding logic: https://github.com/ChainSafe/lodestar/pull/3655
- Investigate weak subjectivity checkpoint improper use
- Work on skip serializing blocks when persisting to DB.
- Unable to reproduce 3597: https://github.com/ChainSafe/lodestar/issues/3597
- Maybe some configuration issue on their end?
Dadaepo
- Integrate Remote Signer into Key manager branch, touch base with Lion and then write tests.
- Going back looking at diffhandler. It's out of sync. Update the documentation
Cayman
- ES modules are now supported by default in Node 14+. We should consider or look into them because they are compatible with the web, whereas CommonJS modules are not.
- Libp2p was looking to write their libraries in TypeScript and they are only going to support ESM.
- Forked varInt library (@chainsafe/varint)
- Debugging recent libp2p-noise release (5.0.1)
- Highlights the need to do integration testing against other libp2p implementations before release
- There's a libp2p/interop repo that handles this testing
- Lion: Looking at some of our libraries, they are dangerously undertested, please add as much coverage as possible. Not just sanity checks.
- Send ideas for scenarios and tests if you can.
- Measuring the performance of ESM & CommonJS modules. (ESM can do smarter things than CommonJS)
- Continuing work on discv5 rateLimiter
Lion
- Fixed changelog for releases with a new script.
- SSZ work looking good, will do PR with nice write-up.
Tuyen
- Fixed issue with not publishing proper nightlies on NPM (PR #3601)
- Added new findings to applying gossip attestation into forkChoice
- With this improvement, total time to process gossip attestations in forkchoice is reduced from 30ms to just over 16ms (almost double the speed)
- Gossipsub fastmessage id cache needs comments/review
- Cayman: Benefits of discv5 may be overwhelmed by packets we get in total
Gajinder
- Kintsugi devnet is broken. Peers are not syncing
- Completing backfill sync PR
- We can do backfill sync with checkpoints
- Want to complete spec changes implementation
- Blocks in some ELs are not valid in other ELs, creating issues in Kintsugi.
Dadaepo
- Setting up and familiarizing with Lodestar
- Provisioning contabo server
- Looking into KeyManager API implementation
- Fixing dev command: The documentation is inaccurate and broken. Fixing cases and will update the documentation.
Cayman
- Updating Gossipsub to use latest pubsub implementation
- Type compatibility issues being worked on. L
Lion
- Rethinking SSZ library
- Brainstorming ideas about tree view.
- Investigating how Teku is doing it
- Early benchmarks look good. Processing of blocks is faster. Memory efficient data structures.
- Nodes are doing great so far, no proposals being skipped. Very few attestations being missed.
Tuyen
- js-libp2p-gossipsub fast message id: almost done, waiting for Cayman to review
- libp2p + gossipsub migration: in progress, waiting for libp2p to fix 1 type issue regarding abort-controller
- Lodestar gossip queues: done, wrap processMessageRpc instead of the validate function. This is a good improvement when we subscribe to all subnets and have to drop a lot of messages.
- Reprocess unknown block root attestations: just submitted a PR
- Track gossip peer score in lodestar metric: done
- New pubsub interface to return sent peers: PR is waiting for someone to review
- Very few peers on Kintsugi
- Gajinder: Errors from Execution client causing peers to be downscored until very few are left. Currently being discussed within merge debug group. Issue #3537 to follow up on this. PR #3545 to handle the errors properly.
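The idea of a gossip queue that drops messages under load, mentioned in the notes above, can be illustrated with a bounded queue. This is a minimal sketch under assumptions and not the Lodestar implementation: the class name `BoundedQueue`, its methods, and the drop-oldest policy are all hypothetical choices made for the example.

```typescript
// Hypothetical bounded queue: keeps at most `maxLength` pending items and
// drops the oldest one when a new item arrives while the queue is full.
class BoundedQueue<T> {
  private readonly items: T[] = [];
  private dropped = 0;

  constructor(private readonly maxLength: number) {}

  // Add an item. Returns the item that was dropped to make room, if any.
  push(item: T): T | undefined {
    this.items.push(item);
    if (this.items.length > this.maxLength) {
      this.dropped++;
      return this.items.shift();
    }
    return undefined;
  }

  // Take the oldest pending item, or undefined when the queue is empty.
  next(): T | undefined {
    return this.items.shift();
  }

  get length(): number {
    return this.items.length;
  }

  // Number of items dropped so far; useful to expose as a metric.
  get droppedCount(): number {
    return this.dropped;
  }
}
```

Dropping before any validation work is done keeps the cost of an overloaded topic bounded, which matters most when subscribed to all subnets.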
Gajinder
- Working on handle errors thrown by EL's executePayload (#3545)
- Working on backfill sync PR (#3384)
Phil
- v0.33.0
- Issue in github actions workflow for release.
- Issue resolved and v0.33.0 is released.
- Small issue with changelogs not compiling properly for new release process.
- Lodestar Setup Guide v2 draft is complete: https://hackmd.io/@philknows/rk5cDvKmK
- Looking for Eth Consensus (eth2) specialists to audit slashing protection on Lodestar!
Cayman
- Backreviewing PRs
- Starting PR for consensus specs today for beacon API submission
Tuyen
- Fixed the skipped slot weak subjectivity state sync issue
- Handle attester duties reorg: PR is under review
- Early epoch transition: it works nicely judging by the gossip block metric, PR is under review
- Next task would be to submit attestations as soon as validator sees new head
- Pithos:
- lodestar-besu sync issue: besu had some update at their side and restarted the instance, it works well there but they said to keep monitoring it to make sure
- lodestar-geth: there's a performance issue that causes the node to take >5s to process a gossip block (does not always happen). My best guess is we can't handle 1000 validators in pithos well, since it's quite a big ratio of validators and the validator client has duties very frequently. I'm working on batch validation of attestations, aggregate and proofs, sync committee messages, and contribution and proofs at the API endpoint as an improvement.
Gajinder
- Validator interop with Lighthouse:
- Unhandled Promise Rejections approved and merged.
- Optimising Backfill sync from anchor checkpoint state (#3384)
- Started draftPR WeakSubjectivity Checks (#3391)
- Request for comments: SSZ Bytelist Implementation
Phil
- CSCON[1] looking for presentation ideas for Dec 1-3.
- Keeping an eye out for upcoming ETH events: ETH Denver, ETH Barcelona and DevCon on the radar.
- Lodestar Setup Guide v2 in the works to include Docker Compose setup and version for NPM install.
Lion
- Bootnodes were wrong, fixed now with Tuyen's PR
- Milestones hit on Amphora
- Goal of the community is to do what we did with Altair: set up a long-lasting merged testnet that would have nodes and an explorer, and run it for testing.
- Expect more testnets to pop up where we go through the transition multiple times
- Refactoring SSZ, some problems need to be fixed before Altair fork
- Block processing is too inefficient
- Uses too much memory
- Because of the way our cache is structured
- Implementing fixes to this with help from Teku
- Requires more testing to make sure nothing breaks
Tuyen
- Investigated issue with the bootEnrs option which was looking for peers.
- Deploying to environment provided by Peri after
Gajinder
- Bring up nodes and join testnets and merge milestones
- SSZ casing is correctly mapped. Created PR for this.
- Looking into doing the byteList PR for SSZ.
Tuyen
- Gossip block issue:
- Message ID was incorrect after discussion with Lighthouse
- 20% of our validators cannot vote for the correct head
- Working on double-vote attestation issue. We have had these issues before.
Gajinder
- Implemented prohibit exemplar use in grafana
- Declaration type casing mapped in SSZ containers
- Investigating getAttesterDuties being called multiple times in epoch
- Investigating getState related all regenFNs' metrics. Currently showing empty data
- Resolving Insufficient BLS signature validation for lodestar+bls/herumi
- Lion: Whatever we choose, we need to achieve a high level of code readability and certainty of proper implementation.
- Will set flag to default true and throw error that skipping validation is not supported.
- Confirm with asanso that everything is fixed
Lion
- Halted merge spec work due to awaiting critical decisions by Eth R&D.
- Reviewing legacy issues with beacon node organization
- Refactor unknown block sync
- Found serious issue where we don't verify block signatures. We are first importing blocks on the chain, then verify them.
- Found DDOS vector
- When code is production stable, we should spend a week to self-audit and re-read existing code. Some code has been there for ~3 years when strictness was not as high.
Phil
- Job description for Protocol Engineer coming up this week
- Released v1 of a Lodestar setup guide. Working to incorporate feedback for a v2 update.
- Lion: In reference to publishing attestations early, the cache which used to store attestations from a block that wasn't known yet was removed because it had no protection from attacks.
- Keep in mind that in networks where it is mostly Lodestar nodes, it will likely miss. All the other nodes will drop attestations because they cannot process early attestations.
Cayman
- Gossipsub update: Getting old/late blocks. Screwed with our ability to get blocks.
- Changed the way messages were being processed.
- PR open
- PR open to open more config abilities to be fully aligned with the spec
- Ex. heartbeat interval
Lion
- Focusing on merge spec
- Late/Old Blocks:
- Anything that comes in 2-3s will mess with the rewards & performance.
- Random issues still persist, such as gossipsub messages sometimes being received and the node going from processing to not processing.
- Prater nodes:
- Memory issue still ongoing with Contabo servers
- PR: Refactoring significantly how gossip interacts with hex string data
- Proof of Concept SSZ: Process attestations is very slow because of how we interact with the trie. To be done later.
- Added op pool for slashings and exits
Tuyen
- Implementing gossip topic scoring params
- Will try to take a profile of what is causing the OOM issue
Gajinder
- Cleanup on metrics for issue not found
- Testing new case library to see if it will work better
- Will continue with interoperability with other clients
Afri
- Organized Github Labels
- Preparing internal strategy for Lodestar for helping to allocate resources
- Gitcoin grant was stopped by ChainSafe. We are still very much active, but will not have a presence in asking the community for financial donations.
- Triggering the POST route /drop-state-cache, which drops the stateCache and checkpointStateCache, makes a huge difference in memory. It proves that our memory issues come from the states.
- If the memory is too high, you should check the gc pause time rate. If it's about 60-70%, everything starts to break down. Anything time sensitive starts to slow down.
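To make the 60-70% figure above concrete: a GC pause rate can be read as the fraction of wall-clock time spent paused in garbage collection over some window. The sketch below assumes that reading; the function name `gcPauseRate` and its parameters are hypothetical, and how the pause durations are collected (for example from a metrics dashboard) is left out.

```typescript
// Fraction of a time window spent paused in garbage collection.
// Assumption: `pauseDurationsMs` lists every GC pause observed in the window.
function gcPauseRate(pauseDurationsMs: number[], windowMs: number): number {
  if (windowMs <= 0) {
    throw new Error("windowMs must be positive");
  }
  const totalPauseMs = pauseDurationsMs.reduce((sum, d) => sum + d, 0);
  return totalPauseMs / windowMs;
}
```

For instance, pauses of 300ms and 350ms within a 1000ms window give a rate of 0.65, which falls in the range where time-sensitive work starts to slow down according to the note.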
Tuyen
- Fixed the epoch process effective balance by increment (#3083)
- Lion to review and merge Persist invalid ssz objects (#3067)
- Unable to recreate Docker syncing issue (#3089).
- Merged in improve getEffectiveBalances (#3065).
- Looking into other performance issues.
Gajinder
- PR Config API Interoperability with lighthouse (#3086):
- Lion: More data required from performance tests to make decision
- Lion: Should change casing to match specs?
- Lion: SSZ doesn't care about casing. Just give it fields and it uses the fields. Pull the casing out of it.
- Lion: Would like some research of the implications of doing this, definitely an option to explore.
Lion
- Infrastructure review: Cleaned up
- Concerning OOM behaviour error on Prater
- Spikes from 2 to 6GB in a short period of time.
- Node is not respecting the heap use limit at all
- Issue with heap snapshots on Linux - anybody know how to get it to work? Tuyen can do it with macOS.
- Started the merge work
- Types done, next is beacon chain functions.
- Most complicated issue is interop with the eth1 chain
Cayman
- Monorepo merging
- Nice to use lerna directly
- Options: Independent versioning for packages?
- Lion agrees with independent versioning
- Lion: Why don't we copy the js-ipfs setup?
- Issues: How to handle changelogs and github releases.
- I'm gonna push something today to
- Should we use new repo or overwriting existing SSZ/BLS libraries?
- Lion: Name is important
- Cayman: History is always kept. We can migrate other issues and try to pull them to the monorepo.
Phil
- Priorities
- Gitcoin grants for GR11
- Hiring job description for 2 new engineers