Easy Cron Gas Cost Improvements #771
Replies: 3 comments
- Thanks. A lot of these look like things we can do as internal improvements, without a FIP. Good starting points in that category might be
- Raised filecoin-project/builtin-actors#1424 and filecoin-project/builtin-actors#1427.
- So, I was looking into this and we already do un-enroll from deadline cron. However, we only do so when all deposits and locked (vesting) funds are zero. The jobs here aren't no-ops, unfortunately. If we got rid of automatic vesting (vest on-demand when assessing penalties and/or withdrawal), we'd be in a better place.
The recent work in #761 has provided some insight into which costs are dominating miner cron. The following are a few immediate ideas to reduce this cost. All these ideas are simple to understand and reduce total system cost by replacing expensive cron operations with less expensive cron operations or less expensive user operations. These ideas would be useful in an emergency situation where we need to find some place to cut system load quickly. We might also want to prioritize them in advance to prolong the stability of the existing miner cron configuration.
Moving these costs out of miner cron early will also help smooth the transition to safe cron (see also #493) by removing unnecessary, overpriced operations from the subsidized cron costs.
Remove NOOP jobs
Idea
Recent measurements show that only about 40% of cron jobs are scheduled with live partitions. If we could skip the 60% of jobs with no sectors to prove, we could save 60% of the overhead cost. We also know that overhead costs are significant: about 24M gas for a job with an empty vesting table and about twice that for a job with a full vesting table. Assuming today's count of about 60 jobs per epoch, that puts us somewhere around 1 billion gas saved per epoch. The savings would likely grow if the network is upgraded to #735, in which case we expect more concentration of partitions and more no-op jobs.
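To make the arithmetic behind that estimate explicit, here is a rough back-of-envelope calculation (a sketch only; every constant is one of the approximate figures quoted above, not a measurement):

```rust
// Back-of-envelope savings estimate using the approximate figures above.
const JOBS_PER_EPOCH: u64 = 60; // roughly today's deadline cron jobs per epoch
const NOOP_JOB_PCT: u64 = 60; // ~60% of jobs have no live partitions
const OVERHEAD_GAS_EMPTY_VESTING: u64 = 24_000_000; // job overhead, empty vesting table
const OVERHEAD_GAS_FULL_VESTING: u64 = 48_000_000; // roughly 2x with a full vesting table

fn main() {
    let noop_jobs = JOBS_PER_EPOCH * NOOP_JOB_PCT / 100; // ~36 jobs per epoch
    let low = noop_jobs * OVERHEAD_GAS_EMPTY_VESTING; // ~0.86B gas per epoch
    let high = noop_jobs * OVERHEAD_GAS_FULL_VESTING; // ~1.7B gas per epoch
    println!("estimated savings per epoch: {low}..{high} gas");
}
```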
Implementation
We wouldn't need to start from scratch: @Stebalien designed and implemented a ~400-line version of this for Go specs-actors that was considered for v4 actors. This was the PR comment:
A few things to consider with this approach:
- While the idea is simple, the actual implementation, even building off of prior work, would be a significant undertaking.
Vest less often
Background
Currently, miner proving deadline jobs begin by unlocking vested funds. The vesting table is essentially an array that grows to 361 entries in the steady state of a miner actor and ramps up and down over half a year. Each entry is an epoch and token amount pair. In order to check whether there is anything to actually vest, the entire array must be loaded into memory. Vesting is quantized to a 12-hour period, which means that only 2 cron jobs per day will ever unlock funds from the vesting table, yet 48 cron jobs load it from state storage. This expensive state loading accounts for about half of the overhead of most cron jobs in the system today.
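For orientation, here is a simplified sketch of the shape being described (placeholder types standing in for ChainEpoch and TokenAmount; the real builtin-actors definitions differ in detail):

```rust
// Illustrative shape of the miner's vesting table: a single flat,
// epoch-ordered array stored behind one CID. Answering "has anything
// vested yet?" requires decoding the whole array (~361 entries at
// steady state), even when no entry is due at the current epoch.
type Epoch = i64; // stand-in for ChainEpoch
type Attofil = u128; // stand-in for TokenAmount

struct VestingFund {
    epoch: Epoch,    // epoch at which this tranche unlocks
    amount: Attofil, // tokens unlocking at that epoch
}

struct VestingFunds {
    funds: Vec<VestingFund>, // sorted ascending by epoch
}
```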
Idea: don't unlock vesting funds in cron
It is not a system correctness or fund availability requirement to vest funds in cron; we do this for the convenience of SPs. We could simply stop doing it.
Implementation
We would simply remove the call to unlock vested funds from cron.
We could potentially also expose a new method for vesting without withdrawal, or repurpose withdraw so that a withdrawal of 0 triggers vesting.
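As a sketch of what an on-demand unlock could look like at the state level, reusing the illustrative VestingFunds shape from the background section above (a hypothetical helper, not the existing builtin-actors method):

```rust
impl VestingFunds {
    // Drop every entry that has vested by `now` and return the total
    // unlocked amount. A user-facing method (or a zero-amount
    // WithdrawBalance) could invoke this on demand, rather than cron
    // paying for it at every proving deadline.
    fn unlock_vested(&mut self, now: Epoch) -> Attofil {
        // `funds` is sorted by epoch, so vested entries form a prefix.
        let split = self.funds.partition_point(|f| f.epoch <= now);
        self.funds.drain(..split).map(|f| f.amount).sum()
    }
}
```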
Idea: only unlock vesting funds every 12 hours
Since funds can only unlock every 12 hours, we should only try to unlock funds every 12 hours. This would cut the unlocking overhead by a factor of about 24.
Implementation
The miner actor code knows where these epochs fall without any protocol or state change. It only needs to construct a quant spec:
let q = QuantSpec { unit: 12 * EPOCHS_IN_HOUR, offset: st.current_proving_period_start };
and then check whether the current epoch equals q.quantize_up(current_epoch).
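A self-contained sketch of that guard (the quantization arithmetic is re-implemented locally so the example runs standalone; in the actor this would just be the existing QuantSpec, and the 30-second-epoch constant is an assumption):

```rust
// Early-out check: only attempt to unlock vested funds when the
// current epoch lands exactly on a 12-hour quantization boundary
// anchored at the miner's proving period start.
const EPOCHS_IN_HOUR: i64 = 120; // assumes 30s epochs

fn quantize_up(epoch: i64, unit: i64, offset: i64) -> i64 {
    // Round `epoch` up to the next boundary of the form offset + k * unit.
    let remainder = (epoch - offset).rem_euclid(unit);
    if remainder == 0 { epoch } else { epoch + (unit - remainder) }
}

fn should_attempt_vesting(current_epoch: i64, proving_period_start: i64) -> bool {
    let unit = 12 * EPOCHS_IN_HOUR;
    current_epoch == quantize_up(current_epoch, unit, proving_period_start)
}

fn main() {
    let start = 1000;
    // Only 2 of a miner's 48 daily deadline jobs land on a boundary.
    assert!(should_attempt_vesting(start, start));
    assert!(should_attempt_vesting(start + 12 * EPOCHS_IN_HOUR, start));
    assert!(!should_attempt_vesting(start + 7, start));
}
```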
This would make the vesting subsystem less flexible. For example, it would make it harder to introduce new forms of vesting revenue with different frequencies or offsets, and it further ties the reward vesting schedule to the vesting table.
This is my favorite approach because it is dead simple (a 2-3 line change) and it solves the problem completely under reasonable assumptions.
Idea: make the vesting table data structure better
Instead of a flat CBOR array, we could structure the vesting table for more efficient head reads. We could achieve this with a linked list: the vesting table CID would point to the head entry (perhaps several head entries) plus a pointer to the rest. In most cases the head indicates that no unlocking is needed, i.e. there is no need to pop the head of the vesting funds table. During write events the whole table could be loaded so the head could be replaced with a new head.
Implementation
This is the data structure change described above and would need a state migration. We could probably do pretty well with a small batch of values at the head, and maybe very well with a skip-list structure. Both are worth investigating if we went this route.
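A rough sketch of the shape such a structure could take, reusing the illustrative VestingFund entry type from the earlier sketch (field names, the batch size, and linking the tail by CID are all assumptions):

```rust
use cid::Cid;

// Illustrative "head + linked tail" layout for the vesting table. The
// root object keeps only the next few tranches inline, so the common
// cron question ("is anything due yet?") decodes a handful of entries
// instead of ~361. The remainder sits behind a separate CID and is
// only loaded when the head is exhausted or on writes.
struct VestingFundsHead {
    head: Vec<VestingFund>, // small inline batch, e.g. the next 4 tranches
    tail: Option<Cid>,      // link to the rest of the table, if any
}
```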
Drop committed sectors from the Precommit Expiry Queue
Idea: delete completed precommits from the expiry queue at commit time
Precommit expiry is the most expensive cron operation. Cron jobs with precommit expiries to check are consistently big outliers, costing much more gas than other jobs. The simplest way to remove this cost is to have the miner actor's commitment code pay for modifying the precommit expiry queue. Even though today this will primarily move gas costs from one cron call (handle_proving_deadline) to another cron call (confirm_sector_proofs_valid), it will significantly reduce total system gas because we can completely remove the expensive HAMT traversal that proving deadline cron does to ensure the precommit no longer exists. When we eventually move activation out of cron this change will also remove the remaining costs from cron.
Implementation
When confirming commitment of a sector (in cron after ProveCommit, or inline in ProveCommitAggregate) we can pass the sector numbers of the committed sectors to the precommit cleanup BitFieldQueue's cut method. This will prevent cron from ever needing to check these precommits for existence and penalization.
If the cut function's traversal of many AMT epochs is deemed too inefficient, we can improve the bitfield queue to do a cut out of a specific epoch's bitfield. During commitment finalization (confirm_sector_proofs_valid) we have sufficient information to know which epoch the precommit expiry is scheduled for: precommit.precommit_epoch + max_prove_commit_duration(rt.policy(), precommit.info.seal_proof) + rt.policy().expired_pre_commit_clean_up_delay. We can then use this to implement a more efficient version of BitfieldQueue::cut which directly looks up the bitfield for a given epoch and removes the precommit info from that epoch.
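As a sketch of the targeted variant, here is a standalone illustration using a plain map as a stand-in for the AMT-backed BitFieldQueue (the types and method name are hypothetical; the scheduled-epoch computation mirrors the expression above):

```rust
use std::collections::{BTreeMap, BTreeSet};

// Stand-in for the precommit cleanup queue: epoch -> sector numbers
// scheduled for expiry cleanup at that epoch. In the actor this is an
// AMT of bitfields rather than a BTreeMap of sets.
struct PrecommitCleanupQueue {
    entries: BTreeMap<i64, BTreeSet<u64>>,
}

impl PrecommitCleanupQueue {
    // Remove a just-proven sector from exactly the epoch its cleanup was
    // scheduled at, so deadline cron never has to traverse the precommit
    // HAMT to discover that the precommit no longer exists. The epoch is
    // recomputable at commit time from precommit_epoch,
    // max_prove_commit_duration and expired_pre_commit_clean_up_delay.
    fn cut_at(&mut self, scheduled_epoch: i64, sector: u64) {
        let now_empty = match self.entries.get_mut(&scheduled_epoch) {
            Some(set) => {
                set.remove(&sector);
                set.is_empty()
            }
            None => false,
        };
        if now_empty {
            self.entries.remove(&scheduled_epoch); // drop empty epoch entries
        }
    }
}
```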
Idea: fix non-caching mistake
Our code is written poorly: when we do precommit handling we reconstruct the precommit HAMT in memory every time, meaning none of our HAMT-level caching is ever used (see here and here). Some cron jobs expire > 100 precommits, so caching would probably help significantly.
Implementation
Figure out how to make a version of get_precommitted_sector that does caching. Maybe it takes in a map, or maybe it returns a data structure holding a map.
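One possible shape for this, sketched with a plain HashMap (the cache type, its name, and how it would be threaded through the actor code are exactly the open questions above, so treat this as an assumption-laden illustration):

```rust
use std::collections::HashMap;

// A per-call cache: precommit info already decoded during this message,
// keyed by sector number. Repeated lookups while a cron job expires many
// precommits hit memory instead of re-walking the HAMT.
struct PrecommitCache<P> {
    loaded: HashMap<u64, P>,
}

impl<P: Clone> PrecommitCache<P> {
    // Return the cached precommit if present, otherwise fall through to
    // `load` (which would read the HAMT) and remember the result.
    fn get_or_load(
        &mut self,
        sector: u64,
        load: impl FnOnce(u64) -> Option<P>,
    ) -> Option<P> {
        if let Some(p) = self.loaded.get(&sector) {
            return Some(p.clone());
        }
        let p = load(sector)?;
        self.loaded.insert(sector, p.clone());
        Some(p)
    }
}
```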