Easy Cron Gas Cost Improvements #771
Replies: 3 comments
- Thanks. A lot of these look like things we can do as internal improvements, without a FIP. Good starting points in that category might be
- Raised filecoin-project/builtin-actors#1424 and filecoin-project/builtin-actors#1427.
- So, I was looking into this and we already do un-enroll from deadline cron. However, we only do so when all deposits and locked (vesting) funds are zero. The jobs here aren't no-ops, unfortunately. If we got rid of automatic vesting (vest on-demand when assessing penalties and/or withdrawal), we'd be in a better place.
The recent work in #761 has provided some insight into which costs are dominating miner cron. The following are a few immediate ideas to reduce this cost. All these ideas are simple to understand and reduce total system cost by replacing expensive cron operations with less expensive cron operations or less expensive user operations. These ideas would be useful in an emergency situation where we need to find some place to cut system load quickly. We might also want to prioritize them in advance to prolong the stability of the existing miner cron configuration.
Moving these costs out of miner cron early will also help smooth the transition to safe cron (see also #493) by removing unnecessary, overpriced operations from the subsidized cron costs.
Remove NOOP jobs
Idea
Recent measurements show that only about 40% of cron jobs are scheduled with live partitions. If we could skip the 60% of jobs with no sectors to prove, we could save 60% of the overhead cost. We also know that overhead costs are significant: about 24M gas for a job with an empty vesting table and about twice that for a job with a full vesting table. Assuming today's count of about 60 jobs per epoch, that puts us somewhere around 1 billion gas saved per epoch. The savings would likely grow if the network is upgraded to #735, in which case we expect more concentration of partitions and more no-op jobs.
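To make the arithmetic behind that estimate explicit, here is a rough back-of-envelope calculation (a sketch only; every constant is one of the approximate figures quoted above, not a measurement):

```rust
// Back-of-envelope savings estimate using the approximate figures above.
const JOBS_PER_EPOCH: u64 = 60; // roughly today's deadline cron jobs per epoch
const NOOP_JOB_PCT: u64 = 60; // ~60% of jobs have no live partitions
const OVERHEAD_GAS_EMPTY_VESTING: u64 = 24_000_000; // job overhead, empty vesting table
const OVERHEAD_GAS_FULL_VESTING: u64 = 48_000_000; // roughly 2x with a full vesting table

fn main() {
    let noop_jobs = JOBS_PER_EPOCH * NOOP_JOB_PCT / 100; // ~36 jobs per epoch
    let low = noop_jobs * OVERHEAD_GAS_EMPTY_VESTING; // ~0.86B gas per epoch
    let high = noop_jobs * OVERHEAD_GAS_FULL_VESTING; // ~1.7B gas per epoch
    println!("estimated savings per epoch: {low}..{high} gas");
}
```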
Implementation
We wouldn't need to start from scratch: @Stebalien designed and implemented a ~400-line version of this for Go specs-actors that was considered for v4 actors. This was the PR comment:
A few things to consider with this approach:
- While the idea is simple, the actual implementation, even building off of prior work, would be a significant undertaking.
Vest less often
Background
Currently, miner proving deadline jobs begin by unlocking vested funds. The vesting table is essentially an array that grows to 361 entries in the steady state of a miner actor and ramps up and down over half a year. Each entry is an epoch and token amount pair. In order to check whether there is anything to actually vest, the entire array must be loaded into memory. Vesting is quantized to a 12-hour period, which means that only 2 cron jobs per day will ever unlock funds from the vesting table, yet 48 cron jobs load it from state storage. This expensive state loading accounts for about half of the overhead of most cron jobs in the system today.
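For orientation, here is a simplified sketch of the shape being described (placeholder types standing in for ChainEpoch and TokenAmount; the real builtin-actors definitions differ in detail):

```rust
// Illustrative shape of the miner's vesting table: a single flat,
// epoch-ordered array stored behind one CID. Answering "has anything
// vested yet?" requires decoding the whole array (~361 entries at
// steady state), even when no entry is due at the current epoch.
type Epoch = i64; // stand-in for ChainEpoch
type Attofil = u128; // stand-in for TokenAmount

struct VestingFund {
    epoch: Epoch,    // epoch at which this tranche unlocks
    amount: Attofil, // tokens unlocking at that epoch
}

struct VestingFunds {
    funds: Vec<VestingFund>, // sorted ascending by epoch
}
```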
Idea: don't unlock vesting funds in cron
It is not a system correctness or fund availability requirement to vest funds in cron; we do this for the convenience of SPs. We could simply stop doing it.
Implementation
We would simply remove the call to unlock vested funds from cron.
We could potentially also expose a new method for vesting without withdrawal, or repurpose withdraw so that a withdrawal of 0 triggers vesting.
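As a sketch of what an on-demand unlock could look like at the state level, reusing the illustrative VestingFunds shape from the background section above (a hypothetical helper, not the existing builtin-actors method):

```rust
impl VestingFunds {
    // Drop every entry that has vested by `now` and return the total
    // unlocked amount. A user-facing method (or a zero-amount
    // WithdrawBalance) could invoke this on demand, rather than cron
    // paying for it at every proving deadline.
    fn unlock_vested(&mut self, now: Epoch) -> Attofil {
        // `funds` is sorted by epoch, so vested entries form a prefix.
        let split = self.funds.partition_point(|f| f.epoch <= now);
        self.funds.drain(..split).map(|f| f.amount).sum()
    }
}
```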
Idea: only unlock vesting funds every 12 hours
Since funds can only unlock every 12 hours, we should only try to unlock funds every 12 hours. This would cut the unlocking overhead by a factor of about 24.
Implementation
The miner actor code knows where these epochs fall without any protocol or state change. It only needs to construct a quant spec:
let q = QuantSpec { unit: 12 * EPOCHS_IN_HOUR, offset: st.current_proving_period_start };
and then check whether the current epoch equals q.quantize_up(current_epoch).
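A self-contained sketch of that guard (the quantization arithmetic is re-implemented locally so the example runs standalone; in the actor this would just be the existing QuantSpec, and the 30-second-epoch constant is an assumption):

```rust
// Early-out check: only attempt to unlock vested funds when the
// current epoch lands exactly on a 12-hour quantization boundary
// anchored at the miner's proving period start.
const EPOCHS_IN_HOUR: i64 = 120; // assumes 30s epochs

fn quantize_up(epoch: i64, unit: i64, offset: i64) -> i64 {
    // Round `epoch` up to the next boundary of the form offset + k * unit.
    let remainder = (epoch - offset).rem_euclid(unit);
    if remainder == 0 { epoch } else { epoch + (unit - remainder) }
}

fn should_attempt_vesting(current_epoch: i64, proving_period_start: i64) -> bool {
    let unit = 12 * EPOCHS_IN_HOUR;
    current_epoch == quantize_up(current_epoch, unit, proving_period_start)
}

fn main() {
    let start = 1000;
    // Only 2 of a miner's 48 daily deadline jobs land on a boundary.
    assert!(should_attempt_vesting(start, start));
    assert!(should_attempt_vesting(start + 12 * EPOCHS_IN_HOUR, start));
    assert!(!should_attempt_vesting(start + 7, start));
}
```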
This would make the vesting subsystem less flexible. For example, it would make it harder to introduce new forms of vesting revenue with different frequencies or offsets, and it further ties the reward vesting schedule to the vesting table.
This is my favorite approach because it is dead simple (a 2-3 line change) and it solves the problem completely under reasonable assumptions.
Idea: make the vesting table data structure better
Instead of a flat CBOR array, we could structure the vesting table for more efficient head reads. We could achieve this with a linked list: the vesting table CID would point to the head entry (perhaps several head entries) plus a pointer to the rest. In most cases the head indicates that no unlocking is needed, i.e. there is no need to pop the head of the vesting funds table. During write events the whole table could be loaded so the head could be replaced with a new head.
Implementation
This is the data structure change described above and would need a state migration. We could probably do pretty well with a small batch of values at the head, and maybe very well with a skip-list structure. Both are worth investigating if we went this route.
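A rough sketch of the shape such a structure could take, reusing the illustrative VestingFund entry type from the earlier sketch (field names, the batch size, and linking the tail by CID are all assumptions):

```rust
use cid::Cid;

// Illustrative "head + linked tail" layout for the vesting table. The
// root object keeps only the next few tranches inline, so the common
// cron question ("is anything due yet?") decodes a handful of entries
// instead of ~361. The remainder sits behind a separate CID and is
// only loaded when the head is exhausted or on writes.
struct VestingFundsHead {
    head: Vec<VestingFund>, // small inline batch, e.g. the next 4 tranches
    tail: Option<Cid>,      // link to the rest of the table, if any
}
```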
Drop committed sectors from the Precommit Expiry Queue
Idea: delete completed precommits from the expiry queue at commit time
Precommit expiry is the most expensive cron operation. Cron jobs with precommit expiries to check are consistently big outliers, costing much more gas than other jobs. The simplest way to remove this cost is to have the miner actor's commitment code pay for modifying the precommit expiry queue. Even though today this will primarily move gas costs from one cron call (handle_proving_deadline) to another cron call (confirm_sector_proofs_valid), it will significantly reduce total system gas because we can completely remove the expensive HAMT traversal that proving deadline cron does to ensure the precommit no longer exists. When we eventually move activation out of cron this change will also remove the remaining costs from cron.
Implementation
When confirming commitment of a sector (in cron after ProveCommit, or inline in ProveCommitAggregate) we can pass the sector numbers of the committed sectors to the precommit cleanup BitFieldQueue's cut method. This will prevent cron from ever needing to check these precommits for existence and penalization.
If the cut function's traversal of many AMT epochs is deemed too inefficient, we can improve the bitfield queue to do a cut out of a specific epoch's bitfield. During commitment finalization (confirm_sector_proofs_valid) we have sufficient information to know which epoch the precommit expiry is scheduled for: precommit.precommit_epoch + max_prove_commit_duration(rt.policy(), precommit.info.seal_proof) + rt.policy().expired_pre_commit_clean_up_delay. We can then use this to implement a more efficient version of BitfieldQueue::cut which directly looks up the bitfield for a given epoch and removes the precommit info from that epoch.
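As a sketch of the targeted variant, here is a standalone illustration using a plain map as a stand-in for the AMT-backed BitFieldQueue (the types and method name are hypothetical; the scheduled-epoch computation mirrors the expression above):

```rust
use std::collections::{BTreeMap, BTreeSet};

// Stand-in for the precommit cleanup queue: epoch -> sector numbers
// scheduled for expiry cleanup at that epoch. In the actor this is an
// AMT of bitfields rather than a BTreeMap of sets.
struct PrecommitCleanupQueue {
    entries: BTreeMap<i64, BTreeSet<u64>>,
}

impl PrecommitCleanupQueue {
    // Remove a just-proven sector from exactly the epoch its cleanup was
    // scheduled at, so deadline cron never has to traverse the precommit
    // HAMT to discover that the precommit no longer exists. The epoch is
    // recomputable at commit time from precommit_epoch,
    // max_prove_commit_duration and expired_pre_commit_clean_up_delay.
    fn cut_at(&mut self, scheduled_epoch: i64, sector: u64) {
        let now_empty = match self.entries.get_mut(&scheduled_epoch) {
            Some(set) => {
                set.remove(&sector);
                set.is_empty()
            }
            None => false,
        };
        if now_empty {
            self.entries.remove(&scheduled_epoch); // drop empty epoch entries
        }
    }
}
```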
Idea: fix non-caching mistake
Our code is written poorly: when we do precommit handling we reconstruct the precommit HAMT in memory every time, meaning none of our HAMT-level caching is ever used (see here and here). Some cron jobs expire > 100 precommits, so caching would probably help significantly.
Implementation
Figure out how to make a version of get_precommitted_sector that does caching. Maybe it takes in a map, or maybe it returns a data structure holding a map.
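One possible shape for this, sketched with a plain HashMap (the cache type, its name, and how it would be threaded through the actor code are exactly the open questions above, so treat this as an assumption-laden illustration):

```rust
use std::collections::HashMap;

// A per-call cache: precommit info already decoded during this message,
// keyed by sector number. Repeated lookups while a cron job expires many
// precommits hit memory instead of re-walking the HAMT.
struct PrecommitCache<P> {
    loaded: HashMap<u64, P>,
}

impl<P: Clone> PrecommitCache<P> {
    // Return the cached precommit if present, otherwise fall through to
    // `load` (which would read the HAMT) and remember the result.
    fn get_or_load(
        &mut self,
        sector: u64,
        load: impl FnOnce(u64) -> Option<P>,
    ) -> Option<P> {
        if let Some(p) = self.loaded.get(&sector) {
            return Some(p.clone());
        }
        let p = load(sector)?;
        self.loaded.insert(sector, p.clone());
        Some(p)
    }
}
```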