perf: chain: export-range #10145
Conversation
The code from my side is mostly finished. I have this running in "production" already, as we rely on it for archival snapshots. Testing has involved making sure that it exports exactly the same set of blocks as the previous code (#9192). We have used snapshots generated by both the previous code and this code to process the full Filecoin chain with Lily, so we are reasonably sure that they include what they should.

In terms of performance, this version of the code seems to be a bit faster (~10%), mostly due to fixed bugs and improved blockstore access. Memory profiles are the same: essentially, a large amount of memory is required to keep the "processed CID set" and the task queue (we cannot control its growth, since a blockstore block can have an arbitrary number of children that need to be queued). That said, memory usage is consistent. We had episodes of OOMs/leaks with the previous code, but those have not happened so far with this one. The fixes to some deadlocking code make me feel better about that, though.

In the previous PR (#9192), there was a comment from @magik6k about back-pressure and improved RPC handling when doing a normal export (without …). I am of the opinion that we should remove that possibility altogether and that export-range should only export to local disk. This is much faster. Currently snapshots are very large and take about 1h this way. I don't find it very useful to be able to ship them remotely via the RPC API (even more so given that they are not compressed). In any case, I would rather implement that later, if there is a better way than the one currently used, than let it delay having this functionality upstreamed.

I don't know what is up with CI. I'm happy to update the documentation when we have finished the discussion here.
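The memory shape described above (a processed-CID set plus a task queue whose growth cannot be bounded, because each block may link to any number of children) can be illustrated with a minimal breadth-first DAG walk. This is an illustrative sketch only, not the Lotus implementation; the `links` map standing in for blockstore reads, and string keys standing in for CIDs, are assumptions for the example.

```go
package main

import "fmt"

// walkDAG visits every node reachable from root exactly once.
// The seen set and the queue are the two memory consumers: the
// queue can grow arbitrarily, because each node may enqueue any
// number of children before the walker drains them.
func walkDAG(root string, links map[string][]string) []string {
	seen := map[string]struct{}{root: {}}
	queue := []string{root}
	var order []string
	for len(queue) > 0 {
		c := queue[0]
		queue = queue[1:]
		order = append(order, c)
		for _, child := range links[c] {
			if _, ok := seen[child]; !ok {
				seen[child] = struct{}{}
				queue = append(queue, child)
			}
		}
	}
	return order
}

func main() {
	// A tiny diamond-shaped DAG: "c" and "d" are reachable twice
	// but must be visited (and exported) only once.
	links := map[string][]string{
		"a": {"b", "c"},
		"b": {"c", "d"},
		"c": {"d"},
	}
	fmt.Println(walkDAG("a", links)) // [a b c d]
}
```

In the real exporter both structures hold CIDs for tens of millions of blocks, which is where the large but consistent memory footprint comes from.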
This first commit contains the first and second implementation stabs (after primary review by @hsanjuan), using a stack for task buffering.

Known issues:
* Ctrl-C (context cancellation) results in the export code getting deadlocked.
* Duplicate blocks in exports.
* Duplicate block reads from the store.

Original commit messages:
* works
* works against mainnet and calibnet
* feat: add internal export api method - will hopefully make things faster by not streaming the export over the json rpc api
* polish: better file naming
* fix: potential race in marking cids as seen
* chore: improve logging
* feat: front export with cache
* fix: give hector a good channel buffer on this shit
* docsgen
Removing this will mean that we will not be able to use this new export code with the existing snapshot service and will have to continue to use the … cc @ognots
It is not going to be much faster than what you already have when pushing the snapshot through the API endpoint.
This commit moderately refactors the ranged export code.

It addresses several problems:
* Code does not finish cleanly and things hang on Ctrl-C.
* The same block is read multiple times in a row (artificially increasing cached blockstore hit metrics to 50%).
* It is unclear whether there are additional races (a single worker quits when reaching height 0).
* CARs produced have duplicated blocks (~400k for an 80M-block CAR or so). Some blocks appear up to 5 times.
* Pointers are used for tasks where they are not necessary.

The changes:
* Use a FIFO instead of a stack: a simpler implementation as its own type. This has not proven to be much more memory-friendly, but it has not made things worse either.
* Avoid a probably not-small number of allocations by not using unnecessary pointers.
* Fix duplicated blocks by atomically checking and adding to the CID set.
* Context termination now works correctly. Worker lifetime is correctly tracked and all channels are closed, avoiding memory leaks and deadlocks.
* Ensure all work is finished before exiting, something that might have been broken in some edge cases previously. In practice, we would not have seen this except perhaps in very early snapshots close to genesis.

Initial testing shows the code is currently about 5% faster. Resulting snapshots do not have duplicates, so they are a bit smaller. We have manually verified that no CID is lost versus previous results, with both old and recent snapshots.
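The duplicate-block fix above hinges on making the membership check and the insertion a single atomic step: if workers first check the set and then add to it separately, two workers can both see a CID as unseen and both enqueue it. A minimal sketch of the check-and-add pattern, assuming a mutex-guarded set (the `seenSet` type and string CIDs are hypothetical names for this example, not the actual Lotus types):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// seenSet checks membership and inserts in one critical section.
// Visit returns true only for the first caller to present a CID,
// so a block can never be enqueued (or written to the CAR) twice,
// even when many workers race on it.
type seenSet struct {
	mu   sync.Mutex
	cids map[string]struct{}
}

func newSeenSet() *seenSet {
	return &seenSet{cids: make(map[string]struct{})}
}

func (s *seenSet) Visit(c string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.cids[c]; ok {
		return false
	}
	s.cids[c] = struct{}{}
	return true
}

func main() {
	s := newSeenSet()
	var wg sync.WaitGroup
	var wins int32
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// All eight workers race on the same CID; exactly
			// one of them observes Visit() == true.
			if s.Visit("bafy-example-cid") {
				atomic.AddInt32(&wins, 1)
			}
		}()
	}
	wg.Wait()
	fmt.Println(wins)                        // 1
	fmt.Println(s.Visit("bafy-example-cid")) // false
}
```

The same effect could be achieved with `sync.Map.LoadOrStore`; the point is that "check" and "add" must not be observable as two separate operations.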
The improvements in the range-export code avoid reading most blocks twice, and avoid writing some blocks to disk multiple times. The cache hit rate went down from close to 50% to a maximum of 12% at the very end of the export. The reason is that most CIDs are never read twice, since they are correctly tracked in the CID set. These numbers do not support maintaining the CachingBlockstore code. Additional testing shows that removing it has similar memory-usage behaviour and about 5-minute-faster execution (around 10%). Less code to maintain and fewer options to mess up.
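The drop in hit rate is consistent with simple arithmetic: when almost every block is fetched twice, roughly half of all reads can be served from a cache, so a ~50% hit rate mostly measured the exporter's own redundant reads rather than useful caching. A small sketch of that arithmetic (the figures below are illustrative round numbers, not measurements from this PR):

```go
package main

import "fmt"

// hitRate returns the best-case fraction of blockstore reads a cache
// can serve, given the number of unique blocks and total read
// operations: only reads beyond the first fetch of a block can hit.
func hitRate(unique, totalReads int) float64 {
	if totalReads == 0 {
		return 0
	}
	return float64(totalReads-unique) / float64(totalReads)
}

func main() {
	// If each of ~80M blocks were read twice, half of all reads
	// could be cache hits.
	fmt.Printf("%.0f%%\n", 100*hitRate(80_000_000, 160_000_000)) // 50%
	// With deduplicated reads, re-reads are rare and the achievable
	// hit rate collapses to the low teens.
	fmt.Printf("%.0f%%\n", 100*hitRate(80_000_000, 91_000_000)) // 12%
}
```

In other words, once the CID set prevents re-reads, the cache has almost nothing left to do, which is what justifies removing the CachingBlockstore.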
Additionally, per feedback:
* Set "admin" permissions.
* Remove from v0api.
OK, I think I am good wrt tests. The things that are failing or stuck don't seem related to this. It is a bit of a pain to rebase on master given the conflicts in generated code. What is left to do? I can write a tutorial, but hopefully that can be done post-merge?
Taking over from #9192 .
Related Issues
#9192
Proposed Changes
A new `chain export-range` command that can create archival-grade ranged exports of the chain as quickly as possible (letting Lotus write to disk directly).
Additional Info
Checklist
Before you mark the PR ready for review, please make sure that:
* The PR title is in the form of <PR type>: <area>: <change being made>, e.g. fix: mempool: Introduce a cache for valid signatures
* PR type: fix, feat, build, chore, ci, docs, perf, refactor, revert, style, test
* area, e.g. api, chain, state, market, mempool, multisig, networking, paych, proving, sealing, wallet, deps