feat: Parallellize snapshot creation #1799

segfault-magnet · 2024-04-02T16:08:03Z

closes: #1789

Feature:
A SnapshotWriter can now be partial_closeed which results in a SnapshotFragment. Those can then be mergeed together and finalized into a SnapshotMetadata (and a written snapshot at that).

The export process now spawns rayon tasks for every table. At the end of the task it will partial_close its writer and return the SnapshotFragment.

When all tasks are finished the fragments are merged and the snapshot is done.

A TaskManager was written to deduplicate rayon task fail-fast and cancel logic.

Refactor:
Snapshot generation and import was moved together into fuel-core. Mainly because the TaskManager needed a place to live accesible to both. But also because if you're adding tables to the snapshot you'd also be wanting to read them so having both close together seems sensible.

crates/chain-config/src/config/state.rs

bin/fuel-core/src/main.rs

bvrooman · 2024-04-02T21:08:38Z

crates/fuel-core/src/service/genesis/exporter.rs

+                db.entries::<T>(None, IterDirection::Forward)
+                    .chunks(group_size)
+                    .into_iter()
+                    .take_while(|_| !cancel.is_cancelled())
+                    .try_for_each(|chunk| writer.write(chunk.try_collect()?))?;


It may be interesting to see if you can parameterize the DB query to work with a quantity and/or primary index, so that you can also use Rayon and the task manager for the write_contract_snapshot function.

Thought about it, but the contract_snapshot is there only to support writing e2e tests.

As I understood it you'd deploy yourself a contract, interact however you wish with it. When satisfied freeze that moment in time for e2e tests with the contract snapshot.

As such didn't feel a need to optimize for performance.

I'll give it a try anyway because the code might be more readable if I succeed.

Turned out nicely IMO. Have a look at lemme know your thoughts :)
Edit:
Also found a (probably) significant performance issue. Previously I had understood iter_all(prefix) to mean "iterate all entries whose key has this prefix.

But what it actually means is "start iterating thought the db at the first match of this prefix (if any). Then continue until the end of the db. That caused the contract snapshot logic to load all state after the match and then separate out what belongs to the contract in question. Fixed that, so noice :)

I think introducing the KeyFilter trait is a nice abstraction, as it allows us to build where-esque clauses for queries at the call site level. I'm surprised that this change to the DB interface was relatively low impact on the rest of the codebase, which is also nice.

As a side note, my understanding of iter_all(prefix) is that your initial understanding is actually correct. It is defined as :

/// Returns an iterator over the all entries in the table with the specified prefix. fn iter_all_by_prefix<M, P>( &self, prefix: Option<P>, ) -> BoxedIter<super::Result<(M::OwnedKey, M::OwnedValue)>> where M: Mappable, P: AsRef<[u8]>, Self: IterableTable<M>, { self.iter_all_filtered::<M, P>(prefix, None, None) }

indicating that it should be filtering by prefix. If that's not the case, there might be something wrong at the implementation level.

I'm surprised that this change to the DB interface was relatively low impact

It's because even though the method sounds general, it was created only to support generating snapshots. Besides that nobody uses entries.

If that's not the case, there might be something wrong at the implementation level.

Interestingly enough that is not the case for rocksdb. If you look underneath you'll see the following (match on (prefix, start_key)):

(Some(prefix), None) => { if direction == IterDirection::Reverse { self.reverse_prefix_iter(prefix, column).into_boxed() } else { // start iterating in a certain direction within the keyspace let iter_mode = IteratorMode::From( prefix, convert_to_rocksdb_direction(direction), ); let mut opts = ReadOptions::default(); opts.set_prefix_same_as_start(true); self._iter_all(column, opts, iter_mode).into_boxed() } }

It looks like that the prefix is used just to start the iteration from where it matches, but it is not going to stop it. A simple test shows this happening:

#[test] fn recreate_me() { let test = |mut db: Database<OnChain>| { let contract_id_1 = ContractId::from_str( "5962be5ebddc516cb4ed7d7e76365f59e0d231ac25b53f262119edf76564aab4", ) .unwrap(); let mut insert_empty_code = |id| { StorageMutate::<ContractsRawCode>::insert(&mut db, &id, &[]).unwrap() }; insert_empty_code(contract_id_1); let contract_id_2 = ContractId::from_str( "5baf0dcae7c114f647f6e71f1723f59bcfc14ecb28071e74895d97b14873c5dc", ) .unwrap(); insert_empty_code(contract_id_2); use itertools::Itertools; let matched_keys: Vec<_> = db .iter_all_by_prefix::<ContractsRawCode, _>(Some(contract_id_1)) .map_ok(|(k, _)| k) .try_collect() .unwrap(); assert_eq!(matched_keys, vec![contract_id_1]); }; let temp_dir = tempfile::tempdir().unwrap(); let db = Database::<OnChain>::in_memory(); // in memory passes test(db); let db = Database::<OnChain>::open_rocksdb(temp_dir.path(), 1024 * 1024 * 1024) .unwrap(); // rocks db fails test(db); }

I'll raise this as a question internally and we can discuss there.

…1812) Follow-up PR to simplify the logic around parallel snapshot creation #1799. The change removes `Option<BlockHeight>`, `Option<DaBlockHeight>` and `Option<ChainConfig>` from snapshot fragments to not allow overriding of them on type system level. ### Before requesting review - [x] I have reviewed the code myself

@segfault-magnet

We used `RocksDB` filtering by prefix during `iter_all_by_prefix`. But if the prefix is not set for the column, the `RocksDB` ignores it. We don't set the prefix for all tables because of performance purposes. But it means that iteration sometimes can be wrong. The change adds a `Rust` level filtering since not all tables really need a prefix. The issue found by @segfault-magnet in #1799 (comment) ### Before requesting review - [x] I have reviewed the code myself

@Dentosal

## Version v0.25.0 ### Fixed - [#1821](#1821): Can handle missing tables in snapshot. - [#1814](#1814): Bugfix: the `iter_all_by_prefix` was not working for all tables. The change adds a `Rust` level filtering. ### Added - [#1831](#1831): Included the total gas and fee used by transaction into `TransactionStatus`. - [#1821](#1821): Propagate shutdown signal to (re)genesis. Also add progress bar for (re)genesis. - [#1813](#1813): Added back support for `/health` endpoint. - [#1799](#1799): Snapshot creation is now concurrent. - [#1811](#1811): Regenesis now preserves old blocks and transactions for GraphQL API. ### Changed - [#1833](#1833): Regenesis of `SpentMessages` and `ProcessedTransactions`. - [#1830](#1830): Use versioning enum for WASM executor input and output. - [#1816](#1816): Updated the upgradable executor to fetch the state transition bytecode from the database when the version doesn't match a native one. This change enables the WASM executor in the "production" build and requires a `wasm32-unknown-unknown` target. - [#1812](#1812): Follow-up PR to simplify the logic around parallel snapshot creation. - [#1809](#1809): Fetch `ConsensusParameters` from the database - [#1808](#1808): Fetch consensus parameters from the provider. #### Breaking - [#1826](#1826): The changes make the state transition bytecode part of the `ChainConfig`. It guarantees the state transition's availability for the network's first blocks. The change has many minor improvements in different areas related to the state transition bytecode: - The state transition bytecode lies in its own file(`state_transition_bytecode.wasm`) along with the chain config file. The `ChainConfig` loads it automatically when `ChainConfig::load` is called and pushes it back when `ChainConfig::write` is called. - The `fuel-core` release bundle also contains the `fuel-core-wasm-executor.wasm` file of the corresponding executor version. - The regenesis process now considers the last block produced by the previous network. When we create a (re)genesis block of a new network, it has the `height = last_block_of_old_netowkr + 1`. It continues the old network and doesn't overlap blocks(before, we had `old_block.height == new_genesis_block.hegiht`). - Along with the new block height, the regenesis process also increases the state transition bytecode and consensus parameters versions. It guarantees that a new network doesn't use values from the previous network and allows us not to migrate `StateTransitionBytecodeVersions` and `ConsensusParametersVersions` tables. - Added a new CLI argument, `native-executor-version,` that allows overriding of the default version of the native executor. It can be useful for side rollups that have their own history of executor upgrades. - Replaced: ```rust let file = std::fs::File::open(path)?; let mut snapshot: Self = serde_json::from_reader(&file)?; ``` with a: ```rust let mut json = String::new(); std::fs::File::open(&path) .with_context(|| format!("Could not open snapshot file: {path:?}"))? .read_to_string(&mut json)?; let mut snapshot: Self = serde_json::from_str(json.as_str())?; ``` because it is 100 times faster for big JSON files. - Updated all tests to use `Config::local_node_*` instead of working with the `SnapshotReader` directly. It is the preparation of the tests for the futures bumps of the `Executor::VERSION`. When we increase the version, all tests continue to use `GenesisBlock.state_transition_bytecode = 0` while the version is different, which forces the usage of the WASM executor, while for tests, we still prefer to test native execution. The `Config::local_node_*` handles it and forces the executor to use the native version. - Reworked the `build.rs` file of the upgradable executor. The script now caches WASM bytecode to avoid recompilation. Also, fixed the issue with outdated WASM bytecode. The script reacts on any modifications of the `fuel-core-wasm-executor` and forces recompilation (it is why we need the cache), so WASM bytecode always is actual now. - [#1822](#1822): Removed support of `Create` transaction from debugger since it doesn't have any script to execute. - [#1822](#1822): Use `fuel-vm 0.49.0` with new transactions types - `Upgrade` and `Upload`. Also added `max_bytecode_subsections` field to the `ConsensusParameters` to limit the number of bytecode subsections in the state transition bytecode. - [#1816](#1816): Updated the upgradable executor to fetch the state transition bytecode from the database when the version doesn't match a native one. This change enables the WASM executor in the "production" build and requires a `wasm32-unknown-unknown` target. ### Before requesting review - [x] I have reviewed the code myself ### After merging, notify other teams - [x] [Rust SDK](https://github.com/FuelLabs/fuels-rs/) - [x] [Sway compiler](https://github.com/FuelLabs/sway/) - [x] DevOps ## What's Changed * Add PR template by @Dentosal in #1806 * feat: Parallellize snapshot creation by @segfault-magnet in #1799 * Follow-up PR to simplify the logic around parallel snapshot creation by @xgreenx in #1812 * Bugfix: the `iter_all_by_prefix` was not working for all tables by @xgreenx in #1814 * Added back support for `/health` endpoint by @xgreenx in #1813 * Fetch consensus parameters from the provider by @xgreenx in #1808 * Fetch `ConsensusParameters` from the database by @xgreenx in #1809 * Handle FTI messages in executor by @Voxelot in #1787 * Use state transition bytecode from the database by @xgreenx in #1816 * feat: (re)genesis graceful shutdown by @segfault-magnet in #1821 * Use `fuel-vm 0.49.0` with new transactions types by @xgreenx in #1822 * Included the total gas and fee into `TransactionStatus` by @xgreenx in #1831 * Use versioning enum for WASM executor input and output by @xgreenx in #1830 * Support upgradability of the consensus parameters and state transition bytecode in genesis by @xgreenx in #1826 * Store old blocks and txs after regenesis by @Dentosal in #1811 * Regenesis of `SpentMessages` and `ProcessedTransactions` by @xgreenx in #1833 **Full Changelog**: v0.24.2...v0.25.0

…(#1812) Follow-up PR to simplify the logic around parallel snapshot creation FuelLabs/fuel-core#1799. The change removes `Option<BlockHeight>`, `Option<DaBlockHeight>` and `Option<ChainConfig>` from snapshot fragments to not allow overriding of them on type system level. ### Before requesting review - [x] I have reviewed the code myself

@segfault-magnet

We used `RocksDB` filtering by prefix during `iter_all_by_prefix`. But if the prefix is not set for the column, the `RocksDB` ignores it. We don't set the prefix for all tables because of performance purposes. But it means that iteration sometimes can be wrong. The change adds a `Rust` level filtering since not all tables really need a prefix. The issue found by @segfault-magnet in FuelLabs/fuel-core#1799 (comment) ### Before requesting review - [x] I have reviewed the code myself

@Dentosal

## Version v0.25.0 ### Fixed - [#1821](FuelLabs/fuel-core#1821): Can handle missing tables in snapshot. - [#1814](FuelLabs/fuel-core#1814): Bugfix: the `iter_all_by_prefix` was not working for all tables. The change adds a `Rust` level filtering. ### Added - [#1831](FuelLabs/fuel-core#1831): Included the total gas and fee used by transaction into `TransactionStatus`. - [#1821](FuelLabs/fuel-core#1821): Propagate shutdown signal to (re)genesis. Also add progress bar for (re)genesis. - [#1813](FuelLabs/fuel-core#1813): Added back support for `/health` endpoint. - [#1799](FuelLabs/fuel-core#1799): Snapshot creation is now concurrent. - [#1811](FuelLabs/fuel-core#1811): Regenesis now preserves old blocks and transactions for GraphQL API. ### Changed - [#1833](FuelLabs/fuel-core#1833): Regenesis of `SpentMessages` and `ProcessedTransactions`. - [#1830](FuelLabs/fuel-core#1830): Use versioning enum for WASM executor input and output. - [#1816](FuelLabs/fuel-core#1816): Updated the upgradable executor to fetch the state transition bytecode from the database when the version doesn't match a native one. This change enables the WASM executor in the "production" build and requires a `wasm32-unknown-unknown` target. - [#1812](FuelLabs/fuel-core#1812): Follow-up PR to simplify the logic around parallel snapshot creation. - [#1809](FuelLabs/fuel-core#1809): Fetch `ConsensusParameters` from the database - [#1808](FuelLabs/fuel-core#1808): Fetch consensus parameters from the provider. #### Breaking - [#1826](FuelLabs/fuel-core#1826): The changes make the state transition bytecode part of the `ChainConfig`. It guarantees the state transition's availability for the network's first blocks. The change has many minor improvements in different areas related to the state transition bytecode: - The state transition bytecode lies in its own file(`state_transition_bytecode.wasm`) along with the chain config file. The `ChainConfig` loads it automatically when `ChainConfig::load` is called and pushes it back when `ChainConfig::write` is called. - The `fuel-core` release bundle also contains the `fuel-core-wasm-executor.wasm` file of the corresponding executor version. - The regenesis process now considers the last block produced by the previous network. When we create a (re)genesis block of a new network, it has the `height = last_block_of_old_netowkr + 1`. It continues the old network and doesn't overlap blocks(before, we had `old_block.height == new_genesis_block.hegiht`). - Along with the new block height, the regenesis process also increases the state transition bytecode and consensus parameters versions. It guarantees that a new network doesn't use values from the previous network and allows us not to migrate `StateTransitionBytecodeVersions` and `ConsensusParametersVersions` tables. - Added a new CLI argument, `native-executor-version,` that allows overriding of the default version of the native executor. It can be useful for side rollups that have their own history of executor upgrades. - Replaced: ```rust let file = std::fs::File::open(path)?; let mut snapshot: Self = serde_json::from_reader(&file)?; ``` with a: ```rust let mut json = String::new(); std::fs::File::open(&path) .with_context(|| format!("Could not open snapshot file: {path:?}"))? .read_to_string(&mut json)?; let mut snapshot: Self = serde_json::from_str(json.as_str())?; ``` because it is 100 times faster for big JSON files. - Updated all tests to use `Config::local_node_*` instead of working with the `SnapshotReader` directly. It is the preparation of the tests for the futures bumps of the `Executor::VERSION`. When we increase the version, all tests continue to use `GenesisBlock.state_transition_bytecode = 0` while the version is different, which forces the usage of the WASM executor, while for tests, we still prefer to test native execution. The `Config::local_node_*` handles it and forces the executor to use the native version. - Reworked the `build.rs` file of the upgradable executor. The script now caches WASM bytecode to avoid recompilation. Also, fixed the issue with outdated WASM bytecode. The script reacts on any modifications of the `fuel-core-wasm-executor` and forces recompilation (it is why we need the cache), so WASM bytecode always is actual now. - [#1822](FuelLabs/fuel-core#1822): Removed support of `Create` transaction from debugger since it doesn't have any script to execute. - [#1822](FuelLabs/fuel-core#1822): Use `fuel-vm 0.49.0` with new transactions types - `Upgrade` and `Upload`. Also added `max_bytecode_subsections` field to the `ConsensusParameters` to limit the number of bytecode subsections in the state transition bytecode. - [#1816](FuelLabs/fuel-core#1816): Updated the upgradable executor to fetch the state transition bytecode from the database when the version doesn't match a native one. This change enables the WASM executor in the "production" build and requires a `wasm32-unknown-unknown` target. ### Before requesting review - [x] I have reviewed the code myself ### After merging, notify other teams - [x] [Rust SDK](https://github.com/FuelLabs/fuels-rs/) - [x] [Sway compiler](https://github.com/FuelLabs/sway/) - [x] DevOps ## What's Changed * Add PR template by @Dentosal in FuelLabs/fuel-core#1806 * feat: Parallellize snapshot creation by @segfault-magnet in FuelLabs/fuel-core#1799 * Follow-up PR to simplify the logic around parallel snapshot creation by @xgreenx in FuelLabs/fuel-core#1812 * Bugfix: the `iter_all_by_prefix` was not working for all tables by @xgreenx in FuelLabs/fuel-core#1814 * Added back support for `/health` endpoint by @xgreenx in FuelLabs/fuel-core#1813 * Fetch consensus parameters from the provider by @xgreenx in FuelLabs/fuel-core#1808 * Fetch `ConsensusParameters` from the database by @xgreenx in FuelLabs/fuel-core#1809 * Handle FTI messages in executor by @Voxelot in FuelLabs/fuel-core#1787 * Use state transition bytecode from the database by @xgreenx in FuelLabs/fuel-core#1816 * feat: (re)genesis graceful shutdown by @segfault-magnet in FuelLabs/fuel-core#1821 * Use `fuel-vm 0.49.0` with new transactions types by @xgreenx in FuelLabs/fuel-core#1822 * Included the total gas and fee into `TransactionStatus` by @xgreenx in FuelLabs/fuel-core#1831 * Use versioning enum for WASM executor input and output by @xgreenx in FuelLabs/fuel-core#1830 * Support upgradability of the consensus parameters and state transition bytecode in genesis by @xgreenx in FuelLabs/fuel-core#1826 * Store old blocks and txs after regenesis by @Dentosal in FuelLabs/fuel-core#1811 * Regenesis of `SpentMessages` and `ProcessedTransactions` by @xgreenx in FuelLabs/fuel-core#1833 **Full Changelog**: FuelLabs/fuel-core@v0.24.2...v0.25.0

segfault-magnet added 22 commits March 30, 2024 02:30

snapshot fragments for json work

ed3ab94

parquet fragment support, cleanup pending

42e0763

add tests for fragments

58b04be

comment out denies

f440bb9

Merge branch 'master' into feature/parallel_snapshot_writing

3fb1586

snapshot generation uses concurrent workers

4936ed9

task_manager used for import/export of snapshot

3a0b898

cleanup imports

316b1c9

use rayon in genesis importer

35449e6

move files around

633dc22

enable deny lints, fix errors in chain-config

b0fdfdc

wip, investigating features

1f0b3c3

feature gate imports, fix unused deps

1804a7d

ci checks

1209564

remove unused result

e01d204

use rayon for exporter

892849f

remove uuid dep

c8c28f7

dry up fragments tests

90785f4

deduplicate tests

58a4e25

inline path

cb27956

dedupe writer tests

4171e87

restructure into import/export format

9d09510

segfault-magnet added the enhancement New feature or request label Apr 2, 2024

segfault-magnet requested review from MujkicA and a team April 2, 2024 16:08

segfault-magnet self-assigned this Apr 2, 2024

segfault-magnet and others added 3 commits April 2, 2024 18:11

format and cargo sort

d84bd72

update change log

ebe836b

Merge branch 'master' into feature/parallel_snapshot_writing

6060bad

bvrooman reviewed Apr 2, 2024

View reviewed changes

segfault-magnet added 3 commits April 4, 2024 18:33

entries filter

47e5432

optimize

3c1ee01

shorten bounds

b9c3906

segfault-magnet requested review from bvrooman and xgreenx April 5, 2024 01:44

segfault-magnet added 3 commits April 5, 2024 03:45

Merge branch 'master' into feature/parallel_snapshot_writing

992bcc0

Merge branch 'master' into feature/parallel_snapshot_writing

416da94

Merge branch 'master' into feature/parallel_snapshot_writing

e073268

Dentosal mentioned this pull request Apr 8, 2024

Store old blocks and txs after regenesis #1811

Merged

5 tasks

bvrooman approved these changes Apr 8, 2024

View reviewed changes

segfault-magnet merged commit ad1674d into master Apr 9, 2024
33 checks passed

segfault-magnet deleted the feature/parallel_snapshot_writing branch April 9, 2024 02:10

This was referenced Apr 9, 2024

Follow-up PR to simplify the logic around parallel snapshot creation #1812

Merged

Bugfix: the iter_all_by_prefix was not working for all tables #1814

Merged

xgreenx mentioned this pull request Apr 18, 2024

Release v0.25.0 #1839

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Parallellize snapshot creation #1799

feat: Parallellize snapshot creation #1799

segfault-magnet commented Apr 2, 2024

bvrooman Apr 2, 2024 •

edited

Loading

segfault-magnet Apr 2, 2024

segfault-magnet Apr 5, 2024 •

edited

Loading

bvrooman Apr 8, 2024 •

edited

Loading

segfault-magnet Apr 9, 2024 •

edited

Loading

feat: Parallellize snapshot creation #1799

feat: Parallellize snapshot creation #1799

Conversation

segfault-magnet commented Apr 2, 2024

bvrooman Apr 2, 2024 • edited Loading

Choose a reason for hiding this comment

segfault-magnet Apr 2, 2024

Choose a reason for hiding this comment

segfault-magnet Apr 5, 2024 • edited Loading

Choose a reason for hiding this comment

bvrooman Apr 8, 2024 • edited Loading

Choose a reason for hiding this comment

segfault-magnet Apr 9, 2024 • edited Loading

Choose a reason for hiding this comment

bvrooman Apr 2, 2024 •

edited

Loading

segfault-magnet Apr 5, 2024 •

edited

Loading

bvrooman Apr 8, 2024 •

edited

Loading

segfault-magnet Apr 9, 2024 •

edited

Loading