
Commit

Merge branch 'master' into sai/10381_query_balance_by_ibc_denom
gsk967 authored Oct 22, 2021
2 parents 41ae4ac + 994219a commit 9439625
Showing 4 changed files with 118 additions and 32 deletions.
113 changes: 103 additions & 10 deletions docs/architecture/adr-040-storage-and-smt-state-commitments.md
@@ -48,7 +48,7 @@ For data access we propose 2 additional KV buckets (implemented as namespaces fo
2. B2: `hash(key) → key`: a reverse index to get a key from an SMT path. Internally the SMT will store `(key, value)` as `prefix || hash(key) || hash(value)`. So, we can get an object value by composing `hash(key) → B2 → B1`.
3. We could use more buckets to optimize the app usage if needed.
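
To make the composition concrete, here is a minimal sketch of the two buckets, assuming a generic namespaced KV interface and SHA-256 as the hash function; all names are illustrative, not part of the proposed API.

```go
import "crypto/sha256"

// KVBucket is an assumed namespaced key-value view over the backing DB.
type KVBucket interface {
	Get(key []byte) ([]byte, error)
	Set(key, value []byte) error
}

// StateStore wires the two buckets described above:
//   B1: key       -> value   (preimage store)
//   B2: hash(key) -> key     (reverse index for SMT paths)
type StateStore struct {
	B1, B2 KVBucket
}

// Set stores the object in B1 and maintains the reverse index in B2.
func (s *StateStore) Set(key, value []byte) error {
	if err := s.B1.Set(key, value); err != nil {
		return err
	}
	hk := sha256.Sum256(key)
	return s.B2.Set(hk[:], key)
}

// GetByPath resolves an SMT path back to the object value by composing
// hash(key) -> B2 -> B1.
func (s *StateStore) GetByPath(hashedKey []byte) ([]byte, error) {
	key, err := s.B2.Get(hashedKey)
	if err != nil {
		return nil, err
	}
	return s.B1.Get(key)
}
```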

Above, we propose to use a KV DB. However, for the state machine, we could use an RDBMS, which we discuss below.
We propose to use a KV database for both `SS` and `SC`. The store interface will allow using the same physical DB backend for both `SS` and `SC`, as well as two separate DBs. The latter option allows for the separation of `SS` and `SC` into different hardware units, providing support for more complex setup scenarios and improving overall performance: one can use different backends (e.g. RocksDB and Badger) as well as independently tune the underlying DB configuration.

### Requirements

@@ -64,6 +64,7 @@ State Commitment requirements:

+ fast updates
+ tree path should be short
+ query historical commitment proofs using the ICS-23 standard
+ pruning (garbage collection)

### SMT for State Commitment
@@ -82,27 +83,27 @@ The full specification can be found at [Celestia](https://github.com/celestiaorg

Below, by a simple _snapshot_ we refer to a database snapshot mechanism, not to an _ABCI snapshot sync_. The latter will be referred to as _snapshot sync_ (which will directly use the DB snapshot mechanism as described below).

Database snapshot is a view of DB state at a certain time or transaction. It's not a full copy of a database (it would be too big), usually a snapshot mechanism is based on a _copy on write_ and it allows to efficiently deliver DB state at a certain stage.
Some DB engines support snapshotting. Hence, we propose to reuse that functionality for the state sync and versioning (described below). It will the supported DB engines to ones which efficiently implement snapshots. In a final section we will discuss evaluated DBs.
A database snapshot is a view of the DB state at a certain time or transaction. It's not a full copy of the database (that would be too big). Usually a snapshot mechanism is based on _copy on write_, which allows DB state to be efficiently delivered at a certain stage.
Some DB engines support snapshotting. Hence, we propose to reuse that functionality for the state sync and versioning (described below). We limit the supported DB engines to ones which efficiently implement snapshots. In the final section we discuss the evaluated DBs.

One of the Stargate core features is _snapshot sync_, delivered in the `/snapshot` package. It provides a way to trustlessly sync a blockchain without repeating all transactions from genesis. This feature is implemented in the Cosmos SDK and requires storage support. Currently IAVL is the only supported backend. It works by streaming to a client a snapshot of `SS` at a certain version together with a header chain.

A new `SS` snapshot will be created in every `EndBlocker` and identified by a block height. The `rootmulti.Store` keeps track of the available snapshots to offer `SS` at a certain version. The `rootmulti.Store` implements the `CommitMultiStore` interface, which encapsulates a `Committer` interface. `Committer` has a `Commit`, `SetPruning`, `GetPruning` functions which will be used for creating and removing snapshots. The `rootStore.Commit` function creates a new snapshot and increments the version on each call, and checks if it needs to remove old versions. We will need to update the SMT interface to implement the `Committer` interface.
A new database snapshot will be created in every `EndBlocker` and identified by a block height. The `root` store keeps track of the available snapshots to offer `SS` at a certain version. The `root` store implements the `RootStore` interface described below. In essence, `RootStore` encapsulates a `Committer` interface. `Committer` has `Commit`, `SetPruning`, and `GetPruning` functions, which will be used for creating and removing snapshots. The `rootStore.Commit` function creates a new snapshot and increments the version on each call, and checks if it needs to remove old versions. We will need to update the SMT interface to implement the `Committer` interface.
NOTE: `Commit` must be called exactly once per block. Otherwise we risk going out of sync for the version number and block height.
NOTE: For the Cosmos SDK storage, we may consider splitting that interface into `Committer` and `PruningCommitter` - only the multiroot should implement `PruningCommitter` (cache and prefix store don't need pruning).
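
A minimal sketch of what the snapshot-per-block `Commit` could look like; `VersionedDB`, `SMT`, and the local `CommitID` type are assumptions used only for illustration, not existing SDK types.

```go
// CommitID pairs the committed version with the SMT root hash.
type CommitID struct {
	Version int64
	Hash    []byte
}

// VersionedDB is an assumed abstraction over a DB engine that supports
// cheap (copy-on-write) snapshots identified by an increasing version.
type VersionedDB interface {
	SaveVersion(version uint64) error
	DeleteVersion(version uint64) error
}

// SMT is the subset of the tree interface needed here (illustrative).
type SMT interface {
	Root() []byte
}

type rootStore struct {
	db      VersionedDB
	smt     SMT
	version uint64 // kept equal to the block height by calling Commit once per block
}

// Commit increments the version, snapshots the backing DB at that version,
// and returns the new commit ID. It must be called exactly once per block.
func (rs *rootStore) Commit() (CommitID, error) {
	rs.version++
	if err := rs.db.SaveVersion(rs.version); err != nil {
		return CommitID{}, err
	}
	return CommitID{Version: int64(rs.version), Hash: rs.smt.Root()}, nil
}
```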

Number of historical versions for `abci.Query` and state sync snapshots is part of a node configuration, not a chain configuration (configuration implied by the blockchain consensus). A configuration should allow to specify number of past blocks and number of past blocks modulo some number (eg: 100 past blocks and one snapshot every 100 blocks for past 2000 blocks). Archival nodes can keep all past versions.
The number of historical versions for `abci.RequestQuery` and state sync snapshots is part of a node configuration, not a chain configuration (configuration implied by the blockchain consensus). A configuration should allow specifying the number of recent blocks to keep, as well as the number of past blocks kept modulo some number (e.g. 100 recent blocks, and one snapshot every 100 blocks for the past 2000 blocks). Archival nodes can keep all past versions.
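
One possible shape for such a node-level configuration, as a sketch; the field names are illustrative, not the final configuration keys.

```go
// PruningConfig is an illustrative node-level configuration describing how
// many historical versions a node retains for queries and state sync.
type PruningConfig struct {
	KeepRecent   uint64 // keep this many most recent versions (e.g. 100)
	KeepEvery    uint64 // additionally keep every Nth version (e.g. 100) ...
	RetainBlocks uint64 // ... going back at most this many blocks (e.g. 2000)
}

// ShouldKeep reports whether the snapshot at `version` must be retained
// given the latest committed version.
func (c PruningConfig) ShouldKeep(version, latest uint64) bool {
	if version+c.KeepRecent >= latest {
		return true // within the recent window
	}
	if c.KeepEvery != 0 && version%c.KeepEvery == 0 && version+c.RetainBlocks >= latest {
		return true // periodic snapshot within the archival window
	}
	return false
}
```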

Pruning old snapshots is effectively done by a database. Whenever we update a record in `SC`, SMT won't update nodes - instead it creates new nodes on the update path, without removing the old one. Since we are snapshoting each block, we need to update that mechanism to immediately remove orphaned nodes from the storage. This is a safe operation - snapshots will keep track of the records which should be available for past versions.
Pruning old snapshots is effectively done by the database. Whenever we update a record in `SC`, the SMT won't update existing nodes - instead it creates new nodes on the update path, without removing the old ones. Since we are snapshotting each block, we need to change that mechanism to immediately remove orphaned nodes from the database. This is a safe operation - snapshots will keep track of the records and make them available when accessing past versions.

To manage the active snapshots we will either us a DB _max number of snapshots_ option (if available), or will remove snapshots in the `EndBlocker`. The latter option can be done efficiently by identifying snapshots with block height.
To manage the active snapshots we will either use a DB _max number of snapshots_ option (if available), or we will remove DB snapshots in the `EndBlocker`. The latter option can be done efficiently by identifying snapshots with block height and calling a store function to remove past versions.
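
A sketch of the `EndBlocker` variant, reusing the `rootStore` and `PruningConfig` sketches above; `availableVersions` and `DeleteVersion` are assumed helpers, not an existing API.

```go
// availableVersions is assumed to list the versions the DB still holds,
// e.g. by consulting the engine's snapshot metadata (engine specific).
func (rs *rootStore) availableVersions() []uint64 { return nil }

// pruneSnapshots runs after Commit for block height `latest` and drops
// every DB snapshot that the node's pruning configuration no longer needs.
func (rs *rootStore) pruneSnapshots(cfg PruningConfig, latest uint64) error {
	for _, v := range rs.availableVersions() {
		if cfg.ShouldKeep(v, latest) {
			continue
		}
		if err := rs.db.DeleteVersion(v); err != nil {
			return err
		}
	}
	return nil
}
```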

#### Accessing old state versions

One of the functional requirements is to access old state. This is done through `abci.Query` structure. The version is specified by a block height (so we query for an object by a key `K` at block height `H`). The number of old versions supported for `abci.Query` is configurable. Accessing an old state is done by using available snapshots.
`abci.Query` doesn't need old state of `SC`. So, for efficiency, we should keep `SC` and `SS` in different databases (however using the same DB engine).
One of the functional requirements is to access old state. This is done through the `abci.RequestQuery` structure. The version is specified by a block height (so we query for an object by a key `K` at block height `H`). The number of old versions supported for `abci.RequestQuery` is configurable. Accessing old state is done using the available snapshots.
`abci.RequestQuery` doesn't need the old state of `SC` unless the `prove=true` parameter is set. The SMT Merkle proof must be included in the `abci.ResponseQuery` only if both `SC` and `SS` have a snapshot for the requested version.
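
A sketch of the version-resolution logic behind such a query, building on the `rootStore` sketch above; `hasSS` and `hasSC` are assumed helpers reporting snapshot availability for the respective backend.

```go
import "fmt"

// hasSS / hasSC are assumed, engine-specific helpers reporting whether the
// SS / SC backend still holds a snapshot for the given version.
func (rs *rootStore) hasSS(version uint64) bool { return true }
func (rs *rootStore) hasSC(version uint64) bool { return true }

// resolveQueryVersion picks the version serving an abci.RequestQuery and
// reports whether a commitment proof can be attached to the response.
func (rs *rootStore) resolveQueryVersion(height int64, prove bool) (version uint64, withProof bool, err error) {
	version = uint64(height)
	if height == 0 {
		version = rs.version // default to the latest committed version
	}
	if !rs.hasSS(version) {
		return 0, false, fmt.Errorf("state at height %d is not available", height)
	}
	// A Merkle proof is returned only when requested AND SC also retained
	// a snapshot for this version.
	return version, prove && rs.hasSC(version), nil
}
```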

Moreover, Cosmos SDK could provide a way to directly access the state. However, a state machine shouldn't do that - since the number of snapshots is configurable, it would lead to nondeterministic execution.
Moreover, the Cosmos SDK could provide a way to directly access a historical state. However, a state machine shouldn't do that - since the number of snapshots is configurable, it would lead to nondeterministic execution.

We positively [validated](https://github.com/cosmos/cosmos-sdk/discussions/8297) a versioning and snapshot mechanism for querying old state with regard to the databases we evaluated.

@@ -118,6 +119,93 @@ We need to be able to process transactions and roll-back state updates if a tran

We identified use cases where modules will need to save an object commitment without storing the object itself. Sometimes clients receive complex objects, and they have no way to prove the correctness of that object without knowing the storage layout. For those use cases it would be easier to commit to the object without storing it directly.
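
A sketch of what a commit-only write could look like under the bucket layout above; `smtWriter` and the function name are assumptions for illustration.

```go
import "crypto/sha256"

// smtWriter is the assumed subset of the SMT (SC) interface used here.
type smtWriter interface {
	Update(path, valueHash []byte) error
}

// CommitOnly records only the commitment of an object: an SMT leaf is
// written, but the object itself is never stored in SS (B1/B2). A client
// holding the object can still verify it against the SMT root.
func CommitOnly(sc smtWriter, key, value []byte) error {
	hk := sha256.Sum256(key)
	hv := sha256.Sum256(value)
	return sc.Update(hk[:], hv[:])
}
```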


### Refactor MultiStore

The Stargate `/store` implementation (store/v1) adds an additional layer in the SDK store construction - the `MultiStore` structure. The multistore exists to support the modularity of the Cosmos SDK - each module uses its own instance of IAVL, but in the current implementation, all instances share the same database. This, however, means that the implementation doesn't provide true modularity. Instead it causes problems related to race conditions and atomic DB commits (see: [\#6370](https://github.com/cosmos/cosmos-sdk/issues/6370) and [discussion](https://github.com/cosmos/cosmos-sdk/discussions/8297#discussioncomment-757043)).

We propose to remove the multistore concept from the SDK and to use a single instance of `SC` and `SS` in a `RootStore` object. To avoid confusion, we should rename the `MultiStore` interface to `RootStore`. The `RootStore` will have the following interface; the methods for configuring tracing and listeners are omitted for brevity.

```go
// Used where read-only access to versions is needed.
type BasicRootStore interface {
	Store
	GetKVStore(StoreKey) KVStore
	CacheRootStore() CacheRootStore
}

// Used as the main app state, replacing CommitMultiStore.
type CommitRootStore interface {
	BasicRootStore
	Committer
	Snapshotter

	GetVersion(uint64) (BasicRootStore, error)
	SetInitialVersion(uint64) error

	... // Trace and Listen methods
}

// Replaces CacheMultiStore for branched state.
type CacheRootStore interface {
	BasicRootStore
	Write()

	... // Trace and Listen methods
}

// Example of constructor parameters for the concrete type.
type RootStoreConfig struct {
	Upgrades       *StoreUpgrades
	InitialVersion uint64

	ReservePrefix(StoreKey, StoreType)
}
```

<!-- TODO: Review whether these types can be further reduced or simplified -->
<!-- TODO: RootStorePersistentCache type -->

In contrast to `MultiStore`, `RootStore` doesn't allow dynamically mounting sub-stores or providing an arbitrary backing DB for individual sub-stores.
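
A hypothetical wiring sketch under the interfaces above: sub-store namespaces are declared up front through `RootStoreConfig` instead of being mounted dynamically. `NewRootStore`, `NewKVStoreKey`, and `StoreTypePersistent` are assumed names for the eventual concrete implementation, not part of this proposal's interface.

```go
// buildRootStore shows how an app could construct its single store: every
// module namespace is reserved before the store is opened; there is no
// dynamic mounting afterwards. Names below are assumptions for illustration.
func buildRootStore(db VersionedDB) (CommitRootStore, error) {
	cfg := RootStoreConfig{
		InitialVersion: 1,
	}
	cfg.ReservePrefix(NewKVStoreKey("bank"), StoreTypePersistent)
	cfg.ReservePrefix(NewKVStoreKey("ibc"), StoreTypePersistent) // separate SMT, see the IBC section below

	return NewRootStore(db, cfg)
}
```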

NOTE: modules will be able to use a special commitment and their own DBs. For example: a module which will use ZK proofs for state can store and commit this proof in the `RootStore` (usually as a single record) and manage the specialized store privately or using the `SC` low level interface.

#### Compatibility support

To ease the transition to this new interface for users, we can create a shim which wraps a `CommitMultiStore` but provides a `CommitRootStore` interface, and expose functions to safely create and access the underlying `CommitMultiStore`.

The new `RootStore` and supporting types can be implemented in a `store/v2` package to avoid breaking existing code.

#### Merkle Proofs and IBC

Currently, an IBC (v1.0) Merkle proof path consists of two elements (`["<store-key>", "<record-key>"]`), with each key corresponding to a separate proof. These are each verified according to individual [ICS-23 specs](https://github.com/cosmos/ibc-go/blob/f7051429e1cf833a6f65d51e6c3df1609290a549/modules/core/23-commitment/types/merkle.go#L17), and the result hash of each step is used as the committed value of the next step, until a root commitment hash is obtained.
The root hash of the proof for `"<record-key>"` is hashed with the `"<store-key>"` to validate against the App Hash.

This is not compatible with the `RootStore`, which stores all records in a single Merkle tree structure and won't produce separate proofs for the store- and record-key. Ideally, the store-key component of the proof could just be omitted, and IBC updated to use a "no-op" spec, so only the record-key is used. However, because the IBC verification code hardcodes the `"ibc"` prefix and applies it to the SDK proof as a separate element of the proof path, this isn't possible without a breaking change. Breaking this behavior would severely impact the Cosmos ecosystem, which has already widely adopted the IBC module. Requesting an update of the IBC module across the chains is a time-consuming effort and not easily feasible.

As a workaround, the `RootStore` will have to use two separate SMTs (they could use the same underlying DB): one for IBC state and one for everything else. A simple Merkle map that references these SMTs will act as a Merkle tree to create the final App hash. The Merkle map is not stored in a DB - it's constructed at runtime. The IBC substore key must be `"ibc"`.
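
A sketch of how the final App hash could be derived from the two SMT roots at commit time; the exact leaf and inner-node encoding of the in-memory Merkle map shown here is illustrative, not the final hashing scheme.

```go
import "crypto/sha256"

// appHash combines the root of the IBC SMT and the root of the main SMT
// into a single commitment. This tiny "Merkle map" is rebuilt in memory at
// every commit and never persisted to a DB.
func appHash(ibcRoot, mainRoot []byte) []byte {
	h := sha256.New()
	h.Write(leafHash("ibc", ibcRoot)) // the IBC substore key must be "ibc"
	h.Write(leafHash("", mainRoot))   // everything else
	return h.Sum(nil)
}

// leafHash commits to a (store key, SMT root) pair; the encoding is an
// illustrative placeholder, not the exact ICS-23 compatible scheme.
func leafHash(key string, root []byte) []byte {
	h := sha256.New()
	h.Write([]byte(key))
	h.Write(root)
	return h.Sum(nil)
}
```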

The workaround can still guarantee atomic syncs: the [proposed DB backends](#evaluated-kv-databases) support atomic transactions and efficient rollbacks, which will be used in the commit phase.

The presented workaround can be used until the IBC module is fully upgraded to support single-element commitment proofs.

### Optimization: compress module key prefixes

We consider compressing prefix keys by creating a mapping from module key to an integer and serializing the integer using varint coding. Varint coding is self-delimiting, so no encoded value is a byte prefix of another. For Merkle proofs we can't use prefix compression - so it should only apply to the `SS` keys. Moreover, the prefix compression should only be applied to the module namespace. More precisely:
+ each module has its own namespace;
+ when accessing a module namespace we create a KVStore with an embedded prefix;
+ that prefix will be compressed only when accessing and managing `SS`.

We need to ensure that the codes won't change. We can fix the mapping in a static variable (provided by the app) or in the `SS` state under a special key.
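
A sketch of the varint-based prefix compression for `SS` keys, using the static-variable option; the module codes below are placeholders.

```go
import "encoding/binary"

// moduleIDs is the fixed module-name -> code mapping (the static-variable
// option above). The codes are placeholders and must never change once set.
var moduleIDs = map[string]uint64{
	"auth": 1,
	"bank": 2,
	"ibc":  3,
}

// compressSSKey builds the SS key for a record in a module namespace as
// uvarint(moduleID) || key. Uvarint encodings are self-delimiting, so no
// module code's encoding is a prefix of another's. SC keys are left
// untouched - they are hashed, so Merkle proofs are unaffected.
func compressSSKey(module string, key []byte) []byte {
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(buf, moduleIDs[module])
	return append(buf[:n], key...)
}
```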

TODO: need to make a decision about the key compression.

### Optimization: SS key compression

Some objects may be saved with a key which contains a Protobuf message type. Such keys are long. We could save a lot of space if we could map Protobuf message types to varints.

TODO: finalize this or move to another ADR.

## Consequences

### Backwards Compatibility
@@ -131,11 +219,14 @@ We change the storage layout of the state machine, a storage hard fork and netwo
+ Decoupling state from state commitment introduces better engineering opportunities for further optimizations and better storage patterns.
+ Performance improvements.
+ Joining the SMT-based camp, which has wider and proven adoption than IAVL. Example projects which decided on SMT: Ethereum2, Diem (Libra), Trillian, Tezos, Celestia.
+ Multistore removal fixes a longstanding issue with the current MultiStore design.
+ Simplifies Merkle proofs - all modules, except IBC, require only a single pass for the Merkle proof.

### Negative

+ Storage migration
+ The LL (LazyLedger) SMT doesn't support pruning - we will need to add and test that functionality.
+ `SS` keys will have an overhead of a key prefix. This doesn't impact `SC` because all keys in `SC` have the same size (they are hashed).

### Neutral

@@ -170,3 +261,5 @@ We were discussing use case where modules can use a support database, which is n
+ Facebook Diem (Libra) SMT [design](https://developers.diem.com/papers/jellyfish-merkle-tree/2021-01-14.pdf)
+ [Trillian Revocation Transparency](https://github.com/google/trillian/blob/master/docs/papers/RevocationTransparency.pdf), [Trillian Verifiable Data Structures](https://github.com/google/trillian/blob/master/docs/papers/VerifiableDataStructures.pdf).
+ Design and implementation [discussion](https://github.com/cosmos/cosmos-sdk/discussions/8297).
+ [How to Upgrade IBC Chains and their Clients](https://github.com/cosmos/ibc-go/blob/main/docs/ibc/upgrades/quick-guide.md)
+ [ADR-40 Effect on IBC](https://github.com/cosmos/ibc-go/discussions/256)
10 changes: 5 additions & 5 deletions docs/core/upgrade.md
@@ -15,7 +15,7 @@ The Cosmos SDK uses two methods to perform upgrades.
- Exporting the entire application state to a JSON file using the `export` CLI command, making changes, and then starting a new binary with the changed JSON file as the genesis file. See [Chain Upgrade Guide to v0.42](https://docs.cosmos.network/v0.42/migrations/chain-upgrade-guide-040.html).

- Version v0.44 and later can perform upgrades in place to significantly decrease the upgrade time for chains with a larger state. Use the [Module Upgrade Guide](../building-modules/upgrade.md) to set up your application modules to take advantage of in-place upgrades.

This document provides steps to use the In-Place Store Migrations upgrade method.

## Tracking Module Versions
@@ -95,7 +95,7 @@ if err != nil {
if upgradeInfo.Name == "my-plan" && !app.UpgradeKeeper.IsSkipHeight(upgradeInfo.Height) {
	storeUpgrades := storetypes.StoreUpgrades{
		// add store upgrades for new modules
		// Example:
		// Example:
		// Added: []string{"foo", "bar"},
		// ...
	}
@@ -119,7 +119,7 @@ You MUST manually set the consensus version in the version map passed to the `Up
import foo "github.com/my/module/foo"

app.UpgradeKeeper.SetUpgradeHandler("my-plan", func(ctx sdk.Context, plan upgradetypes.Plan, vm module.VersionMap) (module.VersionMap, error) {

	// Register the consensus version in the version map
	// to avoid the SDK from triggering the default
	// InitGenesis function.
@@ -138,7 +138,7 @@ If you do not have a custom genesis function and want to skip the module's defau
import foo "github.com/my/module/foo"

app.UpgradeKeeper.SetUpgradeHandler("my-plan", func(ctx sdk.Context, plan upgradetypes.Plan, vm module.VersionMap) (module.VersionMap, error) {

	// Set foo's version to the latest ConsensusVersion in the VersionMap.
	// This will skip running InitGenesis on Foo
	vm["foo"] = foo.AppModule{}.ConsensusVersion()
@@ -151,6 +151,6 @@ app.UpgradeKeeper.SetUpgradeHandler("my-plan", func(ctx sdk.Context, plan upgrad

You can sync a full node to an existing blockchain which has been upgraded using Cosmovisor.

In order to successfully sync, you must start with the initial binary that the blockchain started with at genesis. Cosmovisor will handle downloading and switching to the binaries associated with each sequential upgrade.
In order to successfully sync, you must start with the initial binary that the blockchain started with at genesis. If all Software Upgrade Plans contain binary instructions, then you can run Cosmovisor with the auto-download option to automatically handle downloading and switching to the binaries associated with each sequential upgrade. Otherwise, you need to manually provide all binaries to Cosmovisor.

To learn more about Cosmovisor, see the [Cosmovisor Quick Start](../run-node/cosmovisor.md).