[chain] Audit the gno.land storage layer #2445

zivkovicmilos · 2024-06-26T07:45:02Z

Description

This task concerns scoping out and documenting the current gno.land storage layer. A single HackMD document is enough; it does not need to be part of the official documentation.

We use LevelDB for our embedded storage, but have no concrete optimizations for reads or writes. The first step toward optimizing the storage layer is to detail exactly:

- what's being stored
- how it's being stored
- why it's being stored (if it's redundant)

ajnavarro · 2024-11-22T13:38:16Z

How Things Are Stored

Key-Value Structure

Below is a categorization of the main types of keys:

Object Data:
- s/_/oid:{HASH}:{SEQ}: Stores object content hashes plus marshaled Object data.
- s/_/oid:{HASH}:1#realm: Stores the Realm type, saving the oid hash + package name.
Type Information:
- s/_/tid:gno.land/p/demo/avl.Tree: Stores type definitions.
State Data:
- s/_/n{BYTES}: Saves information about the state, with logic found in nodedb.go.
Metadata:
- s/latest: Contains the latest version, which points to s/VERSION (a commitInfo type).
- s/_/last_header: Stores an abci.Header.
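To make the categorization above easier to check against a real database dump, here is a minimal sketch that sorts raw keys into those groups by prefix. The function name `classifyKey`, the category names, and the sample hash `abc123` are hypothetical; the sketch relies only on the key shapes listed above and does not reflect how the gno.land code itself dispatches on keys. It reports the `#realm` key separately, even though the list groups it under object data.

```go
package main

import (
	"fmt"
	"strings"
)

// keyCategory names the groups from the list above (hypothetical type).
type keyCategory string

const (
	catObject   keyCategory = "object"
	catRealm    keyCategory = "realm"
	catType     keyCategory = "type"
	catState    keyCategory = "state"
	catMetadata keyCategory = "metadata"
	catUnknown  keyCategory = "unknown"
)

// classifyKey maps a raw key to a category using only the prefixes and
// suffixes listed in the audit. It is an illustration, not gno.land code.
func classifyKey(key string) keyCategory {
	switch {
	case strings.HasPrefix(key, "s/_/oid:") && strings.HasSuffix(key, "#realm"):
		return catRealm
	case strings.HasPrefix(key, "s/_/oid:"):
		return catObject
	case strings.HasPrefix(key, "s/_/tid:"):
		return catType
	case key == "s/latest" || key == "s/_/last_header":
		return catMetadata
	case strings.HasPrefix(key, "s/_/n"):
		return catState
	default:
		return catUnknown
	}
}

func main() {
	// Sample keys; "abc123" is a placeholder, not a real object hash.
	keys := []string{
		"s/_/oid:abc123:7",
		"s/_/oid:abc123:1#realm",
		"s/_/tid:gno.land/p/demo/avl.Tree",
		"s/_/n\x01\x02",
		"s/latest",
		"s/_/last_header",
	}
	for _, k := range keys {
		fmt.Printf("%-36q %s\n", k, classifyKey(k))
	}
}
```

Counting keys and value sizes per category with a helper like this is one way to quantify what is being stored before deciding what is redundant.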

Nuances Found in the Current Implementation

Inconsistent Logic

Similar types, such as PackageValue.Block and PackageValue.FBlocks, are stored differently despite representing the same Value interface. This inconsistency makes the system error-prone and harder to maintain.

Inefficient Storage Serialization

Approximately 80%-90% of the data stored is redundant due to the direct storage of VM objects in LevelDB. Additionally, keys for objects are repeatedly marshaled, adding unnecessary overhead.

Unused Keys and Unneeded Storage Layers

Some keys, such as those for BlockNodes (e.g., []byte("node:" + loc.String())), are defined but never used. PrefixDB adds prefixes to keys and uses a mutex internally, which introduces unnecessary complexity. Keys are also built in inconsistent ways: some use hashed package names, which keeps all keys the same length, while others use the plain package name.
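The difference between the two key styles can be illustrated with a short sketch. The `pkg:` prefix, the function names, and the choice of SHA-256 are assumptions made for this example only; the audit does not say which hash the current code uses.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// plainKey embeds the package path directly, so the key length varies
// with the path. The "pkg:" prefix is hypothetical.
func plainKey(pkgPath string) string {
	return "pkg:" + pkgPath
}

// hashedKey embeds a hex-encoded SHA-256 digest of the package path, so
// every key has the same length. SHA-256 is an assumption for this sketch.
func hashedKey(pkgPath string) string {
	sum := sha256.Sum256([]byte(pkgPath))
	return "pkg:" + hex.EncodeToString(sum[:])
}

func main() {
	paths := []string{"gno.land/p/demo/avl", "gno.land/r/demo/users"}
	for _, p := range paths {
		fmt.Println(len(plainKey(p)), len(hashedKey(p)))
	}
}
```

Fixed-length keys are convenient for layout, but they cannot be listed by package-path prefix, which is one reason mixing both styles makes the key space harder to reason about.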

VM Model and Storage Coupling

Models have methods that receive the Store as a parameter to retrieve related metadata. This creates inconsistencies in how types are retrieved/cast/used and adds an unnecessary dependency between the model and storage layers.

Slice Storage Limitations

Slice metadata is not split into chunks, so retrieving a big slice means marshaling/unmarshaling all of its data at once, which could cause memory issues.
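As a rough illustration of what chunking could look like, the sketch below splits a slice into fixed-size pieces that could each be stored under its own key and loaded on demand. The generic helper `chunk` and the chunk size are hypothetical and not part of the current implementation.

```go
package main

import "fmt"

// chunk splits items into consecutive sub-slices of at most size elements.
// The sub-slices share the backing array of items; nothing is copied.
// A non-positive size yields nil.
func chunk[T any](items []T, size int) [][]T {
	if size <= 0 {
		return nil
	}
	chunks := make([][]T, 0, (len(items)+size-1)/size)
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		chunks = append(chunks, items[start:end])
	}
	return chunks
}

func main() {
	values := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
	for i, c := range chunk(values, 4) {
		// Each chunk could be marshaled and written under its own key.
		fmt.Printf("chunk %d: %v\n", i, c)
	}
}
```

With a layout like this, reading one element only requires unmarshaling the chunk that contains it, at the cost of extra bookkeeping for chunk keys.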

Next Steps

Option 1: Maintain Status Quo with Minor Fixes

Incrementally address specific issues, such as unifying key prefix logic and standardizing type storage.

Option 2: Introduce a Conversion Layer (Recommended)

Define storage-specific models fulfilling the use case, independent of VM state models. Establish a conversion layer between VM state models and storage models, simplifying data storage and retrieval while reducing redundancy.

This approach will simplify relationships between parts of the application, facilitate the implementation of other VMs in the future, and make it easier to maintain and optimize the storage layer.
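A minimal sketch of the idea, assuming hypothetical types: `vmPackage` stands in for a VM state model, `storedPackage` for a storage-specific model, and `toStorage`/`fromStorage` form the conversion layer. JSON is used only to keep the example self-contained; the actual encoding would be a separate decision.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// vmPackage stands in for a VM state model (hypothetical).
type vmPackage struct {
	Path  string
	Files map[string]string // file name -> source
	cache map[string]any    // runtime-only state that should not be persisted
}

// storedFile and storedPackage are storage-specific models (hypothetical).
// They hold only what must be persisted, in a deterministic order.
type storedFile struct {
	Name string `json:"name"`
	Body string `json:"body"`
}

type storedPackage struct {
	Path  string       `json:"path"`
	Files []storedFile `json:"files"`
}

// toStorage converts a VM model into its storage model, dropping
// runtime-only fields and sorting files by name.
func toStorage(p vmPackage) storedPackage {
	names := make([]string, 0, len(p.Files))
	for name := range p.Files {
		names = append(names, name)
	}
	sort.Strings(names)
	files := make([]storedFile, 0, len(names))
	for _, name := range names {
		files = append(files, storedFile{Name: name, Body: p.Files[name]})
	}
	return storedPackage{Path: p.Path, Files: files}
}

// fromStorage rebuilds a VM model from its storage model.
func fromStorage(s storedPackage) vmPackage {
	files := make(map[string]string, len(s.Files))
	for _, f := range s.Files {
		files[f.Name] = f.Body
	}
	return vmPackage{Path: s.Path, Files: files}
}

func main() {
	vm := vmPackage{
		Path:  "gno.land/p/demo/avl",
		Files: map[string]string{"tree.gno": "package avl", "node.gno": "package avl"},
	}
	raw, err := json.Marshal(toStorage(vm))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(raw))
}
```

In this shape the VM model no longer needs a Store parameter: only the conversion functions know about both sides, which is the decoupling Option 2 is after.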

We need to define storage models for:
- Package/realms metadata
- Gno files
- Relationships between packages
- Implement efficient indexing to list available packages, avoiding O(n) complexity.
- Design a static tree to define type relationships within a realm.
- Create a Merkle tree to track value states and allow rollbacks if necessary.

Conclusion

I recommend moving forward with Option 2, as it provides a consistent and future-proof storage design. This will help reduce redundancy and make the system easier to maintain.

I would appreciate feedback from the team on how feasible you think this approach is, and on which option is best to follow now.

thehowl · 2024-11-28T18:22:15Z

Let's try to tackle low-hanging fruit on our storage ahead of the launch; then we'll tackle larger re-organizations and specifications of the storage later on.

sw360cab · 2024-11-29T02:41:34Z

Should we also cover security concerns about the local DB here?

- Which kinds of security flaws is the DB prone to?
- Can it be compromised or tampered with through malicious access or an attack?
- Can it be encrypted?
- Which forms of protection can be applied?

zivkovicmilos · 2024-12-03T14:05:45Z

Thank you for doing the research @ajnavarro 🙏

A few open questions I had while reading this:

> s/_/oid:{HASH}:{SEQ}: Stores object content hashes plus marshaled Object data.

Does this mean we store duplicated object content for each sequence, or is there some smarter logic that gives you the latest sequence for the latest object version?

> s/latest: Contains the latest version, which points to s/VERSION (a commitInfo type).

What is the structure of this data? Just an int64 (block num)?

> s/_/last_header: Stores an abci.Header.

I'm assuming this is updated every time we have a new block. Why do we store it?

> PrefixDB adds prefixes to keys and uses a mutex internally

👀

The storage layer is something we can migrate in the future if need be, when we change data representations.
What I'd love for us to do in the interim, before the mainnet launch, is to document exactly what you've mentioned:

> We need to define storage models for:
> - Package/realms metadata
> - Gno files
> - Relationships between packages
> - Implement efficient indexing to list available packages, avoiding O(n) complexity.
> - Design a static tree to define type relationships within a realm.
> - Create a Merkle tree to track value states and allow rollbacks if necessary.

Can we create open issues for these things? @Kouteki
I think we can start doing research ahead of the launch if there is wiggle room, as these things are direct performance and stability killers.

Kouteki · 2024-12-04T13:27:21Z

The majority is in favor of option 1 - minor fixes before the launch, and greater reorg afterwards.

I'll close this issue at the end of the cycle, and I've created #3264 to track option 2 afterwards.

zivkovicmilos added 📦 🌐 tendermint v2 Issues or PRs tm2 related 📦 ⛰️ gno.land Issues or PRs gno.land package related labels Jun 26, 2024

zivkovicmilos added this to the 🏗4️⃣ test4.gno.land [POST LAUNCH] milestone Jun 26, 2024

zivkovicmilos assigned ajnavarro Jun 26, 2024

zivkovicmilos added this to 🧙‍♂️gno.land core team Jun 26, 2024

github-project-automation bot moved this to Triage in 🧙‍♂️gno.land core team Jun 26, 2024

zivkovicmilos moved this from Triage to Todo in 🧙‍♂️gno.land core team Jun 26, 2024

linear bot removed 📦 🌐 tendermint v2 Issues or PRs tm2 related 📦 ⛰️ gno.land Issues or PRs gno.land package related labels Sep 11, 2024

zivkovicmilos added the 🌟 must have 🌟 label Sep 11, 2024

zivkovicmilos added this to 🚀 main.gno.land launch 🚀 Sep 11, 2024

github-project-automation bot moved this to TODO in 🚀 main.gno.land launch 🚀 Sep 11, 2024

Kouteki modified the milestones: 🏗4️⃣ test4.gno.land [POST LAUNCH], 🚀 Mainnet launch Oct 16, 2024

Kouteki added in focus Core team is prioritizing this work and removed 🌟 must have 🌟 labels Oct 16, 2024

Kouteki added this to 🍜 Seoul triage Nov 13, 2024

Kouteki moved this from In Progress to In Review in 🧙‍♂️gno.land core team Nov 26, 2024

Kouteki mentioned this issue Dec 4, 2024

[META] Coversion layer #3264

Open

leohhhn changed the title ~~[chain] Audit the Gno.land storage layer~~ [chain] Audit the gno.land storage layer Dec 5, 2024

zivkovicmilos closed this as completed Dec 6, 2024

github-project-automation bot moved this from In Review to Done in 🧙‍♂️gno.land core team Dec 6, 2024

Kouteki removed the in focus Core team is prioritizing this work label Dec 16, 2024

Kouteki mentioned this issue Jan 6, 2025

Minutes: Core Staff Weekly Syncs [every Monday] gnolang/meetings#36

Open
