[chain] Audit the gno.land storage layer #2445

Closed
zivkovicmilos opened this issue Jun 26, 2024 · 5 comments
@zivkovicmilos
Member

Description

This task concerns scoping out and documenting (can be a single HackMD document, not the official documentation) the current Gno.land storage layer.

We utilize LevelDB for our embedded storage, but have no concrete optimizations for writes / reads. The first step to optimizing the storage layer is to exactly detail:

  • what's being stored
  • how it's being stored
  • why it's being stored (if it's redundant)
@ajnavarro
Contributor

ajnavarro commented Nov 22, 2024

How Things Are Stored

Key-Value Structure

Below is a categorization of the main types of keys:

  • Object Data:

    • s/_/oid:{HASH}:{SEQ}: Stores object content hashes plus marshaled Object data.
    • s/_/oid:{HASH}:1#realm: Stores the Realm type, saving the oid hash + package name.
  • Type Information:

    • s/_/tid:gno.land/p/demo/avl.Tree: Stores type definitions.
  • State Data:

    • s/_/n{BYTES}: Saves information about the state, with logic found in nodedb.go.
  • Metadata:

    • s/latest: Contains the latest version, which points to s/VERSION (a commitInfo type).
    • s/_/last_header: Stores an abci.Header.
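
To make the layout above concrete, here is a minimal Go sketch of how keys of these shapes could be assembled. The helper names (`objectKey`, `realmKey`, `typeKey`) and the `storePrefix` constant are hypothetical and are not taken from the gno codebase; only the key shapes come from the list above.

```go
package main

import "fmt"

// storePrefix mirrors the "s/_/" prefix seen in the audited keys.
// The constant name is hypothetical.
const storePrefix = "s/_/"

// objectKey builds a key of the form "s/_/oid:{HASH}:{SEQ}".
func objectKey(hash string, seq uint64) string {
	return fmt.Sprintf("%soid:%s:%d", storePrefix, hash, seq)
}

// realmKey builds a key of the form "s/_/oid:{HASH}:1#realm".
func realmKey(hash string) string {
	return objectKey(hash, 1) + "#realm"
}

// typeKey builds a key of the form "s/_/tid:{TYPE_ID}".
func typeKey(typeID string) string {
	return storePrefix + "tid:" + typeID
}

func main() {
	fmt.Println(objectKey("abc123", 7))
	fmt.Println(realmKey("abc123"))
	fmt.Println(typeKey("gno.land/p/demo/avl.Tree"))
}
```

Keeping all key construction in one place like this is one way the prefix logic could be unified.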

Nuances Found in the Current Implementation

Inconsistent Logic

Similar types, such as PackageValue.Block and PackageValue.FBlocks, are stored differently despite representing the same Value interface. This inconsistency makes the system error-prone and harder to maintain.

Inefficient Storage Serialization

Approximately 80%-90% of the data stored is redundant due to the direct storage of VM objects in LevelDB. Additionally, keys for objects are repeatedly marshaled, adding unnecessary overhead.

Unused Keys and Unneeded Storage Layers

Some keys, such as those for BlockNodes (e.g., []byte("node:" + loc.String())), are defined but never used. PrefixDB adds prefixes to keys and uses a mutex internally, which introduces unnecessary complexity. Keys are also built inconsistently: some use hashed package names, which keeps key lengths uniform, while others use the plain package name.
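
The difference between the two key styles can be sketched as follows. This is an illustration only: the choice of SHA-256, the hex encoding, and the helper names `hashedKey` and `plainKey` are assumptions, not what the store actually does.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashedKey derives a fixed-length key from a package path.
// SHA-256 and hex encoding are assumptions made for this sketch.
func hashedKey(pkgPath string) string {
	sum := sha256.Sum256([]byte(pkgPath))
	return "oid:" + hex.EncodeToString(sum[:])
}

// plainKey embeds the package path directly, so its length varies
// with the path.
func plainKey(pkgPath string) string {
	return "tid:" + pkgPath
}

func main() {
	for _, p := range []string{"gno.land/p/demo/avl", "gno.land/r/demo/users"} {
		// Print the length of each key style for the same package path.
		fmt.Println(len(hashedKey(p)), len(plainKey(p)))
	}
}
```

Hashed keys always have the same length regardless of the path, while plain keys stay human-readable; mixing both styles is what makes the current layout harder to reason about.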

VM Model and Storage Coupling

Models have methods that receive the Store as a parameter to retrieve related metadata. This creates inconsistencies in how types are retrieved/cast/used and adds an unnecessary dependency between the model and storage layers.

Slice Storage Limitations

Slice metadata is not split into chunks, so retrieving a big slice means marshaling/unmarshaling all of its data at once, which could cause memory issues.
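
One possible shape for chunked storage is sketched below. The `kv` map stands in for the real database, and the `{base}/{index}` key scheme and function names are hypothetical.

```go
package main

import "fmt"

// kv is a stand-in for the underlying key-value store.
type kv map[string][]byte

// putChunked stores data in pieces of at most chunkSize bytes under
// "{base}/{index}" keys and returns the number of chunks written.
func putChunked(db kv, base string, data []byte, chunkSize int) int {
	if chunkSize <= 0 {
		return 0
	}
	n := 0
	for start := 0; start < len(data); start += chunkSize {
		end := start + chunkSize
		if end > len(data) {
			end = len(data)
		}
		// Copy the chunk so the store does not alias the caller's slice.
		db[fmt.Sprintf("%s/%d", base, n)] = append([]byte(nil), data[start:end]...)
		n++
	}
	return n
}

// getChunk reads a single chunk without loading the rest of the slice.
func getChunk(db kv, base string, index int) ([]byte, bool) {
	v, ok := db[fmt.Sprintf("%s/%d", base, index)]
	return v, ok
}

func main() {
	db := kv{}
	n := putChunked(db, "slice:example", []byte("abcdefghij"), 4)
	chunk, _ := getChunk(db, "slice:example", 2)
	fmt.Println(n, string(chunk))
}
```

With a layout like this, a reader that only needs part of a large slice unmarshals one chunk at a time instead of the whole value.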

Next Steps

Option 1: Maintain Status Quo with Minor Fixes

Incrementally address specific issues, such as unifying key prefix logic and standardizing type storage.

Option 2: Introduce a Conversion Layer (Recommended)

Define storage-specific models fulfilling the use case, independent of VM state models. Establish a conversion layer between VM state models and storage models, simplifying data storage and retrieval while reducing redundancy.

This approach will simplify relationships between parts of the application, facilitate the implementation of other VMs in the future, and make it easier to maintain and optimize the storage layer.

  • We need to define storage models for:

    • Package/realms metadata
    • Gno files
    • Relationships between packages
  • Implement efficient indexing to list available packages, avoiding O(n) complexity.

  • Design a static tree to define type relationships within a realm.

  • Create a Merkle tree to track value states and allow rollbacks if necessary.

Conclusion

I recommend moving forward with Option 2, as it provides a consistent and future-proof storage design. This will help reduce redundancy and make the system easier to maintain.

I would appreciate feedback from the team on how feasible you think this approach is, and on which option is best to follow now.

@Kouteki Kouteki moved this from In Progress to In Review in 🧙‍♂️gno.land core team Nov 26, 2024
@thehowl
Member

thehowl commented Nov 28, 2024

Let's try to tackle low-hanging fruit on our storage ahead of the launch; then we'll tackle larger re-organizations and specifications of the storage later on.

@sw360cab
Contributor

Should we also cover security concerns about the local DB here?

  • Which kinds of security flaws is the DB prone to?
  • Can it be compromised or tampered with through malicious access or an attack?
  • Can it be encrypted?
  • Which forms of security can be applied?

@zivkovicmilos
Member Author

Thank you for doing the research @ajnavarro 🙏

A few open questions I had while reading this:

s/_/oid:{HASH}:{SEQ}: Stores object content hashes plus marshaled Object data.

Does this mean we store duplicated object content for each sequence, or is there some smarter logic that gives you the latest sequence for the latest object version?

s/latest: Contains the latest version, which points to s/VERSION (a commitInfo type).

What is the structure of this data? Just an int64 (block num)?

s/_/last_header: Stores an abci.Header.

I'm assuming this is updated every time we have a new block. Why do we store it?

PrefixDB adds prefixes to keys and uses a mutex internally

👀


The storage layer is something we can migrate if need be in the future, when we change data representations.
What I'd love for us to do in the interim, before the mainnet launch, is to document exactly what you've mentioned:

  • We need to define storage models for:

    • Package/realms metadata
    • Gno files
    • Relationships between packages
  • Implement efficient indexing to list available packages, avoiding O(n) complexity.

  • Design a static tree to define type relationships within a realm.

  • Create a Merkle tree to track value states and allow rollbacks if necessary.

Can we create open issues for these things? @Kouteki
I think we can start doing research ahead of the launch if there is wiggle room, as these things are direct performance and stability killers.

@Kouteki
Contributor

Kouteki commented Dec 4, 2024

The majority is in favor of option 1 - minor fixes before the launch, and greater reorg afterwards.

I'll close this issue at the end of the cycle, and I've created #3264 to track option 2 afterwards

@leohhhn leohhhn changed the title [chain] Audit the Gno.land storage layer [chain] Audit the gno.land storage layer Dec 5, 2024
@github-project-automation github-project-automation bot moved this from In Review to Done in 🧙‍♂️gno.land core team Dec 6, 2024
@Kouteki Kouteki removed the in focus Core team is prioritizing this work label Dec 16, 2024