Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(trie): decode inline child nodes #2369

Merged
merged 13 commits into from
Apr 3, 2022
12 changes: 12 additions & 0 deletions internal/trie/node/decode.go
Original file line number Diff line number Diff line change
Expand Up @@ -105,6 +105,18 @@ func decodeBranch(reader io.Reader, header byte) (branch *Branch, err error) {
ErrDecodeChildHash, i, err)
}

// Handle inlined leaf nodes.
const hashLength = 32
if Type(hash[0]>>6) == LeafType && len(hash) < hashLength {
leaf, err := decodeLeaf(bytes.NewReader(hash[1:]), hash[0])
if err != nil {
return nil, fmt.Errorf("%w: at index %d: %s",
kishansagathiya marked this conversation as resolved.
Show resolved Hide resolved
ErrDecodeValue, i, err)
}
branch.Children[i] = leaf
continue
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick question, I guess this inline format cannot be used for branches right? Only for leaves?

Copy link
Contributor Author

@notbdu notbdu Mar 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah so a few things here I think. Technically there could be, my impression of the logic is that any child node < 32 bytes can be in-lined leaf or branch.

The child parsing seems to be wrong in general? It looks like its currently parsing every child as a leaf node. Although I'm not sure it semantically makes a difference when its just a hash pointer.

Also, there are a few header types that are missing from the impl:

impl Decode for NodeHeader {
    fn decode<I: Input>(input: &mut I) -> Result<Self, codec::Error> {
        let i = input.read_byte()?;
        if i == trie_constants::EMPTY_TRIE {
            return Ok(NodeHeader::Null);
        }
        println!("header byte {:b} {:b}", i, i & (0b11 << 6));
        match i & (0b11 << 6) {
            trie_constants::LEAF_PREFIX_MASK => Ok(NodeHeader::Leaf(decode_size(i, input, 2)?)),
            trie_constants::BRANCH_WITH_MASK => {
                Ok(NodeHeader::Branch(true, decode_size(i, input, 2)?))
            }
            trie_constants::BRANCH_WITHOUT_MASK => {
                Ok(NodeHeader::Branch(false, decode_size(i, input, 2)?))
            }
            trie_constants::EMPTY_TRIE => {
                if i & (0b111 << 5) == trie_constants::ALT_HASHING_LEAF_PREFIX_MASK {
                    Ok(NodeHeader::HashedValueLeaf(decode_size(i, input, 3)?))
                } else if i & (0b1111 << 4) == trie_constants::ALT_HASHING_BRANCH_WITH_MASK {
                    Ok(NodeHeader::HashedValueBranch(decode_size(i, input, 4)?))
                } else {
                    // do not allow any special encoding
                    Err("Unallowed encoding".into())
                }
            }
            _ => unreachable!(),
        }
    }
}

The mask size for hashed leaves and branches are different but after digging through the scale decoder implementation it looks like it handles that internally?

I think it's probably worth looking into some of the other issues but outside the scope of this PR? Maybe just have some open issues for now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like its currently parsing every child as a leaf node. Although I'm not sure it semantically makes a difference when its just a hash pointer.

As the decodeBranch comment mentions:

// Note that since the encoded branch stores the hash of the children nodes, we are not
// reconstructing the child nodes from the encoding. This function instead stubs where the
// children are known to be with an empty leaf. The children nodes hashes are then used to
// find other values using the persistent database.

In another PR, it would be interesting to refactor this to load from database as it decodes, especially since you can have in-lined encoded nodes, otherwise it's rather confusing. Created #2375.

there are a few header types that are missing from the impl

That might be the case, I opened #2374, thanks so much for your investigation!

The mask size for hashed leaves and branches are different but after digging through the scale decoder implementation it looks like it handles that internally?

Are you sure? Can you point to some code you've seen? @timwu20 maybe do you have a clue on this?

I think it's probably worth looking into some of the other issues but outside the scope of this PR? Maybe just have some open issues for now?

Totally 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure? Can you point to some code you've seen? @timwu20 maybe do you have a clue on this?

I think it's probably worth looking into some of the other issues but outside the scope of this PR? Maybe just have some open issues for now?

It looks like that way based on this section of the header parsing code I shared above.

                if i & (0b111 << 5) == trie_constants::ALT_HASHING_LEAF_PREFIX_MASK {
                    Ok(NodeHeader::HashedValueLeaf(decode_size(i, input, 3)?))
                } else if i & (0b1111 << 4) == trie_constants::ALT_HASHING_BRANCH_WITH_MASK {
                    Ok(NodeHeader::HashedValueBranch(decode_size(i, input, 4)?))

https://github.com/paritytech/substrate/blob/345e3b1fcbbb0c479a031c45859d7154e7c4b5f7/primitives/trie/src/node_header.rs#L97-L102

Leaf hash mask:
11100000
Branch hash mask:
11110000

Inisde of decode_size(), the max value of the header size is then 32 and 16 for leaf and branch headers respectively.

branch.Children[i] = &Leaf{
HashDigest: hash,
}
Expand Down
59 changes: 59 additions & 0 deletions internal/trie/node/decode_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -95,6 +95,65 @@ func Test_Decode(t *testing.T) {
Dirty: true,
},
},
"branch with two inlined children": {
reader: bytes.NewReader(
[]byte{
158, // node type 2 (branch w/o value) and key length 30
// Key data start
195, 101, 195, 207, 89, 214,
113, 235, 114, 218, 14, 122,
65, 19, 196, 16, 2, 80, 95,
14, 123, 144, 18, 9, 107,
65, 196, 235, 58, 175,
// Key data end
148, 127, 110, 164, 41, 8, 0, 0, 104, 95, 15, 31, 5,
21, 244, 98, 205, 207, 132, 224, 241, 214, 4, 93, 252,
187, 32, 134, 92, 74, 43, 127, 1, 0, 0,
},
),
n: &Branch{
Key: []byte{
12, 3, 6, 5, 12, 3,
12, 15, 5, 9, 13, 6,
7, 1, 14, 11, 7, 2,
13, 10, 0, 14, 7, 10,
4, 1, 1, 3, 12, 4,
},
Children: [16]Node{
nil, nil, nil, nil,
&Leaf{
Key: []byte{
14, 7, 11, 9, 0, 1,
2, 0, 9, 6, 11, 4,
1, 12, 4, 14, 11,
3, 10, 10, 15, 9,
4, 7, 15, 6, 14,
10, 4, 2, 9,
},
Value: []byte{0, 0},
Dirty: true,
},
nil, nil, nil, nil,
&Leaf{
Key: []byte{
15, 1, 15, 0, 5, 1,
5, 15, 4, 6, 2, 12,
13, 12, 15, 8, 4,
14, 0, 15, 1, 13,
6, 0, 4, 5, 13,
15, 12, 11, 11,
},
Value: []byte{
134, 92, 74, 43,
127, 1, 0, 0,
},
Dirty: true,
},
nil, nil, nil, nil, nil, nil,
},
Dirty: true,
},
},
}

for name, testCase := range testCases {
Expand Down
47 changes: 45 additions & 2 deletions internal/trie/node/encode_decode_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,7 @@ func Test_Branch_Encode_Decode(t *testing.T) {
Dirty: true,
},
},
"branch with child": {
"branch with child leaf inline": {
branchToEncode: &Branch{
Key: []byte{5},
Children: [16]Node{
Expand All @@ -57,7 +57,50 @@ func Test_Branch_Encode_Decode(t *testing.T) {
Key: []byte{5},
Children: [16]Node{
&Leaf{
HashDigest: []byte{0x41, 0x9, 0x4, 0xa},
Key: []byte{9},
Value: []byte{10},
Dirty: true,
qdm12 marked this conversation as resolved.
Show resolved Hide resolved
},
},
Dirty: true,
},
},
"branch with child leaf hash": {
branchToEncode: &Branch{
Key: []byte{5},
Children: [16]Node{
&Leaf{
Key: []byte{
10, 11, 12, 13,
14, 15, 16, 17,
18, 19, 20, 21,
14, 15, 16, 17,
10, 11, 12, 13,
14, 15, 16, 17,
},
Value: []byte{
10, 11, 12, 13,
14, 15, 16, 17,
10, 11, 12, 13,
14, 15, 16, 17,
10, 11, 12, 13,
},
},
},
},
branchDecoded: &Branch{
Key: []byte{5},
Children: [16]Node{
&Leaf{
HashDigest: []byte{
2, 18, 48, 30, 98,
133, 244, 78, 70,
161, 196, 105, 228,
190, 159, 228, 199, 29,
254, 212, 160, 55, 199,
21, 186, 226, 204, 145,
132, 5, 39, 204,
},
},
},
Dirty: true,
Expand Down
24 changes: 21 additions & 3 deletions lib/trie/database.go
Original file line number Diff line number Diff line change
Expand Up @@ -186,6 +186,18 @@ func (t *Trie) load(db chaindb.Database, n Node) error {

hash := child.GetHash()

_, isLeaf := child.(*node.Leaf)
if len(hash) == 0 && isLeaf {
// node has already been loaded inline
// just set encoding + hash digest
_, _, err := child.EncodeAndHash(false)
if err != nil {
return err
}
child.SetDirty(false)
continue
}
qdm12 marked this conversation as resolved.
Show resolved Hide resolved

encodedNode, err := db.Get(hash)
if err != nil {
return fmt.Errorf("cannot find child node key 0x%x in database: %w", hash, err)
Expand Down Expand Up @@ -330,12 +342,18 @@ func getFromDB(db chaindb.Database, n Node, key []byte) (

// childIndex is the nibble after the common prefix length in the key being searched.
childIndex := key[commonPrefixLength]
childWithHashOnly := branch.Children[childIndex]
if childWithHashOnly == nil {
child := branch.Children[childIndex]
if child == nil {
return nil, nil
}

childHash := childWithHashOnly.GetHash()
// Child can be either inlined or a hash pointer.
childHash := child.GetHash()
_, isLeaf := child.(*node.Leaf)
if len(childHash) == 0 && isLeaf {
return getFromDB(db, child, key[commonPrefixLength+1:])
}

encodedChild, err := db.Get(childHash)
if err != nil {
return nil, fmt.Errorf(
Expand Down