-
Notifications
You must be signed in to change notification settings - Fork 20.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
trie: simplify StackTrie implementation #23950
Conversation
Nodetype uint8 | ||
Val []byte | ||
Key []byte |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to lookup why we implemented binary marshalling, but if we somewhere do encode this to disk, then this is a breaking change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The benchmarks don't seem affected -- so sure, simplicity is better.
|
trie/stacktrie.go
Outdated
st := stackTrieFromPool(db) | ||
st.nodeType = leafNode | ||
st.keyOffset = ko | ||
st.key = append(st.key, key[ko:]...) | ||
st.key = key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I'm not sure. Previously this worked as a copy? (the st.key
is always empty after taking an object from the pool).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, so previously the incoming key
was always copied, but the append
operation made it so that we could sometimes use an existing buffer, so no alloc was needed. WIth this change, leafs will reference the same backing slice.
We need to investigate whether that can be a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(the st.key is always empty after taking an object from the pool).
That's because in Reset
, which is called when returning things to the pool, we do
st.key = st.key[:0]
That means we just truncate it, but don't dereference the backing-slice. So the actual underlying slice is reused, whenever we at a later point do append(st.key, ...)
.
A safer alternative to what you're doing would simply be to change it back into
st.key = append(st.key, key...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm running the stacktrie fuzzer on it now, let's see if it finds anything
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I now see these were for both leaf and ext node (the only ones which have keys). I can restore these. Are they only for memory management optimization unless someone would change the content underneath?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are they only for memory management optimization unless someone would change the content underneath?
Not sure I understand the question. But basically, you assume that the caller will not modify the input key
ever again. Which may be true, but I'm not sure we have any guarantee.
The previous code always copied the key
, but did so in a way which for the most part never caused any allocs to happen. I think the previous way to do it is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's ok to direct hold the byte slice. We do have one place with modify the underlying slice, https://github.com/ethereum/go-ethereum/blob/master/trie/stacktrie.go#L437 but it's only for the leaf node.
Triage: we should run a full-sync and a snap-sync to verify it. |
trie/stacktrie.go
Outdated
} | ||
|
||
// Helper function that, given a full key, determines the index | ||
// at which the chunk pointed by st.keyOffset is different from | ||
// the same chunk in the full key. | ||
func (st *StackTrie) getDiffIndex(key []byte) int { | ||
diffindex := 0 | ||
for ; diffindex < len(st.key) && st.key[diffindex] == key[st.keyOffset+diffindex]; diffindex++ { | ||
for ; diffindex < len(st.key) && st.key[diffindex] == key[diffindex]; diffindex++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this method woud become clearer by writing it like this
func (st *StackTrie) getDiffIndex(key []byte) int {
for idx, k := range st.key{
if k != key[idx]{
return idx
}
}
return len(st.key)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will do.
Full sync started on |
snap-sync done, full-sync at 5.9M as of now |
@@ -135,7 +132,6 @@ func (st *StackTrie) unmarshalBinary(r io.Reader) error { | |||
st.nodeType = dec.Nodetype | |||
st.val = dec.Val | |||
st.key = dec.Key | |||
st.keyOffset = int(dec.KeyOffset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also remove the KeyOffset
field here https://github.com/ethereum/go-ethereum/pull/23950/files#diff-450c04b40c4e2f8c12ed10930cd0a8b8a6c25b77b1f3469e7205b43b5fad058aL132
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
This mistake might have hidden a binary incompatibility issue: we could properly read old binaries by ignoring KeyOffset byte.
This also looks like UnmarshalBinary()
is currently incompatible with MarshalBinary()
. Why is this not detected by any test?
trie/stacktrie.go
Outdated
st := stackTrieFromPool(db) | ||
st.nodeType = leafNode | ||
st.keyOffset = ko | ||
st.key = append(st.key, key[ko:]...) | ||
st.key = key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's ok to direct hold the byte slice. We do have one place with modify the underlying slice, https://github.com/ethereum/go-ethereum/blob/master/trie/stacktrie.go#L437 but it's only for the leaf node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the first version of StackTrie
was doing exactly that, with a limited amount of copies. This lead to a whole host of problems in which a key, being modified, modified another key that wasn't supposed to be. Now that we do a lot more fuzzing, this is likely no longer a big risk, however I would urge caution before merging this.
I don't see problem with reverting back to |
That would alleviate our concerns. IMO: Please do |
Trim the search key from head as it's being pushed deeper into the trie. Previously the search key was never modified but each node kept information how to slice and compare it in keyOffset. Now the keyOffset is not needed as this information is included in the slice of the search key. This way the keyOffset can be removed and key manipulation simplified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Trim the search key from head as it's being pushed deeper into the trie. Previously the search key was never modified but each node kept information how to slice and compare it in keyOffset. Now the keyOffset is not needed as this information is included in the slice of the search key. This way the keyOffset can be removed and key manipulation simplified.
Trim the search key from head as it's being pushed deeper into the trie.
Previously the search key was never modified but each node kept
information how to slice and compare it in
keyOffset
. Now thekeyOffset
is not needed as this information is included in the slice of the
search key. This way the
keyOffset
can be removed and key manipulationsimplified.
BTW, I ported this Trie implementation (without hashing implementation) to C++. This helped me a lot, thanks.