-
Notifications
You must be signed in to change notification settings - Fork 11.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MerkleProof: Intermediate nodes can be reinterpreted as leaves #3091
Comments
Hello @hoytech, We will come back to you with a longer response next week, but this is something we are aware of, and we already discussed. AFAIK this is not an issue with our implementation but rather a property of merkle trees. It is true that users must be careful with what they use as leaves. IMO it's usually a hash that doesn't match the merkle construction mechanism, but you are right that there is a risk of confusion. We were particularly concerned with this risk when discussing rebuilding a leaf index during verification. Changing the |
Hi @Amxx, thanks for the reply. I don't believe it's really a property of merkle trees in general. Most merkle tree implementations prevent this by construction, by ensuring the input domains for constructing the leaf and node hashes are disjoint. They do this by prefixing/suffixing node type tags, using keyed hash functions, etc. In this way users can use any values at all for leaves. This is not an issue with the OZ implementation in the sense that the OZ implementation is incomplete and does not itself implement leaf hashing. Instead, it requires the user of the library to take precautions to prevent it. Fortunately it's easy to do by ensuring that the only allowed leaf values are fixed, non-64 byte lengths, and perhaps you're right that usually this is the case, but there are many not-unrealistic scenarios where this could be violated (I wrote a brief analysis of this in the issue description). Again, because most merkle tree implementations do not have this problem, it's especially important to notify library users, and it probably should have been done as soon as OZ was aware of it. My pull request has some basic wording that at least may help as a starting point. |
It is possible to prove the presence of certain values that aren't actually leaves in the tree.
This is a very slight variant of a well-known problem sometimes called a "second preimage attack" on merkle trees, although in my opinion that is a misleading name. The issue is that you can concatenate the hashes of two internal nodes (resulting in 64 bytes) and provide this as a value to be proved. The smart contract will hash this and attempt to verify it as though it was a normal leaf. If the proof has been shortened appropriately, it will incorrectly be validated.
The very slight variation from the typical description is that in the OZ implementation internal nodes are sorted, although this doesn't change the impact of this issue (which is low in most cases).
I included a test case in #3090 that demonstrates this issue and adds some documentation warning developers about how to avoid it.
Impact
The real world impact of this bug is likely small. The first condition for a successful attack is that a contract accepts legitimate leaf values with a length of exactly 64 bytes. For instance, Uniswap's merkle distributor uses the following to hash a leaf:
This is an example of code that is not vulnerable.
index
andamount
areuint256
s andaccount
is an address so the value passed tokeccak256
is32 + 20 + 32 = 84
bytes long.On the other hand, suppose the code was written like so:
In this case, the leaf value is
32 + 20 + 12 = 64
bytes long so it would potentially be vulnerable to a griefing/spamming attack where a large amount of tokens are sent to some random address, effectively burning them. Alternatively, ifindex
was packed into some small type (maybeuint32
) then it might be possible to burn innocent users' tokens by sending them to random addresses. The length condition could also be satisfied if variable-length values are used, for example:abi.encodePacked(address, bytes(...))
.The second condition of a successful attack is to overcome the fact that the attacker has very little control over the invalid values. This is because they are the output of hash functions over data that presumably the attacker cannot influence. If attackers do have some influence over values in the tree (maybe they can sign up for an airdrop using arbitrary addresses) then they may be able to "grind" advantageous values to some degree.
However, in the typical case an attacker has to make do with a set of effectively random values. The number of usable inputs scales approximately with the number of leaf values in the tree. An attacker can loop over all possibilities and pick the best one, and there will be more chances of a "good" pair of hashes if the tree is larger. However, this is obviously dominated by the large output range of the 256-bit hash function so is not likely a problem in practice.
Recommendation
I originally submitted this to Immunefi but they weren't interested so I'm just posting here. It may be worthwhile determining if any contracts have been deployed that could be impacted by this (probably few, if any).
At the very least, users should be warned in the documentation about this issue. Contracts should ensure that there is a "domain separation" between hashing leaf values and intermediate nodes. This can be done either by making sure leaf values are never 64 bytes in length, or by using a hash function other than keccak256 when hashing leaves. Also,
abi.encode()
should be encouraged overabi.encodePacked()
for several reasons.My pull request contains some suggested wording for this. Also feel free to include any part of this description in your documentation, if it helps.
There are some API-level changes that could be made to definitively prevent this. For instance,
verify()
could accept a user-defined value type (https://blog.soliditylang.org/2021/09/27/user-defined-value-types/) that wrapsbytes32
and can only be created by a keccak256-wrapper that also checks the input length is not 64-bytes. That said, I think leaving the responsibility of preventing this to the developer is fine, as long as there is appropriate documentation describing how to do so.References
The text was updated successfully, but these errors were encountered: