-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ICS20 GetEscrowAddress pre-image domain overlaps with public keys #7737
Comments
Another issue: encoding escrow addresses as truncated This exploit on your local chain is possible if your counterparty is able to choose both the port and the channel ids. I would say both port and channel ids needs to be chosen deterministically, e.g. the port id fixed to "transfer", and the channel id chosen deterministically from the counterparty chain id and some other additional parameters. Also, for additional security it would be good when computing the hash to separate the port id and the channel id with e.g. "/", i.e. compute it as the truncated hash of |
If possible, let's add a prefix to prevent any possibility of ed25519 brute-forcing. We need to check that this won't interfere with any |
It seems like we cannot easily add a prefix because parts of the SDK (e.g. here) are dependent on accounts using 20-character addresses. One short-term option is to use a fixed prefix and trim the hash, e.g. This does come at a tradeoff of making collisions between escrow addresses used by different channels more likely, though. |
... also maybe we should reconsider SHA256 because of Bitcoin mining? Maybe brute-forcing 10 or even 20 bytes is relatively easy? |
Have you read ADR 028 @cwgoes and the surrounding conversations? This has been talked about quite a bit, although not necessarily conclusively. |
So there are two issues here: collisions between module and non-module accounts, and collisions between module accounts (eg. between those created for each channel). The former should be solved by ADR28 using a pre-image prefix. The latter I think can be solved by domain separation between the port and channel? Eg. |
Sweet, thanks for the responses. I have updated the pr to use the |
We should certainly domain separate the port and channel - above, though, I was talking about collisions between module accounts created not by pre-image collisions but by hash collisions, especially if we're taking the first |
I am not a cryptographer, so here is the very approximate analysis; please take this with a grain of salt.
As correctly pointed out by Ethan above, we need to separate the two cases:
To properly analyze the cases the following is crucial:
Let me first illustrate this speed difference: ECDSA on 2.9 GHz i7-3520m : 42 K guesses/sec (see Table on page 65 here) SHA256 on my laptop, 1.8GHz i7-8565U
Translates to ~ 5000 K guesses/sec, so at least 2 orders of magnitude difference. Take into account this is laptops, single core, so on specialized hardware, multi-core, it will be much faster (see e.g. here) Taking into account that case 1 is not birthday attack and ECDSA, this case can be considered safe enough. The most dangerous is case 2b), because birthday attack allows to reduce the probability of a single guess from 1/ 2^160 to 1/2^80. Besides, the birthday attack can be implemented quite efficiently with limited memory using rainbow tables, or even with constant memory. I believe that birthday attack in combination with very fast guessing for SHA256 makes this computationally feasible. If needed, concrete computations of probability and the necessary resources can be easily performed. I also strongly believe that ADR-28 does not solve this problem for the module accounts, when the preimage is not the public ECDSA key, but a simple string, as the computational complexity in that case is only the fast SHA256. Whatever prefixes are added to the simple string, the attack algorithms will stay the same. To mitigate the problem I think this needs to be done:
The above determinism and boundedness (second point), if implemented, will have the additional benefit of preventing any kind of collisions completely. As it will be possible to enumerate all possible module account addresses for all chain combinations, it will also make possible to reject any attempt to register a non-module account with the same address, or to have two module accounts with the same address. Will be happy to engage in further discussions or provide other help if needed. |
Awesome, @andrey-kuprianov thank you so much for the write up! All the proposed solutions can definitely be added. Recent spec changes to connection and channel identifiers require deterministic identifier creation. After that change and after I update my pr fixing this issue, I think everything will look like this: prefix:
As far as changing the hash function goes, we can definitely switch away from sha256. Escrow addresses are only ever computed internally and are not user/client facing. The primary reason for choosing sha256 was because of wide adoption/support. Blake2b was mentioned in the creation of ADR 028 so maybe that would be a good alternative? I will leave the pr I have open using sha256 for now, but would like @cwgoes or @ebuchman to decide what should be used |
Thanks for the update, @colin-axner! The deterministic identifier selection is indeed the way to go, I think. I am not sure how many channels need to be created between the chains, but I suppose not too many? For the transfer channels I think it makes sense to limit this to a much smaller size, say uint16. As for the hash function, it could be anything simple, but computationally expensive, e.g. SHA256(PublicKey(SHA256(id))). As for the deterministic selection of identifiers in #7993: it's probably too late to provide some input at this point, but... some time ago I was designing a protocol that also needed to negotiate a numerical identifier between two parties. What I did was the following:
This approach is a bit more flexible than either sending a particular id, and also has more permission control than allowing the other side just to select anything. The way this goes is if Party 1 has ids [1,10) and [15,20) occupied, it can just invert this set to get [10, 15), [20, 100). Plus it can impose additional restrictions. And still provide enough freedom for the other side to select an identifier that is free on both sides. |
So, currently the channel identifier would have no reference to the 2 chains as you proposed above. We cannot reference the chain IDs since they are not guaranteed to be unique. We could reference the connectionID ( On second thought, using the current global counter construction, I think using
ChannelEnds don't need to agree on a particular id, they each have their own. |
Re counter construction: I think
Besides, having only 4 * 10^9 possible escrow addresses makes is computationally easy to completely enumerate them all, and exclude the collision possibility completely. EDIT: corrected the numbers above to use the right approximation formula: |
To summarize, my suggestion is as follows:
The reason why I propose to keep Hope that helps. |
I would like to outline here a couple of further alternatives to the solution proposed above (let's call it Alternative 1). Alternative 2: In case
Alternative 3: Implement protocol-level measures to prevent free choice of addresses by a potential attacker. This is more long-term; the goal here is to guarantee the absence of collisions by changing the protocols. One such protocol-level measure could be to implement window-based selection of channel ids as outlined in one of the messages above. E.g., when establishing a channel each party provides a window of local channel ids, (say [1-1000]), and the counterparty selects one id from that window. In that way we could actually leave the space of channel ids unrestricted, but whenever a channel is established provide a window that guarantees absence of collisions with any of existing escrow addresses: this will be easy and fast to do, because the windows will be in the order of e.g. bitwidth 10. At the same time, each party can prevent the other one from selecting a specific id, by choosing one from that window that guarantees absence of collisions also to the other party. |
I would also like to point to the previous rather detailed cost analysis to attack truncated SHA256 performed previously in the context of Tendermint. This analysis is no less relevant today as it was back then. |
@andrey-kuprianov Thank you so much for this detailed analysis! Alas, I am not a cryptographer either (seems we need one), but we'll have to make do ourselves for now - let me summarise my understanding and suggestions for moving forward.
This is pretty interesting, but under our current time constraints I think it is too complicated. I'm also a little hesitant to rely on a dynamic negotiation protocol to avoid collisions (versus either enumerating or making them unlikely enough and sufficiently computationally difficult to brute force that the chance is negligible). At that point we should consider changing the SDK more fundamentally so that we can just avoid using a hash function altogether (which means a non-standard-length address).
If necessary we could do this. However, I think
This alternative (Alternative 1 as you enumerated them) makes most sense to me. In the current model I agree that changing the prefix length has no effect on collision resistance (for module accounts), the channel numbering scheme is already what was proposed, Do you happen to have such a script, or know of a fast SHA256 implementation? Should I just try to write one in Python or C? I have one remaining question. In your case 1 above:
It is true that this is not a collision between two arbitrary preimages, but the attacker can execute a slightly different strategy, I think. Even with enumerated channel identifiers, an attacker could first enumerate all of the SHA256 hashes of the channel addresses (assuming we pursue Alternative 1 as above), and then start searching the ECDSA public key space, checking for each generated @colin-axner For now, we should pursue Alternative 1 - I think my last concern is probably not major. |
I want to point out that ADR 028 was not reviewed by any professional cryptographers. This was desired and even requested but it has not happened yet. I would like to improve the design if possible before it goes live. |
I wonder if for ADR 028 it would also help to increase addresses beyond the current 20 byte limit and/or add a prefix to the post-hash address instead of just the pre-image so that addresses for different purposes are strictly separated. When I initially addressed this issue, I had proposed 21 byte addresses, with the additional byte specifying the type of address (allowing up to 256 types per chain which seems reasonable). |
@cwgoes thank you for the feedback!
yes, changing the address length and/or format is probably the best option in the long run. Looking at the speed of the parallelization and miniaturization happening now, I would say moving from 20-byte addresses to e.g. 32-byte addresses is bound to happen in the next decade.
No, I don't have or know one yet, but can also write one in Rust for example, should be pretty easy. The only difficulty is that the required memory is around 200 GB, so some cloud machine e.g. AWS
I was also thinking about this alternative. This can be analyzed approximately as follows (see A Generalized Birthday Attack, page 132 for asymptotics). For the standard case, we have the approximation formula: For the case you describe the approximation formula becomes Alternative 1: Thus, for Alternatives 1 and 2 this attack is equivalent to the standard collision search with a single address but with the post-image bitwidth of 129 bits or 97 bits respectively. I think actually both of those are still acceptable at present point, taking into account the slow ECDSA computation and that it won't be possible to store 2^64 addresses, but the first one is of course much more secure. |
Right, makes sense. Thanks for the additional analysis - another reason to prefer Alternative 1 - sounds like we're in agreement. |
Would it make sense to consider changing the hash function for module addressses in the short term to at least mitigate possible attacks from SHA2 asics? |
Per my understanding of the discussion with @andrey-kuprianov we're just going to exhaustively search the module address space and ensure that there are no collisions. |
The problem faced in the original post has imo one underlying problem: Prefixing the inputs to hashes doesn't solve this. Address hashes should be 32 bytes, because many applications need collision resistant address hashes including this one! For instance, a contract model requires collision resistant address hashes for security, regardless of any domain separation you do. As we want to allow for more applications that rely on addresses, I think we should default the ecosystem to using 32 bytes addresses. Then domain separation of inputs is something thats great as a defense in depth, where collisions or pre-images can't be reused as widely. 32 byte addresses only introduce relatively minor bandwidth increases, that can easily be reduced with fancier solutions in the future. (e.g. for accounts, just only use the account-number to refer to the account) |
Re the second post from Andrey, we also absolutely have to ensure any data we encode to be hashed has a unique encoding, no way around that. We should probably write wrappers in the SDK to make this easier, since directly concatenating different points of user input allows to be used as a key for something allows that type of malleability attack, which repeatedly breaks security of systems. Length prefixing each component, and then concatenating works in the general case, but if you're guaranteed that neither string can include a particular character, then you can prefix-separate as discussed in the posts above. IMO whatever is done here should ideally be standardized throughout the SDK |
On the choice of hash function discussions, I'm pretty opposed to changing the hash function to add latency into each execution step. The need for this is completely eliminated by removing the concern of collisions. (Increasing address size to 32 bytes. If we needed a slower hash function, we should be using a standard password hashing algorithm, not something thats homebrewed. |
Suggesting we continue the conversation in #5694 which originally attempted to address this. |
@cwgoes I have checked for escrow address collisions for channel ids from I have tested wrt. all prefixes proposed above + another prefix which is 260 bytes long -- so that it doesn't fit into a single 64-byte input block of SHA-256. The results for channel ids from
While originally I thought that a machine with 256 GB of memory is needed, after a bit of optimization I was able to conduct the A few takeaways:
I hope this confirms the right choice of |
@andrey-kuprianov Thank you so much! Sounds like |
btw, if you under double the computational cost, hash collision attacks take basically no memory. This is the parallel rho method of finding hash collisions, you can derive it with some slight modifications of https://en.wikipedia.org/wiki/Pollard%27s_rho_algorithm_for_logarithms. See https://ieeexplore.ieee.org/abstract/document/8378028 for a paper on it. |
@ValarDragon thanks for the references. You are right; in one of the earlier posts I have cited a similar algorithm already:
There are two points to consider though:
|
Surfaced from Informal Systems IBC Audit of cosmos-sdk hash 82f15f3.
GetEscrowAddress
is the truncated SHA256(portID + channelID), which is the same address hash used for Ed25519 keys. While portID and channelID are bounded variable length alphanumeric strings, in principle its possible to create 32-byteportID + channelID
that are valid Ed25519 public keys. This could enable an attacker to set up a channel and then steal all the funds that are escrowed.This attack is likely infeasible at present given the need to brute force an Ed25519 public key that begins with
transfer
and is otherwise fully alphanumeric. However this depends on the validation criteria for channel IDs - if they were ever changed, say to allow arbitrary byte-strings, this attack may become possible.Thus it would probably wise to use domain separation for the pre-image to these module addresses, for instance a prefix guaranteeing the pre-image is not a valid public key (ie. is too long). It looks like this is worked into ADR-028.
The text was updated successfully, but these errors were encountered: