Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extensible address format #5694

Closed
aaronc opened this issue Feb 25, 2020 · 32 comments
Closed

Extensible address format #5694

aaronc opened this issue Feb 25, 2020 · 32 comments
Assignees
Milestone

Comments

@aaronc
Copy link
Member

aaronc commented Feb 25, 2020

Use Cases and Goals

Currently we are planning to add new address types for:

Currently secp256k1 keys and multisig accounts occupy the same overlapping 20 byte address space. We would like an extensible approach that:

Potential Approaches

Weave Conditions

As described in #3685 by @ethanfrey:

I spent quite a bit of time thinking about this issue while building weave... The other cosmos Sdk.

Basically I define a condition to be a type and format as human readable string with some binary data appended. This condition is hashed into an Address (again at 20 bytes). The use of this prefix makes it impossible to find a preimage for a given address with a different condition (eg ed25519 vs secp256k1).

This is explained in depth here https://weave.readthedocs.io/en/latest/design/permissions.html

And the code is here, look mainly at the top where we process conditions. https://github.com/iov-one/weave/blob/master/conditions.go

And as @ethanfrey explained this approach should be sufficient with 20 bytes of address space:

Yeah, AFAIK, 20 bytes should be collision resistance when the preimages are unique and not malleable. A space of 2^160 would expect some collision to be likely around 2^80 elements (birthday paradox). And if you want to find a collision for some existing element in the database, it is still 2^160. 2^80 only is if all these elements are written to state.

The good example you brought up was eg. a public key bytes being a valid public key on two algorithms supported by the codec. Meaning if either was broken, you would break accounts even if they were secured with the safer variant. This is only as the issue when no differentiating type info is present in the preimage (before hashing into an address).

I would like to hear an argument if the 20 bytes space is an actual issue for security, as I would be happy to increase my address sizes in weave. I just figured cosmos and ethereum and bitcoin all use 20 bytes, it should be good enough. And the arguments above which made me feel it was secure. But I have not done a deeper analysis.

21-byte addresses

This approach would add a 1 byte address type prefix to addresses, for up to 256 potential address types. Apps would then need to register individual address prefixes likely in the app constructor.

This has the advantage of explicitly eliminating the chance of address collision, although maybe that is not an issue as @ethanfrey explained above. There is the added overhead of initializing prefixes in the app constructor.


I'm really open to either approach as I'm not an expert in this area. Either way, it would be good to have a standardized well-documented approach.

What do people think? /cc @alexanderbez

@alexanderbez
Copy link
Contributor

So this would certainly break existing addresses so I want to make sure we get this right and allows us to be future adaptable and flexible (i.e. we can deal with future use-cases without necessarily knowing them right now).

As for the actual proposal, I'm curious what pre-hash "conditions" provide over just prepending a byte prefix apart from not revealing the condition (e.g. key type)? It seems like the byte-prefix is much simpler and quite flexible. Although I'd advise against the app constructor. We already are aware of the need to refactor the config usage in the SDK so we might as well tie this into that refactor.

Thoughts @ethanfrey @zmanian?

@alexanderbez alexanderbez added C:Keys Keybase, KMS and HSMs S:proposed labels Feb 25, 2020
@ethanfrey
Copy link
Contributor

As for the actual proposal, I'm curious what pre-hash "conditions" provide over just prepending a byte prefix apart from not revealing the condition (e.g. key type)? It seems like the byte-prefix is much simpler and quite flexible. Although I'd advise against the app constructor. We already are aware of the need to refactor the config usage in the SDK so we might as well tie this into that refactor.

It made sense when designing from the ground up (weave), to avoid collisions and not require any prefix on the address (just the preimage). Also we know what an issue it can become reserving prefix bytes and ensuring no collisions.

For the current case of the sdk, backwards compatibility is key. So secp256k1 keys must continue to work as is. For other curves, you can prepend a "type byte" on the addresses (making them 21 bytes), or prepend a byte/string/etc on the preimage (public key) before hashing - this would be []byte{} for secp256k1 and could be eg. []byte("ed25519") for ed25519 keys. The bonus there is you keep a consistent address length and the prepend part makes sure you can never find a preimage that matches for two curves (which is the purpose of this issue).

I think it is essential that this does not break existing addresses, thus there must be a special case for secp256k1 of a "nil" prefix. Other than that, I have no opinion on which route you choose to take - I think the developers who will be writing and maintaining this should choose which approach works best for them.

@alexanderbez
Copy link
Contributor

ACK on pre-image prefix with an empty byte slice for SECP256K1 keys.

@zmanian
Copy link
Member

zmanian commented Feb 26, 2020

So I like the conceptual integrity of uniform 20 byte addresses. And prefixing the key type with the preimage.

I don't see why information about the key needs to be in both the preimage and the digest("Address").

@aaronc
Copy link
Member Author

aaronc commented Feb 27, 2020

So it sounds like we're basically aligning on something similar to the weave condition approach.

Which means all new pubkeys (besides secp256k1) will have their type prepended to the pubkey bytes before applying sha256 and taking 20 bytes, i.e. address = sha256(concat(pubkeyType, pubkeyBytes)[0:20]. Yes?

Weave also includes an extension prefix: https://github.com/iov-one/weave/blob/master/conditions.go#L32. Not sure that's necessary but we should document the approach clearly and maybe a registry of prefixes somewhere. Clients will need to know this stuff. Groups would likely just have the group prefix. Signature algorithms are generally pretty obviously named.

I do wish we knew more about the actual collision resistance of this approach. The best info we have is what @ethanfrey shared above and also @alpe mentioned that this was not flagged as an issue in weave's audit report (https://github.com/iov-one/weave/blob/master/audit/IOV_Audit_Report_2019.1-Final%20with%20Release.pdf). So I guess that's validation. I just feel horribly unqualified to make this sort of call myself.

@alexanderbez
Copy link
Contributor

Which means all new pubkeys (besides secp256k1) will have their type prepended to the pubkey bytes before applying sha256 and taking 20 bytes, i.e. address = sha256(concat(pubkeyType, pubkeyBytes)[0:20]. Yes?

Yes, correct. Simple and clean. We will have an endpoint that returns all registered key types.

@aaronc
Copy link
Member Author

aaronc commented Feb 27, 2020 via email

@alexanderbez
Copy link
Contributor

With the current PubKey interface, I think a registry of prefixes as @alexanderbez suggests would be hard.

Not necessarily. We could keep the Tendermint logic intact, no? If left intact, yes, we would not be able to call and use pubkey.Address() directly, but instead we'd pass it to some other wrapper function that does the lookup and returns the correct address. Otherwise, this is a slightly more complex issue that needs to be addressed in the Tendermint repo.

@aaronc
Copy link
Member Author

aaronc commented Feb 28, 2020

Not necessarily. We could keep the Tendermint logic intact, no? If left intact, yes, we would not be able to call and use pubkey.Address() directly, but instead we'd pass it to some other wrapper function that does the lookup and returns the correct address. Otherwise, this is a slightly more complex issue that needs to be addressed in the Tendermint repo.

Okay, makes sense to not change anything Tendermint side and have a different route for obtaining the address. In that case though I would suggest an sdk.PubKey interface that just removes Address so that other signature algorithms in the new model implemented just in the SDK don't need to worry about it (we need to figure out an address format for secp256r1 currently).

@alexanderbez
Copy link
Contributor

Yeah, I'm not opposed to having curves/keys (re)-defined in the SDK (properly -- no amino). Any opposition to this @zmanian @marbar3778?

@aaronc
Copy link
Member Author

aaronc commented Feb 28, 2020

Yeah, I'm not opposed to having curves/keys (re)-defined in the SDK (properly -- no amino). Any opposition to this @zmanian @marbar3778?

And even without redefining all of them, we could simplify the requirements for signature algorithms just needed in the SDK and not tendermint - just relaxing the interface.

@zmanian
Copy link
Member

zmanian commented Feb 28, 2020

Sha256(KeyTypePrefix || Length || Keybytes) truncated to 20 bytes seems like the basic answer for address generation that almost any signature system can use.

@alexanderbez
Copy link
Contributor

@aaronc do you think there is a capacity to do this in the 0.39 release? I'm a bit skeptical. Perhaps this should be slated for 0.40?

@aaronc
Copy link
Member Author

aaronc commented Apr 20, 2020

Well it may not be that involved. If we're moving keys to the sdk it should probably be done at the same time. @alpe do you possibly have any bandwidth to raise a PR to address this? It seems like what we've aligned on here is pretty close to the weave approach.

@alpe
Copy link
Contributor

alpe commented Apr 24, 2020

I can start the basic implementation early next week but I guess migration and tests will make this a big package of work. Let's see

@alexanderbez alexanderbez removed their assignment Apr 27, 2020
@clevinson clevinson added this to the v0.39 milestone Apr 30, 2020
@aaronc
Copy link
Member Author

aaronc commented Nov 24, 2020

Given the discussions in #7737 about the guidelines in the current ADR 028 draft being insufficient, I want to push for more in depth analysis of this and getting the opinion of 1 or more people who are professional cryptographers.

I know that it will be a heavy lift to not use 20 byte addresses everywhere in the SDK because of the staking store key format. But I don't think that should be the reason we don't consider longer addresses. Cryptography should be the motivating reason IMHO and if we need to refactor staking store keys or whatever to accommodate that, then we have to do it sooner or later.

Anyway, my request is simply that we come to a decision on this that involves some more in depth cryptography to know we're making good choices for today and the future. @alessio any luck getting input from the cryptographer at AiB you mentioned? @ebuchman is there anyone on your team that could lean in? @andrey-kuprianov I know you're not a cryptographer but you seem to have a good grasp of some of this and I would appreciate your input.

@alessio
Copy link
Contributor

alessio commented Nov 24, 2020

CC'ing @alain2sf

@ValarDragon
Copy link
Contributor

ValarDragon commented Nov 24, 2020

Addresses should definitely move to 32 bytes imo, as collision resistance is a necessary property for many applications. #7737 highlights the need for this in a sub-area of IBC, and smart contracts and bitcoin scripts require this. (Its already a problem that they don't offer collision resistance at the moment)

Also collision attacks are really powerful, its going to be extremely hard to consider all the possible cross-overs where collisions on addresses are possible that will cause critical failures, and its highly likely some will be missed.

Also fully agreed that given collision resistant address hashes, the address hash should take as input the pubkey type enum (encoded in a non-malleable way) as Zaki mentioned in #5694 (comment)

With collision resistant hashes + prefixing, you get the guarantee that a pubkey -> privkey attack on one signature algorithm doesn't break all other signature algorithms.

Without collision resistant hashes, you must use post-fixing to get the same guarantee.

@andrey-kuprianov
Copy link
Contributor

andrey-kuprianov commented Nov 25, 2020

A very insightful discussion happening here, as well as in #3685, #4789; I will be happy to contribute my view on this.

I completely agree with @ValarDragon that extending the address length is the only way to allow addresses of different types, various signature types, etc.

I don't quite understand how concatenating different types of public keys in the pre-image of the hash function allows to achieve the goal. Suppose we have Sha256(KeyTypePrefix || Length || Keybytes)[:20]. Nothing prevents from having Addr = Sha256(KeyTypePrefix1 || Length1 || Keybytes1)[:20] = Sha256(KeyTypePrefix2 || Length2 || Keybytes2)[:20] -- this is just an ordinary collision achievable with 1/2^80 probability due to the Brithday attack. Moreover, mixing different signature types in the 20-bytes output only increases the security risk, as is nicely explained by @aaronc here #3685 (comment).

If changing the address scheme at all, I have another suggestion which might seem radical, but I hope is actually reasonable, given a wide range of usage scenarios the address should cover. I propose to

  • double the address length from 20 to 40 bytes
  • use the variable-length format (with the field separator, or with length prefixing, doesn't matter, should be hidden behind an API):
    1. variable-length address descriptor
    2. variable-length address data

What will this bring?

  • Remove unneeded complexity:
    • there will be no need to hash public keys, which are 32 or 33 bytes long, just store them as is.
    • e.g. for escrow addresses, do not hash, also store the port id and channel id directly.
  • Allow unrestricted address flexibility:
    • different key types -- no problem, just register different descriptors, like secp256k1 or secp256r1.
    • group accounts, "organization" type entities, smart contracts, whatever else: request a new descriptor for it, and define the data.
    • one other possibility would be e.g. to have an IPv6 address + 20-byte old-style address to allow so to say "physical" addresses.
  • Greatly increase security:
    • no collisions between different address types is possible.
    • individual address security is increased to the maximum, e.g. to 32 bytes for Ed25519 addresses.
    • compromising one address type (e.g. the pre-image attack) will have no influence on other types.

It is important that this structure is in the address post-image, not in the pre-image. This provides the separation of security between various address kinds.

The only drawback will be the increased bandwidth, but as correctly pointed out by @ValarDragon:

that can easily be reduced with fancier solutions in the future. (e.g. for accounts, just only use the account-number to refer to the account)

If the bandwidth is really an issue, this can be addressed e.g. as follows:

  • (temporarily) accept two kinds of addresses: old-style (20 bytes), and new-style (>20 bytes), using length as the differentiator
  • For new-style addresses, make the length completely variable, only bound it by some number: 40 or even 80 bytes, doesn't really matter
  • each particular type of addresses will be as long as needed, up to the maximum: some will be >40 bytes if needed, but many will be most probably <30 bytes, and for some even 10 will be enough.

@aaronc
Copy link
Member Author

aaronc commented Nov 25, 2020

Thanks @andrey-kuprianov.

I agree we should re-consider post-image prefixing as it is the only way that guarantees no attacks between different address types.

I am not sure whether or not addresses should be fixed length or variable length. In general, variable length makes sense to me. The two reasons not to do it that I see are really:

  1. the need to refactor store keys that depend on fixed length keys, i.e. in x/staking
  2. UX issues when formatting tables with addresses of different lengths

There are various ways to address 1, but a simple way is to simply not use the address as part of the key at all, but instead assign every address a unique uint64 ID (which already happens in x/auth) and use that 8-byte uint64 to form fixed length store keys.

As for 2, yeah it's annoying but UI designers can deal with it and it shouldn't limit changes needed for security.

@ValarDragon
Copy link
Contributor

I don't quite understand how concatenating different types of public keys in the pre-image of the hash function allows to achieve the goal. Suppose we have Sha256(KeyTypePrefix || Length || Keybytes)[:20]. Nothing prevents from having Addr = Sha256(KeyTypePrefix1 || Length1 || Keybytes1)[:20] = Sha256(KeyTypePrefix2 || Length2 || Keybytes2)[:20] -- this is just an ordinary collision achievable with 1/2^80 probability due to the Brithday attack. Moreover, mixing different signature types in the 20-bytes output only increases the security risk, as is nicely explained by @aaronc here #3685 (comment).

This only holds once you've increased hash length to be collision resistant. (Which imo should be made default regardless of any other decision, and that things not being collision resistant should require lots of thought) Then assuming security of the hash function, the desired property holds. No need for post-image prefixing here.

I'd prefer to frame post-image prefixing as keeping different address types in different address spaces (Then adding a general method for combining different address spaces into one single one, and this method must be the same across all applications).

I'm pretty opposed to this as a standalone solution, without guarantees that each address type is itself collision resistant. Collision attacks are a really powerful tool for the attacker, and the lack of collision resistance will make the system very mis-use vulnerable. From the developer perspective, its extremely plausible that people are going to make mistakes in choosing the right address space (or using an address space in incorrect ways, e.g. making dependent txs, scripting / contracts, etc.), which will then be a catastrophic failure without collision resistance. Making our API misuse resistant is in my opinion a key requirement for any framework building high security software. Its an anti-feature to force developers to have to think about collision resistance vulnerabilities whenever they want to work with addresses.

Separately, I also view namespacing each address type as high complexity. You need all different smart contracts, and software interacting with the chain to reason about the chains address space methods (which themselves are very liable to update). Further, if you want to interact with multiple chains, they may each have different address spacing methods, or you enforce a single namespacing mechanism for chains that communicate via IBC.

@aaronc
Copy link
Member Author

aaronc commented Nov 25, 2020

So if we were going to do post-image prefixing, I think it makes sense to have a global prefix registry like https://github.com/multiformats/multicodec. That's not actually that hard - we just have a csv file in a repository. Also, we can assume all module accounts have the same prefix and address space because they are always chain dependent. So the prefix table should only include prefixes for real public keys plus the single prefix for module accounts. In that way, it might not actually be such a big undertaking and hopefully developer UX will be reasonable.

I agree we need to make sure each address type is itself collision resistant. Maybe pre-image prefixing is enough if you have a long enough address, but again it's still a "one rotten apple spoils the whole bunch" type setup.

Either way, looks like we're definitely looking at variable address lengths because I don't think there would be much appetite for rewriting all the existing 20 byte address as 32 or 40 bytes.

@alain2sf
Copy link

I also completely agree with @andrey-kuprianov and @ValarDragon , extending the address length is the best way to allow different type of addresses, it's important to offer new signature schemes and a better diversity and support in cryptographic algos/models.
Having different signature types within the 20-bytes output form will be a recipe against security. I agree with previous rationales. Post image suffix/prefix is also something I would think of, because it's more dangerous when in the pre-image. The chosen-prefix collision attack is way more powerful than the regular collision attack as the attacker is in control of the hash input (remember the MD5 chosen-prefix attack). That's for example why HMAC is not vulnerable to such attacks.
Nice additional idea (?) the interesting example of a checksum suffix added by Algorand.
As a general rule the API should be simple and take care of any parameter misuse in any scheme, the "sealed-box" model has a great value for developers who mainly are not cryptography specialists and can concentrate on the business logic without creating security risks. Some crypto libraries like NaCl/Libsodium have reduced their parameters to enforce their API and avoid traps (i.e. nonce reuse in AES-GCM encryption for example). Being developer friendly with cryptography and security matters is paramount to adoption and success.

@robert-zaremba
Copy link
Collaborator

Let's start investigating all things which will break when changing the address format. My initial list is in the issue below, let's edit that issue (or move it to other place where everyone can edit it): #8041

@robert-zaremba
Copy link
Collaborator

+1 for adapting the Algorand algorithm. I was also proposing it in the ADR-028 review: #7086 (comment) 3 months ago.

In general, with big enough address space and good hash function we shouldn't need post-image prefixes/suffixes. The problem with prefixes / suffixes is that we need to keep track of them and they expand the address space even more (either we create some enums, or verbose codes). It's worth to note that we plan for the future where we will have many complex accounts (composed accounts) with different rules and different composition mechanism. Adding a post-image prefix/suffix for each variation might be cumbersome. WDYT?

@iramiller
Copy link
Contributor

+1 for adapting the Algorand algorithm.

I am a bit confused here, reading through the Algorand specification linked in the referenced comment above they are using an entire 32 byte public key plus a checksum in base 32 to make addresses more resilient for use by users (copy/paste, etc).

Specifically which part of the Algorand specification are you proposing here? It seems there are two contexts involved with regards to the Algorand info... on chain/state store binary formats and those exposed to users, roughly "transit formats". The bech32 address format in use today seems to be sufficient for transit [base 32 + checksum] and the extra checksum specified in the Algorand documentation doesn't seem very useful for internal system functions that are not taking input from a user directly.

@robert-zaremba
Copy link
Collaborator

Specifically which part of the Algorand specification are you proposing here?

We are not going to use it, it was just an idea on how they hash the address.

  • in storage we use address bytes, which we don't have to verify.
  • for presentation layer (everything which goes to user) we use bech32, which already has a checksum integrated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

14 participants