Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switching from b58 encoding #2100

Open
ilblackdragon opened this issue Feb 6, 2020 · 20 comments
Open

Switching from b58 encoding #2100

ilblackdragon opened this issue Feb 6, 2020 · 20 comments
Labels
A-RPC Area: rpc C-discussion Category: Discussion, leading to research or enhancement C-housekeeping Category: Refactoring, cleanups, code quality T-public-interfaces Team: issues relevant to the public interfaces team

Comments

@ilblackdragon
Copy link
Member

ilblackdragon commented Feb 6, 2020

@abacabadabacaba suggested to use an improved bs58 scheme that encodes into fixed length representation by splitting sequence into 8 byte chunks.

Additionally to add check sum to verify that the full sequence was provided and wasn't corrupted.

Interesting note that Ethereum didn't have this originally and they added it by using capitalization of letters in base16.

@ilblackdragon
Copy link
Member Author

@frol @evgenykuzyakov just checking in if we want to execute on this before MainNet or will postpone

@frol
Copy link
Collaborator

frol commented Mar 20, 2020

@ilblackdragon Let's start with the problem. I assume the problem is that fixed-length values encode into a non-fixed-length base58 string.

I am against having "clever" custom encodings since it casts away existing tools out there.

Alternatives:

@ilblackdragon ilblackdragon added this to the MainNet milestone Mar 20, 2020
@abacabadabacaba
Copy link
Collaborator

Splitting an input into 8-byte chunks and encoding each of them separately (using fixed-length base58) is a simple approach that solves two major problem of base58: its variable length and the quadratic complexity of its encoding and decoding (which can be optimized, but this is not easy).

This approach still has some issues. For example, encoding requires 64-bit divisions, which are difficult to do in some environments (e.g. JavaScript). I have some ideas about a fixed-length encoding that is almost as efficient (length-wise) as base58 and can be efficiently encoded and decoded without using 64-bit arithmetic.

Another approach is to use some variant of base32 encoding. Base32 encoding is about 17% longer than base58, but it only uses letters of a single case. If we want an encoding that can be easily written down by humans, this is a clear advantage. Some variants, such as Crockford's, are specifically designed to be easy to write and pronounce.

@frol
Copy link
Collaborator

frol commented Mar 20, 2020

I want to highlight that we use base58 only to display hashes, i.e. sha256, that is 256 bits (32 bytes) of data, which encodes into:

  • 44 bytes in base64
  • 43-44 bytes in base58
  • 64 bytes in hex (base16)
  • 56 bytes in base32

@abacabadabacaba
Copy link
Collaborator

We also use base58 for keys, not just for hashes. Keys that we use are also 32 bytes (but I think that we currently encode Ed25519 private keys as 64 bytes, which is a bug).

In base58, it is 32-44 characters. In base32 (without padding), it is 52 characters.

However, this doesn't include a checksum. With 4-byte checksum, it will be:

  • 48 characters in base64.
  • 36-50 characters in base58.
  • 58 characters in base32.
  • 72 characters in base16.

@frol
Copy link
Collaborator

frol commented Mar 20, 2020

Would we consider using base64 for the keys?

Pros:

  • Widely used (SSH keys, PEM (SSL) certificates, ...)
  • Every platform has a package for base64
  • Compact

Cons:

  • Alphabet includes chars +/= which may cause problems with URL encoding. It does not seem to be important to me in the context of keys since you don't want using keys in the URL anyway, do you?

@abacabadabacaba
Copy link
Collaborator

Public keys are used in URLs.

The motivation for base58 was ease of manual entry. If we don't care about that, we can use URL-safe variant of base64.

@frol
Copy link
Collaborator

frol commented Mar 20, 2020

I wonder if anyone ever entered base58 / uuid / base64 value longer than 20 (yet 44+) chars manually... 😃

@ilblackdragon ilblackdragon removed this from the MainNet milestone Mar 24, 2020
@frol frol added C-housekeeping Category: Refactoring, cleanups, code quality A-RPC Area: rpc C-discussion Category: Discussion, leading to research or enhancement labels Mar 25, 2020
@MaksymZavershynskyi
Copy link
Contributor

We should not be creating our own encoding scheme, like the suggested modification of b58. This will create a lot fo confusion and will be a major source of bugs for people trying to integrate their tools with our RPC.

There is no point for optimizing for the number of bytes, since we are using JSON and our RPC responses are hundreds or even thousands of characters long even for simple requests due to all the JSON boilerplate and the names of the fields, see https://docs.nearprotocol.com/docs/interaction/rpc

An example of `broadcast_tx_commit` response using b58, making it 1830 characters long { "receipts_outcome": [ { "block_hash": "8xr1UHftKtth1HErsJ5SEHBgVpbHqyWHQVWeXnT78RfV", "id": "Aei6qE1jXnoXA81WvDk69kc8HfPV8kZTRgte38nHV6j8", "outcome": { "gas_burnt": 2389161082329, "logs": [ "Writing the string [ hello ] to the blockchain ..." ], "receipt_ids": [ "Gus1JVoJgBaccBtzHApDs22k6A5CZ8vEgRWHmEhVfQCs" ], "status": { "SuccessValue": "" } }, "proof": [] }, { "block_hash": "11111111111111111111111111111111", "id": "Gus1JVoJgBaccBtzHApDs22k6A5CZ8vEgRWHmEhVfQCs", "outcome": { "gas_burnt": 0, "logs": [], "receipt_ids": [], "status": "Unknown" }, "proof": [] } ], "status": { "SuccessValue": "" }, "transaction": { "actions": [ { "FunctionCall": { "args": "eyJhcGlSZXNwb25zZSI6ImhlbGxvIn0=", "deposit": "0", "gas": 100000000000000, "method_name": "setResponse" } } ], "hash": "7RnMJsHMVMUbArUdKcMaSBPPomjLzQxzhsuvwP3FXGyi", "nonce": 5, "public_key": "ed25519:4dYj4upxVxeaiQtV1EB5NvUjuK2axytiAaChr6xNCSgp", "receiver_id": "dev-jdvw47f9j", "signature": "ed25519:3UrQcvvjuGSWM2w5sQ5GNZYBVWxRMjFbFp1cPg59Qg1rL4PtWMi5s7HY7cgSdx5PB17aqX1nrjiUiK7xxxtBeu6M", "signer_id": "ajax" }, "transaction_outcome": { "block_hash": "A7NDYejEMJ5RNTdcVuoYDTezruYSh8QKPrVK4TkcUnqy", "id": "7RnMJsHMVMUbArUdKcMaSBPPomjLzQxzhsuvwP3FXGyi", "outcome": { "gas_burnt": 2291572068402, "logs": [], "receipt_ids": [ "Aei6qE1jXnoXA81WvDk69kc8HfPV8kZTRgte38nHV6j8" ], "status": { "SuccessReceiptId": "Aei6qE1jXnoXA81WvDk69kc8HfPV8kZTRgte38nHV6j8" } }, "proof": [] } }
Same thing but using base16, making it 2108 characters long { "receipts_outcome": [ { "block_hash": "76502aa865bf709e56b5856fb2bb082a97d5d187ad98d6dac1d7fc7057320f10", "id": "8f622511e512206b3e3b43904ef015281dc983ef971816ab8bbd70839db4757f", "outcome": { "gas_burnt": 2389161082329, "logs": [ "Writing the string [ hello ] to the blockchain ..." ], "receipt_ids": [ "ec6a0c0a0c895a2b4b592e34bfd4473c358f8e53b05252f281c68b322905eadc" ], "status": { "SuccessValue": "" } }, "proof": [] }, { "block_hash": "0000000000000000000000000000000000000000000000000000000000000000", "id": "ec6a0c0a0c895a2b4b592e34bfd4473c358f8e53b05252f281c68b322905eadc", "outcome": { "gas_burnt": 0, "logs": [], "receipt_ids": [], "status": "Unknown" }, "proof": [] } ], "status": { "SuccessValue": "" }, "transaction": { "actions": [ { "FunctionCall": { "args": "039377254c90bc33963184c3a8167da70438e3", "deposit": "0", "gas": 100000000000000, "method_name": "setResponse" } } ], "hash": "5f7f56aaa51736a77872b9fe291ca320611383d7e8bec68031817e6714e2da45", "nonce": 5, "public_key": "ed25519:35efd78157db6dda96a24988e60de9e7f5c1fefcaa246e87d930d2beda020cb1", "receiver_id": "dev-jdvw47f9j", "signature": "ed25519:7c0d6861845ae00e29fe50ebf449fa2eab2ae5d2b0959b57ded4024ab8ae82db7b455b744c91f293f24ad559d2301857d804bc5a7dff4530e7580c9d1a47d20e", "signer_id": "ajax" }, "transaction_outcome": { "block_hash": "875aa81349e154b01f53314e4ddabd9e920f0a5c36be2b9ffb291538f1da07b4", "id": "5f7f56aaa51736a77872b9fe291ca320611383d7e8bec68031817e6714e2da45", "outcome": { "gas_burnt": 2291572068402, "logs": [], "receipt_ids": [ "8f622511e512206b3e3b43904ef015281dc983ef971816ab8bbd70839db4757f" ], "status": { "SuccessReceiptId": "8f622511e512206b3e3b43904ef015281dc983ef971816ab8bbd70839db4757f" } }, "proof": [] } }

It is a 15% increase in size if we are switching to hex. I suggest we pick hex, which is the most common and readable format (we are not displaying full hashes and keys in our UI anyway) and the difference in bytes is negligible. Anything but hex would be a micro optimization.

@MaksymZavershynskyi
Copy link
Contributor

AFAIU currently our RPC can be trivially abused by submitting large b58 encoded keys. I suggest we add a simple threshold cut-off in RPC that checks that b58-encoded key is <45 characters or returns deserialization error. We then switch to hex post-Mainnet since it is would require lots of changes across our Applayer.

@MaksymZavershynskyi
Copy link
Contributor

Unassigning @evgenykuzyakov since I don't think he is working on it. Also changing the title of the issue to reflect the broader scope of the conversation.

@MaksymZavershynskyi MaksymZavershynskyi changed the title Check sum and improved bs58 encoding Switching from b58 encoding Mar 26, 2020
@abacabadabacaba
Copy link
Collaborator

Now even Bitcoin is moving away from base58. Bitcoin Core 0.20, which was released today, uses Bech32 (a base32-based format) as default address format.

@SkidanovAlex
Copy link
Collaborator

What are the objections to postponing switching to another encoding post Phase I?

Do we have to decode/encode base58 today in any environment in which the quadratic complexity matters (e.g. do we do any b58 decoding in EVM)?

@abacabadabacaba
Copy link
Collaborator

@SkidanovAlex I'm not saying that we need to do it quickly.

@ilblackdragon
Copy link
Member Author

Is this even possible at this point? :D

@stale
Copy link

stale bot commented Jul 1, 2021

This issue has been automatically marked as stale because it has not had recent activity in the last 2 months.
It will be closed in 7 days if no further activity occurs.
Thank you for your contributions.

@stale stale bot added the S-stale label Jul 1, 2021
@bowenwang1996
Copy link
Collaborator

@frol what is the status of this issue?

@stale stale bot removed the S-stale label Jul 1, 2021
@bowenwang1996 bowenwang1996 added S-stale T-public-interfaces Team: issues relevant to the public interfaces team labels Jul 1, 2021
@stale stale bot removed the S-stale label Jul 1, 2021
@frol
Copy link
Collaborator

frol commented Aug 9, 2021

There is no action item here. Feel free to re-open if necessary

@frol frol closed this as completed Aug 9, 2021
@abacabadabacaba
Copy link
Collaborator

I think an action item here is to stop using base58 in any new protocols and APIs, and to deprecate (and eventually remove) any protocols and APIs that are using it.

@stale
Copy link

stale bot commented Dec 2, 2021

This issue has been automatically marked as stale because it has not had recent activity in the last 2 months.
It will be closed in 7 days if no further activity occurs.
Thank you for your contributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-RPC Area: rpc C-discussion Category: Discussion, leading to research or enhancement C-housekeeping Category: Refactoring, cleanups, code quality T-public-interfaces Team: issues relevant to the public interfaces team
Projects
None yet
Development

No branches or pull requests

8 participants