-
Notifications
You must be signed in to change notification settings - Fork 664
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switching from b58 encoding #2100
Comments
@frol @evgenykuzyakov just checking in if we want to execute on this before MainNet or will postpone |
@ilblackdragon Let's start with the problem. I assume the problem is that fixed-length values encode into a non-fixed-length base58 string. I am against having "clever" custom encodings since it casts away existing tools out there. Alternatives:
|
Splitting an input into 8-byte chunks and encoding each of them separately (using fixed-length base58) is a simple approach that solves two major problem of base58: its variable length and the quadratic complexity of its encoding and decoding (which can be optimized, but this is not easy). This approach still has some issues. For example, encoding requires 64-bit divisions, which are difficult to do in some environments (e.g. JavaScript). I have some ideas about a fixed-length encoding that is almost as efficient (length-wise) as base58 and can be efficiently encoded and decoded without using 64-bit arithmetic. Another approach is to use some variant of base32 encoding. Base32 encoding is about 17% longer than base58, but it only uses letters of a single case. If we want an encoding that can be easily written down by humans, this is a clear advantage. Some variants, such as Crockford's, are specifically designed to be easy to write and pronounce. |
I want to highlight that we use base58 only to display hashes, i.e. sha256, that is 256 bits (32 bytes) of data, which encodes into:
|
We also use base58 for keys, not just for hashes. Keys that we use are also 32 bytes (but I think that we currently encode Ed25519 private keys as 64 bytes, which is a bug). In base58, it is 32-44 characters. In base32 (without padding), it is 52 characters. However, this doesn't include a checksum. With 4-byte checksum, it will be:
|
Would we consider using base64 for the keys? Pros:
Cons:
|
Public keys are used in URLs. The motivation for base58 was ease of manual entry. If we don't care about that, we can use URL-safe variant of base64. |
I wonder if anyone ever entered base58 / uuid / base64 value longer than 20 (yet 44+) chars manually... 😃 |
We should not be creating our own encoding scheme, like the suggested modification of b58. This will create a lot fo confusion and will be a major source of bugs for people trying to integrate their tools with our RPC. There is no point for optimizing for the number of bytes, since we are using JSON and our RPC responses are hundreds or even thousands of characters long even for simple requests due to all the JSON boilerplate and the names of the fields, see https://docs.nearprotocol.com/docs/interaction/rpc An example of `broadcast_tx_commit` response using b58, making it 1830 characters long
{
"receipts_outcome": [
{
"block_hash": "8xr1UHftKtth1HErsJ5SEHBgVpbHqyWHQVWeXnT78RfV",
"id": "Aei6qE1jXnoXA81WvDk69kc8HfPV8kZTRgte38nHV6j8",
"outcome": {
"gas_burnt": 2389161082329,
"logs": [
"Writing the string [ hello ] to the blockchain ..."
],
"receipt_ids": [
"Gus1JVoJgBaccBtzHApDs22k6A5CZ8vEgRWHmEhVfQCs"
],
"status": {
"SuccessValue": ""
}
},
"proof": []
},
{
"block_hash": "11111111111111111111111111111111",
"id": "Gus1JVoJgBaccBtzHApDs22k6A5CZ8vEgRWHmEhVfQCs",
"outcome": {
"gas_burnt": 0,
"logs": [],
"receipt_ids": [],
"status": "Unknown"
},
"proof": []
}
],
"status": {
"SuccessValue": ""
},
"transaction": {
"actions": [
{
"FunctionCall": {
"args": "eyJhcGlSZXNwb25zZSI6ImhlbGxvIn0=",
"deposit": "0",
"gas": 100000000000000,
"method_name": "setResponse"
}
}
],
"hash": "7RnMJsHMVMUbArUdKcMaSBPPomjLzQxzhsuvwP3FXGyi",
"nonce": 5,
"public_key": "ed25519:4dYj4upxVxeaiQtV1EB5NvUjuK2axytiAaChr6xNCSgp",
"receiver_id": "dev-jdvw47f9j",
"signature": "ed25519:3UrQcvvjuGSWM2w5sQ5GNZYBVWxRMjFbFp1cPg59Qg1rL4PtWMi5s7HY7cgSdx5PB17aqX1nrjiUiK7xxxtBeu6M",
"signer_id": "ajax"
},
"transaction_outcome": {
"block_hash": "A7NDYejEMJ5RNTdcVuoYDTezruYSh8QKPrVK4TkcUnqy",
"id": "7RnMJsHMVMUbArUdKcMaSBPPomjLzQxzhsuvwP3FXGyi",
"outcome": {
"gas_burnt": 2291572068402,
"logs": [],
"receipt_ids": [
"Aei6qE1jXnoXA81WvDk69kc8HfPV8kZTRgte38nHV6j8"
],
"status": {
"SuccessReceiptId": "Aei6qE1jXnoXA81WvDk69kc8HfPV8kZTRgte38nHV6j8"
}
},
"proof": []
}
}
Same thing but using base16, making it 2108 characters long
{
"receipts_outcome": [
{
"block_hash": "76502aa865bf709e56b5856fb2bb082a97d5d187ad98d6dac1d7fc7057320f10",
"id": "8f622511e512206b3e3b43904ef015281dc983ef971816ab8bbd70839db4757f",
"outcome": {
"gas_burnt": 2389161082329,
"logs": [
"Writing the string [ hello ] to the blockchain ..."
],
"receipt_ids": [
"ec6a0c0a0c895a2b4b592e34bfd4473c358f8e53b05252f281c68b322905eadc"
],
"status": {
"SuccessValue": ""
}
},
"proof": []
},
{
"block_hash": "0000000000000000000000000000000000000000000000000000000000000000",
"id": "ec6a0c0a0c895a2b4b592e34bfd4473c358f8e53b05252f281c68b322905eadc",
"outcome": {
"gas_burnt": 0,
"logs": [],
"receipt_ids": [],
"status": "Unknown"
},
"proof": []
}
],
"status": {
"SuccessValue": ""
},
"transaction": {
"actions": [
{
"FunctionCall": {
"args": "039377254c90bc33963184c3a8167da70438e3",
"deposit": "0",
"gas": 100000000000000,
"method_name": "setResponse"
}
}
],
"hash": "5f7f56aaa51736a77872b9fe291ca320611383d7e8bec68031817e6714e2da45",
"nonce": 5,
"public_key": "ed25519:35efd78157db6dda96a24988e60de9e7f5c1fefcaa246e87d930d2beda020cb1",
"receiver_id": "dev-jdvw47f9j",
"signature": "ed25519:7c0d6861845ae00e29fe50ebf449fa2eab2ae5d2b0959b57ded4024ab8ae82db7b455b744c91f293f24ad559d2301857d804bc5a7dff4530e7580c9d1a47d20e",
"signer_id": "ajax"
},
"transaction_outcome": {
"block_hash": "875aa81349e154b01f53314e4ddabd9e920f0a5c36be2b9ffb291538f1da07b4",
"id": "5f7f56aaa51736a77872b9fe291ca320611383d7e8bec68031817e6714e2da45",
"outcome": {
"gas_burnt": 2291572068402,
"logs": [],
"receipt_ids": [
"8f622511e512206b3e3b43904ef015281dc983ef971816ab8bbd70839db4757f"
],
"status": {
"SuccessReceiptId": "8f622511e512206b3e3b43904ef015281dc983ef971816ab8bbd70839db4757f"
}
},
"proof": []
}
}
It is a 15% increase in size if we are switching to hex. I suggest we pick hex, which is the most common and readable format (we are not displaying full hashes and keys in our UI anyway) and the difference in bytes is negligible. Anything but hex would be a micro optimization. |
AFAIU currently our RPC can be trivially abused by submitting large b58 encoded keys. I suggest we add a simple threshold cut-off in RPC that checks that b58-encoded key is <45 characters or returns deserialization error. We then switch to hex post-Mainnet since it is would require lots of changes across our Applayer. |
Unassigning @evgenykuzyakov since I don't think he is working on it. Also changing the title of the issue to reflect the broader scope of the conversation. |
Now even Bitcoin is moving away from base58. Bitcoin Core 0.20, which was released today, uses Bech32 (a base32-based format) as default address format. |
What are the objections to postponing switching to another encoding post Phase I? Do we have to decode/encode base58 today in any environment in which the quadratic complexity matters (e.g. do we do any b58 decoding in EVM)? |
@SkidanovAlex I'm not saying that we need to do it quickly. |
Is this even possible at this point? :D |
This issue has been automatically marked as stale because it has not had recent activity in the last 2 months. |
@frol what is the status of this issue? |
There is no action item here. Feel free to re-open if necessary |
I think an action item here is to stop using base58 in any new protocols and APIs, and to deprecate (and eventually remove) any protocols and APIs that are using it. |
This issue has been automatically marked as stale because it has not had recent activity in the last 2 months. |
@abacabadabacaba suggested to use an improved bs58 scheme that encodes into fixed length representation by splitting sequence into 8 byte chunks.
Additionally to add check sum to verify that the full sequence was provided and wasn't corrupted.
Interesting note that Ethereum didn't have this originally and they added it by using capitalization of letters in base16.
The text was updated successfully, but these errors were encountered: