ssz: switch integer encoding to little endian #139
The choice between little and big endian is arbitrary from a functional point of view, but practically:
* most commodity hardware these days is either little- or bi-endian
* mechanical sympathy between encoding and hardware allows a wider range of tricks to be used when encoding and decoding data, leading to better efficiency
* we're developing a format that favors "decoding-free" access to data
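For readers less familiar with byte order, here is a minimal Python sketch of what the two encodings look like for a single integer. The value and the 4-byte width are arbitrary illustrations, not taken from the SSZ spec.

```python
# Illustrative only: one 32-bit unsigned value encoded both ways.
value = 0x01020304

little = value.to_bytes(4, "little")  # least significant byte first
big = value.to_bytes(4, "big")        # most significant byte first

print(little.hex())  # 04030201
print(big.hex())     # 01020304

# Both encodings round-trip; only the byte order on the wire differs.
assert int.from_bytes(little, "little") == value
assert int.from_bytes(big, "big") == value
```

On a little-endian host, the `little` bytes match the in-memory layout of the integer, which is the "mechanical sympathy" the description refers to.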
Pinging @AlexeyAkhunov and @karalabe. Big or little endian? |
The main push-back I heard on this was that eth1.0 uses big-endian so don't introduce a different-endian encoding in eth2.0. The small gain in efficiency is not worth the potential confusion and overhead of having to remember which is which. |
Well, there's a fairly clean break here, considering SSZ vs RLP - it's also an unfortunate fact of life that you have to remember endianness whenever you deal with binary protocols in general. The main performance difference will be that adjacent integers can either be bulk-copied or will have to be byte-flipped one-by-one. It affects both network serialization and hashing. |
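A rough sketch of the bulk-copy versus byte-flip point, in Python for brevity. The function name `serialize_u32` is hypothetical and the code assumes the platform's `array("I")` item is 4 bytes wide; it is not how any client actually implements SSZ.

```python
import sys
from array import array


def serialize_u32(values, wire_order):
    """Serialize unsigned 32-bit integers in the given wire byte order.

    Hypothetical helper for illustration; assumes array('I') holds 4-byte items.
    """
    buf = array("I", values)
    assert buf.itemsize == 4, "sketch assumes a 4-byte unsigned int"
    if sys.byteorder != wire_order:
        # Host and wire order differ: every element has to be byte-flipped.
        buf.byteswap()
    # Host and wire order now agree: the underlying memory is copied as-is.
    return buf.tobytes()


print(serialize_u32([1, 2], "little").hex())  # 0100000002000000
print(serialize_u32([1, 2], "big").hex())     # 0000000100000002
```

When the wire order matches the host order the `byteswap()` step is skipped entirely, so adjacent integers go out in one copy; otherwise each element is flipped first.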
Can this be quantified? What is the performance difference?
Would it negatively affect light clients of Ethereum 2.0 built in Ethereum 1.0 contracts? |
It could be that the efficiency gained with this optimization is too small compared with the cost of other operations, for example, calculating a hash of the validator registry. Another thing is that all big number implementations in Java that I've seen use big-endian to encode/decode numbers to/from byte arrays. So does Milagro, even in its C implementation. What about other languages, btw? And in our case it results in reversing signature bytes on each encoding/decoding since signature has |
I'll see if I can pull up some numbers, but we're really not at that stage yet (it's a pretty low-level / final-touch optimization) - the idea itself is mainly taken from other modern serialization formats that state "direct access" as one of their design goals, for example flatbuffers.
yep - though here the machine endianness no longer matters - there's no mechanical sympathy to consider because you can't directly use these numbers anyway, and at this point, it's kind of.. arbitrary. |
anyway, if there's pushback, we can certainly drop this - it's a drop in the sea, as @mkalinin points out (or one of many paper cuts). |
Agree. We may use whichever endianness for big numbers depending on the case. For instance, BLS12-381#Serialization defines that
As for me, this is not a strong argument, because eth2.0 has many differences wrt eth1.0 and that's even great. I am not opposed to little endian. Indeed, it's better to have an optimization opportunity even if it doesn't seem too valuable at the moment. Possible solution for big numbers would be in representing them with |
@arnetheduck Is there anything beyond your current 3 issues and 1 PR? I'm keen to address as many issues as possible before we declare the spec a release candidate, so now is a good time to flag things. 👍 |
ah, I hope that it did not come across wrong - it was intended as a general comment and not to say that there are necessarily many in the spec as of now :) I'll go over my notes and see what is still relevant after the latest refactorings (:+1: good work!), and post ASAP! |
It's worth noting |
SSZ and RLP are already vastly different, so I think using big-endian because eth1.0 uses big-endian may not be a really strong argument. Besides all the architectures using little-endian, for Parity there's also a really specific reason we would prefer little-endian -- our parity-codec format uses little-endian, and |
Pros of little-endian:
Cons of little-endian:
Little-endian feels on net positive :) |
Consensus reached on the Eth2.0 call. Thanks @djrtwo 🎉 |
As discussed in ACDC ethereum#139.