Provide first-class support for SSZ #250
Comments
Before moving in this direction, I'd like to see ethereum/consensus-specs#2983 adopted - the key point there is to maintain a 1:1 mapping between the basic SSZ types (i.e. no new JSON types for message/fork-specific aliases but rather a finite set of JSON core types), as they exist in simple_serialize.

The reason why the canonical 1:1 mapping is important is the implementation effort needed to deal with a jungle of REST-specific encodings and inventions, which makes the overhead of supporting both JSON and SSZ higher than it has to be - with a canonical mapping, it becomes a trivial matter because anything that can be expressed in one format can also, without additional specification, be supported in the other. Mostly, this means constraining JSON to the basic data types in simple_serialize, which shouldn't be a problem (protobuf, for example, defines such a mapping allowing 100% interop between protobuf and JSON - we should too).
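To make the point concrete, here is a minimal sketch (not the wording of the linked PR) of what such a 1:1 mapping looks like in practice, following the conventions the beacon APIs already use - each SSZ basic type has exactly one JSON form, so the JSON codec can be derived mechanically from the SSZ schema:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Fork mirrors the SSZ container Fork{previous_version: Bytes4,
// current_version: Bytes4, epoch: uint64}. With a canonical mapping there is
// exactly one JSON representation per SSZ basic type.
type Fork struct {
	PreviousVersion string `json:"previous_version"` // SSZ Bytes4 -> 0x-prefixed hex string
	CurrentVersion  string `json:"current_version"`  // SSZ Bytes4 -> 0x-prefixed hex string
	Epoch           string `json:"epoch"`            // SSZ uint64 -> decimal string (avoids JSON number precision limits)
}

func main() {
	out, _ := json.Marshal(Fork{
		PreviousVersion: "0x03000000",
		CurrentVersion:  "0x04000000",
		Epoch:           "194048",
	})
	fmt.Println(string(out))
	// {"previous_version":"0x03000000","current_version":"0x04000000","epoch":"194048"}
}
```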
Please could we not pollute this issue with the ongoing and contentious debate there? The adoption or not of that PR will make no difference to the changes put forward here, as this proposal makes no claim to define either JSON or SSZ encodings.
This PR represents a significant implementation effort as well as testing surface area - it is highly relevant for its acceptance that we have a future-proof plan for supporting it over time without incurring a high burden on client (well, server) developers. JSON is not going to go away, meaning that the advantage of SSZ is mainly to save a few cycles here and there (are there benchmarks?) on JSON vs binary. Given how the execution API is still JSON-RPC and hasn't hindered adoption, the performance difference is not a major problem - not now, and not in the foreseeable future - it's a nice-to-have. Thus, it stands to reason that maintenance concerns are addressed first - the good news being that some progress was made at devcon that has yet to make it into that PR.
The current call to
This has been known as an issue for a while, with piecemeal adoption of SSZ in some calls as an attempt to alleviate it, but here is an example of unmarshalling the current mainnet state from JSON (199,713,061 bytes) and SSZ (65,731,170 bytes):
The SSZ unmarshalling takes 1% of the time, and around 13% of the memory. Validators, state and blocks are large and getting larger. The additional time taken for validator clients to do their work increases the time taken for blocks to be proposed, which has a knock-on effect across the network. This is exacerbated by the additional time taken with MEV relays.

Could this be implemented on a per-call basis rather than across the entire set of responses? Yes it could, but the framework for doing so (bullets 1-3 on the server side) would be necessary to do this anyway. If it's a concern then we can state that SSZ may be available, and that clients should use
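For reference, a minimal sketch of the shape of this comparison - not the original benchmark - assuming `state.json` and `state.ssz` are on disk and that the state type has a fastssz-style generated `UnmarshalSSZ` (the struct below is a stand-in for such a generated type):

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"time"
)

// BeaconState stands in for a generated consensus type; real implementations
// provide UnmarshalSSZ (e.g. fastssz-generated code) alongside JSON decoding.
type BeaconState struct {
	// fields omitted for brevity
}

// UnmarshalSSZ is a stub with the fastssz-style signature; a generated
// implementation decodes the fixed-offset binary layout directly.
func (s *BeaconState) UnmarshalSSZ(data []byte) error { return nil }

func main() {
	jsonBytes, err := os.ReadFile("state.json") // ~200 MB for the current mainnet state
	if err != nil {
		panic(err)
	}
	sszBytes, err := os.ReadFile("state.ssz") // ~66 MB for the current mainnet state
	if err != nil {
		panic(err)
	}

	var fromJSON BeaconState
	start := time.Now()
	if err := json.Unmarshal(jsonBytes, &fromJSON); err != nil {
		panic(err)
	}
	fmt.Println("JSON decode:", time.Since(start))

	var fromSSZ BeaconState
	start = time.Now()
	if err := fromSSZ.UnmarshalSSZ(sszBytes); err != nil {
		panic(err)
	}
	fmt.Println("SSZ decode:", time.Since(start))
}
```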
A lot of our response objects are basically from

Are we talking about basically if SSZ is returned, that the

Certain APIs have issues... I'm tempted to constrain the scope of this to something like 'validator required' APIs, so that it's more manageable... Even more preferred would be specific use cases, rather than an open intent - e.g. let's do xxx because it is 90% of the calls we make...

Also, depending on how this maps out, potentially we need to fix the blocks API which just returns the

A specific example of an endpoint I'm concerned about is the validators list which is
My 2 cents: I don't know of any clients that read data structures directly from SSZ bytes, so since the response payload will probably be decoded in whole, I don't really see the advantage of switching the format.

On the topic of the validators endpoint:
Yes.
If the additional work to provide this for each additional API is linear or near-linear then fair enough, although it would mean that there would be no ability to have an SSZ-only client. But that's not a particular requirement right now. If we were to narrow it down to "endpoints that send or return large amounts of data, or that handle large enough amounts of data and are in the validating path" then that would reduce the scope significantly.
None of the beacon API servers support compression today. But even if they did, this would likely increase both memory and CPU usage in order to decompress the data.
The aim of basing this proposal on the canonical JSON mapping is exactly to constrain all our usage of JSON to that which has a direct and unambiguous SSZ equivalent (and vice versa) - at that point, it's a trivial matter to support any API in both JSON and SSZ - both from a specification and from an implementation point of view.
This smells of an unusually bad JSON decoder or a bug in the benchmark - I'd expect something like 3-5x difference for a decent one (even 10x for a poor one... but 100x?) - ditto memory usage - what is going on in there?
I'm not a real fan of supporting SSZ for everything. There are cases like returning blocks and states where it's useful to be able to have the response in SSZ format (e.g. for checkpoint sync or replaying blocks), but I struggle to see why SSZ would be useful for endpoints like

Where we do return SSZ it often includes less data, because it doesn't include the fields outside the

In terms of performance, I could see it potentially being meaningful for large structures like the state, but that's already available as SSZ and isn't a performance-critical API. Smaller pieces of data are unlikely to show any noticeable benefit from using SSZ over JSON (notably, JSON is often used for price feeds in traditional finance, which have far tighter performance requirements - there are binary standards for the real extreme end, but you can push JSON performance an awfully long way).
I'm unclear why a client would do this explicit querying and it makes two invalid assumptions:
Clients should just send an
Teku explicitly does NOT do this and will return the default representation (JSON) when no `Accept` header is provided.

For data size, it should be simple to add support for gzip encoding in Teku, and in my experience it doesn't add any additional delay - there's a small CPU cost but it wouldn't be noticeable unless you're serving a huge number of concurrent requests (e.g. you're running something bigger than Infura).
I think the strongest case for this is that one can create SSZ-only clients that do not have to implement JSON at all, decreasing their surface and simplifying their code - for non-human use cases, JSON is quite terrible and I can certainly see the allure of avoiding it.
Compression is orthogonal, I think - i.e. even SSZ compresses to about half (based on ERA file compression, which compresses SSZ blocks and states using snappy).

That said, adding compression to JSON makes the CPU side of performance worse in the sense that it'll take longer to both decompress and parse, most likely - if someone is using a JSON decoder with as poor performance as is suggested in the given benchmark, they will likely not be made happier by having to decompress as well - ditto on the server side: producing compressed JSON obviously takes more juice than non-compressed - the question becomes whether this is offset by the need to stream less data or not, which is probably hard to answer generally (it'll depend on network speed and other factors).

I've thought about

I agree that performance alone poorly motivates the significant surface that API providers have to implement.
Currently the beacon API enshrines JSON as the only first-class encoding for sending and receiving data. I propose that we provide SSZ as an alternative encoding for all endpoints.
What is SSZ?
Simple Serialize (SSZ) is an encoding used for transferring data across the Ethereum P2P network. It is also used in a number of places for data storage. I won't go into all of its benefits here, but the ones that we care about most for the purposes of the API are:

- significantly smaller payloads than the equivalent JSON
- faster encoding and decoding, with lower memory overhead
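As a rough illustration of the size difference (a hand-rolled sketch, not a library implementation), a fixed-size container such as `Checkpoint` serializes to nothing more than the concatenation of its fields:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Checkpoint mirrors the consensus-spec container {epoch: uint64, root: Bytes32}.
type Checkpoint struct {
	Epoch uint64
	Root  [32]byte
}

// marshalSSZ encodes the fixed-size container per the SSZ rules: fields are
// concatenated in order, integers little-endian, no field names or delimiters.
func marshalSSZ(c Checkpoint) []byte {
	out := make([]byte, 40)
	binary.LittleEndian.PutUint64(out[0:8], c.Epoch)
	copy(out[8:], c.Root[:])
	return out
}

func main() {
	c := Checkpoint{Epoch: 194048}
	fmt.Printf("SSZ encoding: %d bytes\n", len(marshalSSZ(c))) // 40 bytes
	// The beacon-API JSON form, {"epoch":"194048","root":"0x<64 hex chars>"},
	// is roughly 95 bytes before compression.
}
```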
How would this work?
Given that the beacon APIs are already present and used, there needs to be a way for clients and servers to work together as SSZ support is rolled out over time. The proposal is:

Regarding well-behaved servers:

- servers should accept an `Accept` header with multiple entries and quality values (e.g. `Accept: application/octet-stream;q=1.0,application/json;q=0.9`). Note that this appears to be the case for current beacon API server implementations, in that they at least will currently return JSON data when presented with this header
- servers should return a `Content-type` header with all data responses
- servers should return an HTTP 415 error if no response matching the `Accept` header can be provided

Regarding well-behaved clients:

- clients should make an initial request to an endpoint (e.g. `/eth/v1/node/version`) with the header `Accept: application/octet-stream;q=1.0,application/json;q=0.9`
- if the server supports SSZ the response will have the header `Content-type: application/octet-stream`. Subsequent requests can be sent with `Accept: application/octet-stream` (or continue to use the multi-option `Accept` header if desired)
- if the server does not support SSZ the response will be JSON (it may also include a `Content-type` header, but it is not required, to allow support of existing implementations). Subsequent requests can be sent with `Accept: application/json` (or continue to use the multi-option `Accept` header if desired)

It is possible for a client to only want a single content type by specifying `Accept: application/octet-stream`. A server that supports the '415' error code but does not support SSZ should return the 415 error. A server that does not support the '415' error code will return JSON (again, all current servers appear to do this), so the client should be able to handle this situation.

Note that the first three requirements for the servers are very low cost and provide a basis on which to build SSZ support in a clean fashion as far as clients are concerned. They are also all expected behavior for a well-configured REST API server. As such, it would be recommended that these items are implemented separately and ASAP, before any work on supporting SSZ even starts.
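As a sketch of the client-side handshake described above (assuming a beacon node at `http://localhost:5052`; not a prescribed implementation):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// probeSSZSupport sends the multi-option Accept header to a cheap endpoint and
// inspects the response to decide whether the server can serve SSZ
// (application/octet-stream) or only JSON.
func probeSSZSupport(baseURL string) (bool, error) {
	req, err := http.NewRequest(http.MethodGet, baseURL+"/eth/v1/node/version", nil)
	if err != nil {
		return false, err
	}
	// Prefer SSZ but accept JSON, so servers without SSZ support still answer.
	req.Header.Set("Accept", "application/octet-stream;q=1.0,application/json;q=0.9")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	io.Copy(io.Discard, resp.Body)

	// A server that supports the 415 error code but not SSZ may respond with it.
	if resp.StatusCode == http.StatusUnsupportedMediaType {
		return false, nil
	}
	// Servers are asked to set Content-type on all data responses; absence of
	// the header is treated as JSON for compatibility with existing servers.
	return strings.HasPrefix(resp.Header.Get("Content-Type"), "application/octet-stream"), nil
}

func main() {
	ssz, err := probeSSZSupport("http://localhost:5052") // assumed local beacon node
	if err != nil {
		panic(err)
	}
	fmt.Println("server supports SSZ:", ssz)
	// Subsequent requests can then use Accept: application/octet-stream,
	// or keep the multi-option header if desired.
}
```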