
Add BorshSchema #56

Merged: 7 commits merged into master on Feb 22, 2020

Conversation

@MaksymZavershynskyi (Contributor) commented Feb 21, 2020

We need a Borsh schema for two things:

  • Rust contracts API;
  • Node RPC API, e.g. error messages;

#[derive(BorshSchema)] implements the BorshSchema trait that describes the schema of the type and of all its dependencies. See borsh-rs/borsh/tests/test_schema* for examples.
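
As a quick illustration, here is a minimal sketch of the derive in use (a hypothetical example; the schema_container() accessor is an assumption about the exact API and may differ from what this PR exposes):

use borsh::BorshSchema;

// Deriving BorshSchema describes this struct and, recursively, all the types
// it depends on (here: String and u64).
#[derive(BorshSchema)]
struct Transfer {
    receiver: String,
    amount: u64,
}

fn main() {
    // Assumed accessor: a container holding the declaration of `Transfer`
    // plus the definitions of all of its dependencies.
    let container = Transfer::schema_container();
    println!("{:?}", container);
}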

The key differences from the current implementation of the Borsh schema in nearlib:

  • This schema is auto-derivable for most Rust types;
  • We can now have arrays of types other than u8;
  • It works with generics. A name like MyType<string, u64> is just a string, so languages without generics can simply map it to a standard non-generic type (see the sketch after this list);
  • The schema is language agnostic (enum is a tagged union, which is a common language construct that can be emulated in many languages);
  • Uniform structure of the definitions, with no inline definitions like "fields": [ "field_a", ["u64"]]. Instead we have "fields": [ "field_a", "Array<u64>"].
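
A hedged sketch of the generics point above (the declaration() method name and the exact rendering of the name are assumptions, not something this PR spells out):

use borsh::BorshSchema;

#[derive(BorshSchema)]
struct MyType<K, V> {
    key: K,
    value: V,
}

fn main() {
    // The declaration of a monomorphized generic type is just a string such as
    // "MyType<string, u64>", so a consumer without generics can treat it as an
    // ordinary, non-generic type name.
    println!("{}", <MyType<String, u64>>::declaration());
}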

@MaksymZavershynskyi MaksymZavershynskyi requested review from willemneal and vgrichina and removed request for mikhailOK February 21, 2020 04:23
@MaksymZavershynskyi (Contributor Author)

FYI @willemneal and @vgrichina, you might want to familiarize yourselves with the format, since it is likely that you will be implementing support for it in nearlib (this would require fixing the nearlib Borsh serialization) and AssemblyScript.

@MaksymZavershynskyi MaksymZavershynskyi requested review from frol and removed request for willemneal and vgrichina February 21, 2020 04:25
@MaksymZavershynskyi (Contributor Author)

Assigning to @frol since he is our API guardian and will be encountering Borsh schema a lot.

@MaksymZavershynskyi MaksymZavershynskyi added enhancement New feature or request P1 Priority 1 labels Feb 21, 2020
Comment on lines +311 to +312
"Vec<u64>" => r#"{ "kind": "sequence", "params": [ "u64" ] }"#,
"Vec<Vec<u64>>" => r#"{ "kind": "sequence", "params": [ "Vec<u64>" ] }"#
Collaborator:

This looks inconsistent to me. Why do we encode types with kind (sequence / tuple / enum) at the top level, but then in params we use Vec<...>?

Contributor Author:

Kind does not encode the type; it is a type constructor: knowing the value of the kind and the params, we know how to construct this type in memory. Arguably, we could keep just four kinds: enum (a.k.a. tagged union), sequence, array, and tuple (merging struct into tuple). enum requires reading the tag byte that tells which variant follows. sequence requires reading the length prefix that tells how many elements there are. array does not require reading additional bytes, but all of its elements have the same type. tuple and struct have non-uniform element types. However, I decided to keep struct because it gives us explicit field names, which will be useful when we convert from JSON to Borsh.
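
A rough sketch of that idea (the names below are illustrative, not the exact types introduced by this PR): each kind tells a reader how to decode a value given its params.

// Hypothetical, simplified model of the schema kinds discussed above.
enum Definition {
    // Read a tag first, then decode the corresponding variant.
    Enum { variants: Vec<(String, String)> },
    // Read a length prefix, then that many elements of a single type.
    Sequence { element: String },
    // Fixed number of elements, no length prefix, uniform element type.
    Array { length: u32, element: String },
    // Fixed number of elements, possibly different types, no field names.
    Tuple { elements: Vec<String> },
    // Like a tuple, but with explicit field names (useful for JSON <-> Borsh).
    Struct { fields: Vec<(String, String)> },
}

fn main() {
    // "Vec<u64>" from the snippet above, expressed with this model:
    let _vec_u64 = Definition::Sequence { element: "u64".to_string() };
}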

Contributor Author:

Relevant, but not exactly the same: https://serde.rs/data-model.html

Collaborator:

I am having a hard time understanding why we need this top-level type [constructor] to be different from the inner items. To achieve consistency, I see two ways:

// 1
"Vec<Vec<u64>>" => r#"{ "kind": "sequence", "params": [ {"kind": "sequence", "params": ["u64"] } ] }"#

// 2
"Vec<Vec<u64>>" => r#"Vec<Vec<u64>>"#

I may just be missing something, so I will approve the PR (it looks solid from the code and tests point of view) and let you make the decision. I am ready to continue the discussion, though, if you feel there is a benefit to an ELI5.

Contributor Author:

The idea is that every entry of the map is the following: <name of the type> => <how to construct it>. We cannot have

"Vec<Vec<u64>>" => r#"{ "kind": "sequence", "params": [ {"kind": "sequence", "params": ["u64"] } ] }"#

and should instead have:

"Vec<Vec<u64>>" => r#"{ "kind": "sequence", "params": [ "Vec<u64>" ] }"#,
"Vec<u64>" => r#"{ "kind": "sequence", "params": [ "u64" ] }"#

because we can have recursive definitions like:

use std::rc::Rc;

struct Node {
    parent: Rc<Node>,
    children: Vec<Rc<Node>>,
}

that cannot be described using the proposed way.

Also we cannot have

"Vec<Vec<u64>>" => r#"Vec<Vec<u64>>"#

for the same reason, and also because processing it would require parsing the string, which is much harder than simply working with JSON.
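
To make that concrete, here is a hedged sketch of such a flat map for the Node example (the JSON layout of the struct entry is an assumption; the sequence entries mirror the snippets above):

use std::collections::HashMap;

fn main() {
    // Every entry is `<name of the type> => <how to construct it>`; nested and
    // recursive types are referenced by name instead of being inlined.
    let mut definitions: HashMap<&str, &str> = HashMap::new();
    definitions.insert(
        "Node",
        r#"{ "kind": "struct", "fields": [ ["parent", "Rc<Node>"], ["children", "Vec<Rc<Node>>"] ] }"#,
    );
    definitions.insert(
        "Vec<Rc<Node>>",
        r#"{ "kind": "sequence", "params": [ "Rc<Node>" ] }"#,
    );
    // "Rc<Node>" would itself be declared in terms of "Node", closing the cycle
    // without any definition having to be inlined inside another one.
    for (name, definition) in &definitions {
        println!("{} => {}", name, definition);
    }
}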

Contributor Author (Feb 21, 2020):

Another reason is that if we have Vec<A> and HashMap<A, B>, we don't want A to be defined twice; keeping a single entry per name avoids that redundancy.

Collaborator:

Aaaah, I got it! Vec<u64> is essentially an alias, which we use inside the Vec<Vec<u64>> definition! Now it makes perfect sense. Looks good to me!

@vgrichina (Contributor)

@nearmax @frol let's make sure we actually have a solution for https://commonwealth.im/near/proposal/discussion/314-dealing-borsh-arguments-on-hsm

@MaksymZavershynskyi (Contributor Author)

> @nearmax @frol let's make sure we actually have a solution for https://commonwealth.im/near/proposal/discussion/314-dealing-borsh-arguments-on-hsm

It seems to be an orthogonal problem that we can solve in parallel.

@frol (Collaborator) commented Feb 21, 2020

@nearmax I am not very deeply involved in the topic @vgrichina brought up, so I may be wrong, but it seems that the problem of parsing JSON is still somewhat relevant, in the sense that even if the transaction is serialized with Borsh, you would still need a JSON parser to parse the type definitions.

@vgrichina (Contributor) commented Feb 22, 2020

@nearmax it's not an orthogonal problem, because it's really hard to solve the problem discussed in https://commonwealth.im/near/proposal/discussion/314-dealing-borsh-arguments-on-hsm if we go with schemaless Borsh for contract args + separate metadata.

We cannot make any queries to the node from a hardware wallet to fetch any kind of metadata, and we don't have any reason to trust any metadata passed to the wallet.

So if we need to support some schemaless format – we basically have to be able to determine if it matches some hardcoded metadata based only on the method name (as we don't have any trusted info about the contract).

Why are we so concerned about using a schema vs. having a schemaless binary format (a la MessagePack, but it can be based on Borsh)?

How much extra gas would passing argument names in e.g. a fungible token contract cost?

How does it compare to the minimum gas cost per call?

We need to think carefully about the whole stack when designing this, or we'll run into issues similar to the ones with Rust Serde JSON support (e.g. 128-bit integers serialized as numbers, which literally no other JSON parser supports).

@MaksymZavershynskyi (Contributor Author)

I am not enabling Borsh as the default serializer for contract arguments yet. We need to fix the Borsh serializer in nearlib and reconcile it with the three other JS implementations of Borsh anyway, because we need it for the node RPC. So most of the work mentioned here: https://app.zenhub.com/workspaces/chainmiddleware-5cea2bcf78297c385cf0ec81/issues/nearprotocol/nearcore/2167 has to be done independently of whether we use Borsh for args or not. Also, exposing metadata for contracts is not predicated on using Borsh for args.

Below I address your comments:

@frol @vgrichina I see only one case where a schema-less format creates a problem: HSM, and I have already suggested a solution in that thread.

  • For cross-contract calls, the contracts need to be aware of the argument structure at development time, independently of whether it is JSON or not;
  • For calling a contract from nearlib, the plan is to fetch the schema from the node and serialize the arguments provided as JSON, TS objects, or JS objects into Borsh format. Nearlib needs a Borsh serializer anyway, and it needs to work with the Borsh schema too, for RPC errors and such.

> So if we need to support some schemaless format – we basically have to be able to determine if it matches some hardcoded metadata based only on the method name (as we don't have any trusted info about the contract).

I have described another solution in the thread that allows validating the schema based on the transaction result. So the metadata is verifiable; the only reason it is hard on an HSM is its limited computational power and space, which makes it a corner case.

> Why are we so concerned about using a schema vs. having a schemaless binary format (a la MessagePack, but it can be based on Borsh)?

Developing a serialization format is difficult. We developed Borsh by fixing https://github.com/servo/bincode and tried to keep it as simple as possible. We already know that JSON is problematic because of integers, blobs, and other issues. Trying a new format is difficult; we cannot just take some other format and check that it works for everything. MessagePack was not designed to be used for smart contracts, and verifying that it works for everything is a large effort that we cannot afford right now.

> How much extra gas would passing argument names in e.g. a fungible token contract cost?

I can tell that JSON serialization costs 80% of the gas for eth-bridge, which made it unfeasible to run. We don't have benchmarks for fungible tokens (cc @evgenykuzyakov), but I am sure it would be worse, because the remaining 20% of the eth-bridge gas went to expensive crypto operations, which fungible tokens do not have.

> We need to think carefully about the whole stack when designing this, or we'll run into issues similar to the ones with Rust Serde JSON support (e.g. 128-bit integers serialized as numbers, which literally no other JSON parser supports).

We did put a lot of thought into Borsh. If you think something needs to change in Borsh, let us know.

@MaksymZavershynskyi MaksymZavershynskyi merged commit 52f7605 into master Feb 22, 2020
@vgrichina (Contributor)

@nearmax we don't have to change Borsh or use an alternate format like MessagePack to go schemaless.

What I suggest is that we build a schema-less format using Borsh itself. It should be easy to come up with a convention for how to store generic JSON objects serialized as Borsh. That would be introspectable and easy to parse anywhere.
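
A hedged sketch of what such a convention could look like (hypothetical, not part of this PR): a generic, JSON-like value type that is itself Borsh-serializable, so payloads stay introspectable without a separate schema.

use std::collections::HashMap;
use borsh::{BorshDeserialize, BorshSerialize};

// Hypothetical self-describing value, serialized with plain Borsh.
#[derive(BorshSerialize, BorshDeserialize, Debug)]
enum Value {
    Null,
    Bool(bool),
    Number(i64),
    String(String),
    Binary(Vec<u8>),
    Array(Vec<Value>),
    Object(HashMap<String, Value>),
}

fn main() -> std::io::Result<()> {
    let args = Value::Object(HashMap::from([
        ("receiver".to_string(), Value::String("alice.near".to_string())),
        ("amount".to_string(), Value::Number(1_000)),
    ]));
    // `try_to_vec` is an assumption about the serialization entry point.
    let bytes = args.try_to_vec()?;
    println!("{} bytes", bytes.len());
    Ok(())
}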

> I can tell that JSON serialization costs 80% of the gas for eth-bridge, which made it unfeasible to run.

This is actually a far more special case than signing a transaction using a hardware wallet. Most contracts don't implement something like the bridge and don't have to pass a lot of data in arguments. Serializing binary data as an array of JSON integers is expected to be slow everywhere and isn't a good way to use JSON or to evaluate its performance (you'd at least use a base64 string if you want reasonable performance).

@MaksymZavershynskyi (Contributor Author)

> @nearmax we don't have to change Borsh or use an alternate format like MessagePack to go schemaless.
>
> What I suggest is that we build a schema-less format using Borsh itself. It should be easy to come up with a convention for how to store generic JSON objects serialized as Borsh. That would be introspectable and easy to parse anywhere.

@vgrichina Are you suggesting that some Borsh-serialized objects be preceded by their Borsh-serialized schema in binary representation? I think this is a really great idea, and the code in this PR can easily be modified to do that.
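
A hedged sketch of that idea, assuming a schema_container() accessor exists and that the container type itself implements BorshSerialize (both are assumptions about the API rather than guarantees of this PR):

use borsh::{BorshSchema, BorshSerialize};

#[derive(BorshSerialize, BorshSchema)]
struct Transfer {
    receiver: String,
    amount: u64,
}

// Produce a self-describing payload: first the Borsh-serialized schema of T,
// then the Borsh-serialized value itself.
fn to_self_describing_bytes<T: BorshSerialize + BorshSchema>(value: &T) -> std::io::Result<Vec<u8>> {
    let mut out = Vec::new();
    T::schema_container().serialize(&mut out)?; // assumed: the container is BorshSerialize
    value.serialize(&mut out)?;
    Ok(out)
}

fn main() -> std::io::Result<()> {
    let transfer = Transfer { receiver: "alice.near".to_string(), amount: 1 };
    let bytes = to_self_describing_bytes(&transfer)?;
    println!("{} bytes (schema + value)", bytes.len());
    Ok(())
}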
