Primitives Proposal #56

Stebalien · 2018-03-29T16:12:34Z

So, there are two modes of operation with respect to IPLD:

Deserialization DWIM: In this mode, we take an IPLD object and try to decode it into some struct definition. This is the easy case as the struct tells us which types are acceptable.
Introspection: In this mode, we traverse through arbitrary IPLD data. This is the case where we actually need a set of primitive types as we need to be able to look at a field and figure out its type with no additional information.
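
As a rough illustration of the two modes (not code from any IPLD implementation), the sketch below uses a plain Python `dict` as a stand-in for a decoded IPLD object. `Person`, `decode_into`, and `walk` are hypothetical names invented for this example.

```python
from dataclasses import dataclass, fields


@dataclass
class Person:
    # The struct definition tells the decoder which types are acceptable.
    name: str
    age: int


def decode_into(cls, obj):
    """Deserialization mode: the target struct drives decoding."""
    kwargs = {}
    for f in fields(cls):
        value = obj[f.name]
        if not isinstance(value, f.type):
            raise TypeError(f"{f.name}: expected {f.type.__name__}")
        kwargs[f.name] = value
    return cls(**kwargs)


def walk(node, path=""):
    """Introspection mode: there is no schema, so the type of each
    value has to be recoverable from the value itself."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from walk(child, f"{path}/{key}")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from walk(child, f"{path}/{index}")
    else:
        yield path, type(node).__name__
```

In the first function the caller supplies the expected types; in the second, the data has to supply them, which is why a fixed set of primitives matters there.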

IPLD needs a set of primitives supported by all fully-expressive formats. Not all formats need be fully expressive (e.g., JSON can be a special beast). However, when converting to a non-fully-expressive format, data that can't be expressed without losing type information should be thrown away.

The ones we can all agree on:

Map
Array
Utf8String
Bytes
Cid
Null (undefined and null both map to this)
Bool
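
For introspection, the list above amounts to a classification function over decoded values. A minimal sketch, assuming Python built-ins stand in for the decoded data and a hypothetical `Cid` class stands in for a real CID implementation:

```python
class Cid:
    """Hypothetical stand-in for a real CID implementation."""

    def __init__(self, raw: bytes):
        self.raw = raw


def kind(value):
    """Return the name of the agreed-on primitive a value maps to."""
    if value is None:
        return "Null"
    if isinstance(value, bool):
        return "Bool"
    if isinstance(value, str):
        return "Utf8String"
    if isinstance(value, (bytes, bytearray)):
        return "Bytes"
    if isinstance(value, Cid):
        return "Cid"
    if isinstance(value, dict):
        return "Map"
    if isinstance(value, (list, tuple)):
        return "Array"
    # Numbers end up here: which number kinds exist is the open question.
    raise TypeError(f"no agreed primitive for {type(value).__name__}")
```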

The big question: What number types do we support?

In Crete, those of us who met up to discuss this agreed on a single magical Decimal type (a superset of all number types we might care about except rationals and irrationals). However, @jbenet (reasonably) objected on the basis that we lose important type information this way. This is especially important for systems that use L0 (a type schema system for IPLD).

Unfortunately:

Users like being able to write "numbers" and have them just work. Without forcing users to define a schema, having multiple number-like types will cause trouble.
CBOR already has some type magic around numbers as the canonical variant dictates that integers must be encoded as small as possible. Therefore, it doesn't really distinguish between, e.g., uint8 and uint32. CBOR really just has an int type.
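
To make the CBOR point concrete, here is a small sketch of shortest-form encoding for unsigned integers (CBOR major type 0). The function name `cbor_encode_uint` is made up for this example; it is not taken from any CBOR library.

```python
def cbor_encode_uint(n: int) -> bytes:
    """Encode a non-negative integer as CBOR major type 0, shortest form."""
    if n < 0 or n >= 1 << 64:
        raise ValueError("out of range for a CBOR unsigned integer")
    if n < 24:
        # Small values live directly in the initial byte.
        return bytes([n])
    if n < 1 << 8:
        return bytes([0x18]) + n.to_bytes(1, "big")
    if n < 1 << 16:
        return bytes([0x19]) + n.to_bytes(2, "big")
    if n < 1 << 32:
        return bytes([0x1A]) + n.to_bytes(4, "big")
    return bytes([0x1B]) + n.to_bytes(8, "big")
```

The encoded width depends only on the value: `5` becomes the single byte `0x05` whether the writer thought of it as a uint8 or a uint32, so the declared width cannot be recovered from the bytes.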

The only way I can think of properly solving this is by saying that CBOR is not, in fact, a fully-expressive format. Instead, L0 would be the only fully-expressive format and CBOR would also be a subset format (joining the ranks of JSON).

In that case, I'd propose the following number primitives:

(u)int{8,16,32,64}
float{32,64}
BigDecimal (byte string)
BigInt (byte string)
BigRational (byte string)
BigReal (byte string)
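
The proposal does not say how a Big* value is laid out inside its byte string. Purely as an assumption for illustration, the sketch below stores a BigInt as big-endian two's complement; `bigint_to_bytes` and `bigint_from_bytes` are hypothetical names.

```python
def bigint_to_bytes(n: int) -> bytes:
    """Serialize an arbitrary-size integer (assumed layout: big-endian,
    two's complement)."""
    length = (n.bit_length() + 8) // 8  # always leaves room for the sign bit
    return n.to_bytes(length, "big", signed=True)


def bigint_from_bytes(data: bytes) -> int:
    """Inverse of bigint_to_bytes under the same assumed layout."""
    return int.from_bytes(data, "big", signed=True)
```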

Eventually, I'd like to implement these in L1 so we don't have so many.

The L1 language will also have tagged enums but those will be expressible in other formats as:

{
  "/tag": "my tag",
  ... data
}
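
A minimal sketch of that representation, assuming the variant's data is itself a map whose fields sit next to the `"/tag"` key. The helper names are hypothetical.

```python
def encode_variant(tag: str, data: dict) -> dict:
    """Flatten a tagged enum variant into a plain map."""
    if "/tag" in data:
        raise ValueError('"/tag" is reserved for the variant tag')
    return {"/tag": tag, **data}


def decode_variant(obj: dict):
    """Split a plain map back into (tag, data)."""
    data = dict(obj)  # copy so the caller's map is left untouched
    tag = data.pop("/tag")
    return tag, data
```

For example, `encode_variant("my tag", {"x": 1})` gives `{"/tag": "my tag", "x": 1}`.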

Note: I'm not happy with this. I'd prefer a minimal set of primitives or a maximal (extensible) set. This set is already missing, e.g., int128 (supported by Rust) but, as most languages don't have that, I'd rather not include it. Really, I'd prefer to have generic ints (over the size) but no language that I know of except LLVM bytecode supports them.


Stebalien · 2018-04-05T20:41:19Z

Conclusion: block on L0, we need parametric types.

Make (u)int(size) parametric over size.
Support float{16,32,64} (not parametric). Maybe 128?
Support varint (big int). Put all other Big* implementations in the stdlib.

Always carry type with deserialized structs.
Rely on the linked type object instead of the types defined by, e.g., cbor.
Do support sum types (closed and open).
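
One way to picture "(u)int parametric over size" together with "always carry type with deserialized structs" is sketched below. `IntType` and `Typed` are hypothetical names, not part of any L0 design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntType:
    """An integer type parametric over its size in bits."""

    bits: int
    signed: bool

    def bounds(self):
        if self.signed:
            return -(1 << (self.bits - 1)), (1 << (self.bits - 1)) - 1
        return 0, (1 << self.bits) - 1

    def check(self, value: int) -> int:
        low, high = self.bounds()
        if not low <= value <= high:
            raise OverflowError(f"{value} does not fit in {self}")
        return value


@dataclass(frozen=True)
class Typed:
    """A deserialized value that carries its type alongside the data."""

    type: IntType
    value: int
```

With this shape, sizes such as int128 need no special case: `IntType(128, signed=True)` is just another instance.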

Stebalien · 2018-04-05T21:12:06Z

Typed maps, arrays, etc.
Any type.

Stebalien · 2018-04-05T21:14:26Z

Given L0, Cid should actually be Cid<T> (parametrized over T).
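
A loose sketch of what a link parametrized over its target type could look like. Everything here is an assumption made for illustration: the `Cid` class, its `decode` field, and the dict used as a block store are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Cid(Generic[T]):
    """A link that knows the type T of the object it points to."""

    raw: bytes
    decode: Callable[[bytes], T]  # how to interpret the target block as a T

    def load(self, blockstore: dict) -> T:
        # Fetch the raw block and decode it as the linked type.
        return self.decode(blockstore[self.raw])
```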

Stebalien · 2018-09-26T02:36:30Z

Current state:

At the research retreat, we discussed this and realized:

We're not going to come up with a solution that's both simple and usable. That is, we're going to need reasonably complex types and don't want to lock ourselves into a set of basic primitives.
Many formats already have a bunch of different types, and we can't do anything about that. Furthermore, many of these types are extremely useful.

Our solution was to say:

Codecs can use any type they want.
When we design the type system, we'll come up with a flexible but simple type system and then back-define the existing types in terms of this type system.

What does this mean here? Well, it means that no existing format is fully expressive. Really, this is a logical conclusion that can be drawn from the fact that no existing format can define/understand new types.

mikeal · 2018-09-26T15:57:12Z

Is it fair to say that we do require codecs to support JSON Types?

Could we describe "Links" as another type that dag codecs are required to know how to encode?

Would it be possible to come up with a loose definition for a "Binary" type which we can reference in specifications built on IPLD?

Stebalien · 2018-09-26T16:19:05Z

> Is it fair to say that we do require codecs to support JSON Types?

Unfortunately, no.

Raw obviously doesn't (raw binary).
DagPB is highly structured.
Git, eth, etc. are all in the same boat.

Basically, we want to interop with existing systems so we can't impose requirements like "all systems will support x, y, z".

This issue was really concerned with what we were calling "fully featured" formats. The idea was that some formats would support all primitives and would be able to encode all possible IPLD objects in these formats.

We could (and probably should) introduce a concept of "JSON superset" (or JSON + binary + CIDs) formats.

mikeal · 2018-09-26T17:33:41Z

> This issue was really concerned with what we were calling "fully featured" formats. The idea was that some formats would support all primitives and would be able to encode all possible IPLD objects in these formats.
> We could (and probably should) introduce a concept of "JSON superset" (or JSON + binary + CIDs) formats.

Ok, let me re-frame. For "IPLD Data Model" would we want to have layers along these lines:

    +---------------------------+
    |                           |
    |  Complex Data-Structures  |
    |                           |
    +---------------------------+

    +---------------------------+
    |                           |
L2  |  Complex Types            |
    |                           |
    +---------------------------+

    +---------------------------+
    |                           |
L1  |  Links & Binary           |
    |                           |
    +---------------------------+

    +---------------------------+
    |                           |
L0  |  Simple JSON Types        |
    |                           |
    +---------------------------+

Level 0: The types defined by JSON.
Level 1: Types for Links and Binary.
Level 2: Reserved for a future complex type system.

Complex data-structures (generic hamt node, unixfs-v2) and other data format specs can rely on L0, L1, or L2 data model support. This would allow us to say something along the lines of "This specification requires IPLD Data Model L1" in the unixfs-v2 spec.
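
To show how a spec could check a claim like "requires IPLD Data Model L1", here is a sketch that computes the lowest level a decoded value needs. It covers only L0 and L1 as described above; `Link` and `required_level` are hypothetical names.

```python
class Link:
    """Hypothetical stand-in for a CID-based link."""

    def __init__(self, cid: str):
        self.cid = cid


def required_level(value) -> int:
    """Return the lowest data model level (0 or 1) that can express value."""
    if isinstance(value, (bytes, Link)):
        return 1  # Level 1: Links and Binary
    if isinstance(value, dict):
        return max((required_level(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return max((required_level(v) for v in value), default=0)
    if value is None or isinstance(value, (bool, int, float, str)):
        return 0  # Level 0: the types defined by JSON
    raise TypeError(f"not part of the L0/L1 data model: {type(value).__name__}")
```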

The Path spec can define behavior for things that do not conform with the IPLD data model (this is already a source of confusion, as the property names in our git codec are not identical to their internal representation).

The CID spec does not require the "IPLD Data Model." It's the other way around: IPLD L1 Data Model support depends on the CID spec for the Link type.

Does this make sense? I worry that I'm re-using a few terms we've previously defined like "IPLD Data Model."

rvagg · 2019-08-14T08:28:52Z

Closing due to staleness as per team agreement to clean up the issue tracker a bit (ipld/team-mgmt#28). This doesn't mean this issue is off the table entirely, it's just not on the current active stack but may be revisited in the near future. If you feel there is something pertinent here, please speak up, reopen, or open a new issue. [/boilerplate]

Stebalien mentioned this issue Sep 26, 2018

RFC: Link encoding in IPLD #70

Closed

rvagg closed this as completed Aug 14, 2019
