This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

Primitives Proposal #56

Closed
Stebalien opened this issue Mar 29, 2018 · 8 comments

Comments

@Stebalien
Contributor

Stebalien commented Mar 29, 2018

So, there are two modes of operation with respect to IPLD:

  1. Deserialization DWIM: In this mode, we take an IPLD object and try to decode it into some struct definition. This is the easy case as the struct tells us which types are acceptable.
  2. Introspection: In this mode, we traverse through arbitrary IPLD data. This is the case where we actually need a set of primitive types as we need to be able to look at a field and figure out its type with no additional information.
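
To make the two modes concrete, here is a minimal Python sketch. The `Person` struct and the `decode` and `walk` helpers are hypothetical names made up for illustration, not part of any IPLD library, and Python's built-in types stand in for IPLD kinds.

```python
from dataclasses import dataclass, fields


@dataclass
class Person:
    # Hypothetical struct definition; its field types act as the schema.
    name: str
    age: int


def decode(cls, obj):
    """Mode 1: decode a map into `cls`, letting the struct say which types are acceptable."""
    kwargs = {}
    for f in fields(cls):
        value = obj[f.name]
        if not isinstance(value, f.type):
            raise TypeError(f"{f.name}: expected {f.type.__name__}")
        kwargs[f.name] = value
    return cls(**kwargs)


def walk(node, path=""):
    """Mode 2: traverse arbitrary data and report each leaf's type with no schema."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from walk(child, f"{path}/{key}")
    elif isinstance(node, list):
        for i, child in enumerate(node):
            yield from walk(child, f"{path}/{i}")
    else:
        yield path, type(node).__name__
```

In the first mode the caller supplies the expected types; in the second the data has to identify its own types, which is why a fixed primitive set matters there.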

IPLD needs a set of primitives supported by all fully-expressive formats. Not all formats need be fully expressive (i.e., JSON can be a special beast). However, when converting to a non-fully-expressive format, data that can't be expressed without losing type information should be thrown away.

The ones we can all agree on:

  • Map
  • Array
  • Utf8String
  • Bytes
  • Cid
  • Null (undefined and null both map to this)
  • Bool
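
As a rough sketch of what introspection over these seven primitives could look like, assuming Python values as the in-memory representation: `Kind`, `kind_of`, and the placeholder `Cid` class are hypothetical names for illustration only. Numbers are rejected here because this list does not cover them.

```python
from enum import Enum


class Cid:
    """Placeholder link type for illustration; not a real CID implementation."""

    def __init__(self, text):
        self.text = text


class Kind(Enum):
    MAP = "Map"
    ARRAY = "Array"
    STRING = "Utf8String"
    BYTES = "Bytes"
    CID = "Cid"
    NULL = "Null"
    BOOL = "Bool"


def kind_of(value):
    """Return the primitive kind of a value, using no information beyond the value."""
    if value is None:
        return Kind.NULL
    if isinstance(value, bool):
        return Kind.BOOL
    if isinstance(value, str):
        return Kind.STRING
    if isinstance(value, (bytes, bytearray)):
        return Kind.BYTES
    if isinstance(value, Cid):
        return Kind.CID
    if isinstance(value, dict):
        return Kind.MAP
    if isinstance(value, list):
        return Kind.ARRAY
    # Numbers (and anything else) are outside the agreed-upon set.
    raise TypeError(f"no agreed primitive kind for {type(value).__name__}")
```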

The big question: What number types do we support?

In Crete, those of us who met up to discuss this agreed on a single magical Decimal type (a superset of all number types we might care about except rationals and irrationals). However, @jbenet (reasonably) objected on the basis that we lose important type information this way. This is especially important for systems that use L0 (a type schema system for IPLD).

Unfortunately:

  1. Users like being able to write "numbers" and have them just work. Without forcing users to define a schema, having multiple number-like types will cause trouble.
  2. CBOR already has some type magic around numbers as the canonical variant dictates that integers must be encoded as small as possible. Therefore, it doesn't really distinguish between, e.g., uint8 and uint32. CBOR really just has an int type.
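
The second point can be illustrated with a small encoder for CBOR unsigned integers (major type 0) in shortest form. This is a sketch written for this discussion, not code from any CBOR library; `encode_cbor_uint` is a made-up name. The encoded length depends only on the value, so the declared width of the source integer leaves no trace.

```python
import struct


def encode_cbor_uint(n):
    """Encode a non-negative integer as a CBOR unsigned int using the shortest form."""
    if n < 0 or n >= 1 << 64:
        raise ValueError("out of range for CBOR major type 0")
    if n < 24:
        return bytes([n])                        # value fits in the initial byte
    if n < 1 << 8:
        return bytes([0x18, n])                  # one-byte argument
    if n < 1 << 16:
        return b"\x19" + struct.pack(">H", n)    # two-byte argument
    if n < 1 << 32:
        return b"\x1a" + struct.pack(">I", n)    # four-byte argument
    return b"\x1b" + struct.pack(">Q", n)        # eight-byte argument
```

The value 5 encodes as the single byte `0x05` whether the program held it in a uint8 or a uint32, so a decoder cannot recover which one was meant.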

The only way I can think of properly solving this is by saying that CBOR is not, in fact, a fully-expressive format. Instead, L0 would be the only fully-expressive format and CBOR would also be a subset format (joining the ranks of JSON).

In that case, I'd propose the following number primitives:

  • (u)int{8,16,32,64}
  • float{32,64}
  • BigDecimal (byte string)
  • BigInt (byte string)
  • BigRational (byte string)
  • BigReal (byte string)
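
The proposal does not say how the Big* types would be laid out inside a byte string. Purely as an assumption for illustration, a BigInt could use big-endian two's complement; `bigint_to_bytes` and `bigint_from_bytes` are hypothetical helpers.

```python
def bigint_to_bytes(n):
    """Encode an arbitrary-size integer as big-endian two's complement (assumed layout)."""
    # bit_length() excludes the sign, so reserve one extra bit for it.
    # This is always large enough, though not always minimal for negative values.
    length = (n.bit_length() + 8) // 8
    return n.to_bytes(length, "big", signed=True)


def bigint_from_bytes(data):
    """Decode the byte string produced by bigint_to_bytes."""
    return int.from_bytes(data, "big", signed=True)
```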

Eventually, I'd like to implement these in L1 so we don't have so many.

The L1 language will also have tagged enums but those will be expressible in other formats as:

{
  "/tag": "my tag",
  ... data
}
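
A minimal sketch of reading and writing that shape, assuming the tag sits beside the data fields under the `"/tag"` key as shown; `encode_tagged` and `decode_tagged` are hypothetical names.

```python
def encode_tagged(tag, data):
    """Wrap a map of fields as a tagged enum variant."""
    if "/tag" in data:
        raise ValueError('data must not already use the reserved "/tag" key')
    return {"/tag": tag, **data}


def decode_tagged(obj):
    """Split a tagged map into (tag, remaining fields) without mutating the input."""
    data = dict(obj)
    tag = data.pop("/tag")
    return tag, data
```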

Note: I'm not happy with this. I'd prefer a minimal set of primitives or a maximal (extensible) set. This set is already missing, e.g., int128 (supported by Rust) but, as most languages don't have that, I'd rather not include it. Really, I'd prefer to have generic ints (over the size) but no language that I know of except LLVM bytecode supports them.

@Stebalien
Contributor Author

Conclusion: block on L0, we need parametric types.

  1. Make (u)int(size) parametric over size.
  2. Support float{16,32,64} (not parametric). Maybe 128?
  3. Support varint (big int). Put all other Big* implementations in the stdlib.
  • Always carry type with deserialized structs.
  • Rely on the linked type object instead of the types defined by, e.g., cbor.
  • Do support sum types (closed and open).
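
As a rough illustration of the first item, an integer type that is parametric over its size could be modeled as a single type descriptor rather than a fixed list of widths. `IntType` is a hypothetical name; nothing here reflects an actual L0 design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntType:
    """Integer type parametric over its bit width (hypothetical sketch)."""

    bits: int
    signed: bool = True

    def bounds(self):
        """Return the inclusive (min, max) range of representable values."""
        if self.signed:
            return -(1 << (self.bits - 1)), (1 << (self.bits - 1)) - 1
        return 0, (1 << self.bits) - 1

    def accepts(self, value):
        """Check that value is an integer (not a bool) within range."""
        if isinstance(value, bool) or not isinstance(value, int):
            return False
        lo, hi = self.bounds()
        return lo <= value <= hi
```

With this shape, `IntType(128)` costs nothing extra to express, which is the int128 case the fixed list had to leave out.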

@Stebalien
Contributor Author

  • Typed maps, arrays, etc.
  • Any type.

@Stebalien
Contributor Author

  • Given L0, Cid should actually be Cid<T> (parametrized over T).
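
One way to picture `Cid<T>` is with Python's generic type hints. This is a hypothetical sketch: the `Cid` class, its `text` field, and the `load` method are assumptions for illustration, and Python does not enforce the type parameter at runtime.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


@dataclass
class Cid(Generic[T]):
    """A link whose type parameter records what it is expected to point at."""

    text: str  # placeholder for the real CID bytes

    def load(self, fetch: Callable[[str], T]) -> T:
        """Resolve the link with a caller-supplied fetch function."""
        return fetch(self.text)
```

A static type checker would then treat `Cid[int]` and `Cid[str]` as different link types even though both hold the same kind of identifier.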

@Stebalien
Contributor Author

Current state:

At the research retreat, we discussed this and realized:

  1. We're not going to come up with a solution that's both simple and usable. That is, we're going to need reasonably complex types and don't want to lock ourselves into a set of basic primitives.
  2. Many formats already have a bunch of different types, we can't do anything about that. Furthermore, many of these types are extremely useful.

Our solution was to say:

  1. Codecs can use any type they want.
  2. When we design the type system, we'll come up with a flexible but simple type system and then back-define the existing types in terms of this type system.

What does this mean here? Well, it means that no existing format is fully expressive. Really, this is a logical conclusion that can be drawn from the fact that no existing format can define/understand new types.

@mikeal
Contributor

mikeal commented Sep 26, 2018

Is it fair to say that we do require codecs to support JSON Types?

Could we describe "Links" as another type that dag codecs are required to know how to encode?

Would it be possible to come up with a loose definition for a "Binary" type which we can reference in specifications built on IPLD?

@Stebalien
Contributor Author

Is it fair to say that we do require codecs to support JSON Types?

Unfortunately, no.

  • Raw obviously doesn't (raw binary).
  • DagPB is highly structured.
  • Git, eth, etc. are all in the same boat.

Basically, we want to interop with existing systems so we can't impose requirements like "all systems will support x, y, z".


This issue was really concerned with what we were calling "fully featured" formats. The idea was that some formats would support all primitives and would be able to encode all possible IPLD objects in these formats.

We could (and probably should) introduce a concept of "JSON superset" (or JSON + binary + CIDs) formats.

@mikeal
Contributor

mikeal commented Sep 26, 2018

This issue was really concerned with what we were calling "fully featured" formats. The idea was that some formats would support all primitives and would be able to encode all possible IPLD objects in these formats.
We could (and probably should) introduce a concept of "JSON superset" (or JSON + binary + CIDs) formats.

Ok, let me re-frame. For "IPLD Data Model" would we want to have layers along these lines:

    +---------------------------+
    |                           |
    |  Complex Data-Structures  |
    |                           |
    +---------------------------+

    +---------------------------+
    |                           |
L2  |  Complex Types            |
    |                           |
    +---------------------------+

    +---------------------------+
    |                           |
L1  |  Links & Binary           |
    |                           |
    +---------------------------+

    +---------------------------+
    |                           |
L0  |  Simple JSON Types        |
    |                           |
    +---------------------------+
  • Level 0: The types defined by JSON.
  • Level 1: Types for Links and Binary.
  • Level 2: Reserved for a future complex type system.

Complex data-structures (generic hamt node, unixfs-v2) and other data format specs can rely on L0, L1, or L2 data model support. This would allow us to say something along the lines of "This specification requires IPLD Data Model L1" in the unixfs-v2 spec.
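
As a sketch of how a spec could check that requirement, the function below computes the lowest data model level a value needs under the layering above. `Link` and `required_level` are hypothetical names, and Level 2 is left out since it is only reserved.

```python
class Link:
    """Placeholder for a CID-based link (illustration only)."""

    def __init__(self, cid):
        self.cid = cid


def required_level(node):
    """Return 0 if node uses only JSON types, 1 if it contains links or binary."""
    if isinstance(node, (bytes, bytearray, Link)):
        return 1
    if isinstance(node, dict):
        return max((required_level(v) for v in node.values()), default=0)
    if isinstance(node, list):
        return max((required_level(v) for v in node), default=0)
    if node is None or isinstance(node, (bool, int, float, str)):
        return 0
    raise TypeError(f"not part of the sketched data model: {type(node).__name__}")
```

A codec that only supports Level 0 could use a check like this to reject data it cannot represent.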

The Path spec can define behavior for things that do not conform with the IPLD data model (this is already a source of confusion, as the property names in our git codec are not identical to their internal representation).

The CID spec does not require the "IPLD Data Model." It's the other way around, IPLD L1 Data Model support depends on the CID spec for the Link type.

Does this make sense? I worry that I'm re-using a few terms we've previously defined like "IPLD Data Model."

@rvagg
Member

rvagg commented Aug 14, 2019

Closing due to staleness as per team agreement to clean up the issue tracker a bit (ipld/team-mgmt#28). This doesn't mean this issue is off the table entirely, it's just not on the current active stack but may be revisited in the near future. If you feel there is something pertinent here, please speak up, reopen, or open a new issue. [/boilerplate]

@rvagg closed this as completed Aug 14, 2019