DagCBOR: be more precise about foreign tags #227
Comments
+1 on your suggested approach |
I'd say there should be an error by default, and a "parseLoosely" method as an option. Preserving the extra data without a way to read or mutate it doesn't make for a pleasant user experience (nor an efficient library -- btw, cbor tags are technically allowed to contain other tags, and if you try to implement this you will be very very sad at how many allocs this can force you to make if you actually support it). Ignoring the extra data means we're not really validating that it's DagCBOR. This is potentially dangerous. (Think: how the comment field in PDFs made it much, much easier to create hash collisions in (already weak) hashes. Even if it's not immediately broken, it's wading out towards the sharks in a very unnecessary way.) Erroring explicitly is the choice that remains standing. We can still also do the "parseLoosely" features in libraries in practice. What this would buy us is being able to load CBOR documents that aren't exactly DagCBOR and manipulate them. But I don't think the specs themselves should hedge on this. And users should need to very much opt into such a usage: the number of cases where a user will want this, versus the number of cases where a user would be surprised how much bending-over-backwards the library is doing to support features they didn't need/want, leans heavily towards the latter, I'd bet. |
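For illustration, here is a minimal sketch of the "error by default, loose on request" behaviour described in the comment above. It is not taken from any IPLD library: `scan_tags`, `ForeignTagError`, and the `loose` flag are hypothetical names, and the walker only understands definite-length CBOR items.

```python
CID_TAG = 42  # the only tag DagCBOR is expected to carry


class ForeignTagError(ValueError):
    """Raised in strict mode when a tag other than 42 is found (hypothetical name)."""


def _read_head(buf, pos):
    """Read one CBOR head; return (major_type, argument, next_pos)."""
    initial = buf[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if info < 24:
        return major, info, pos
    if info in (24, 25, 26, 27):
        size = 1 << (info - 24)  # 1, 2, 4 or 8 argument bytes
        return major, int.from_bytes(buf[pos:pos + size], "big"), pos + size
    raise ValueError("indefinite or reserved lengths are not handled in this sketch")


def scan_tags(buf, pos=0, loose=False, found=None):
    """Walk one CBOR item starting at pos; return (next_pos, tags_seen)."""
    if found is None:
        found = []
    major, arg, pos = _read_head(buf, pos)
    if major in (2, 3):  # byte string / text string: skip the payload
        return pos + arg, found
    if major == 4:  # array of arg items
        for _ in range(arg):
            pos, found = scan_tags(buf, pos, loose, found)
        return pos, found
    if major == 5:  # map of arg key/value pairs
        for _ in range(2 * arg):
            pos, found = scan_tags(buf, pos, loose, found)
        return pos, found
    if major == 6:  # tag: strict mode rejects anything but 42
        if arg != CID_TAG and not loose:
            raise ForeignTagError(f"foreign CBOR tag {arg}")
        found.append(arg)
        return scan_tags(buf, pos, loose, found)
    # Major types 0, 1 and 7: the head already consumed the whole item.
    return pos, found
```

Calling `scan_tags(data)` raises on any foreign tag, while `scan_tags(data, loose=True)` only reports which tags were seen, so the loose path has to be opted into explicitly.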
To connect to other discussions happening elsewhere (no link, I thought it was in a team meeting but perhaps in a private discussion), there are also other parts of CBOR which can be "loose", e.g. the number I think the remaining discussion on this is regarding how much it's worth the effort compared to everything else we're trying to do? It obviously has dedupe benefits but I'm still a little sceptical that the other reasons for doing this rise above the level of "it's just cleaner!". /cc @ribasushi |
@rvagg actually deduplication is not the primary reason for wanting this. It is a part of maintaining the symmetry of "this cryptographic id represents this thing, and this thing is only represented by this cryptographic id" (theoretical limits notwithstanding). Yes, it is trivial to add a single padding byte somewhere and change every id. But it should not be possible to express bit-for-bit identical content under multiple cryptographic IDs. The simplest use-case I can think of is takedown requests. Again it comes back to what is actually being built: a product or a platform. If the latter - I strongly believe any low-hanging fruit allowing disambiguation should be baked into the lowest levels of the fabric of said platform. |
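A small sketch of the symmetry problem raised in the comment above, assuming a decoder that silently skips tags. The function name is hypothetical, it only handles small unsigned integers, and a plain SHA-256 over the raw bytes stands in for a real CID.

```python
import hashlib


def decode_uint_ignoring_tags(buf, pos=0):
    """Decode a small unsigned int, skipping any tags wrapped around it."""
    initial = buf[pos]
    major, info = initial >> 5, initial & 0x1F
    if info >= 24:
        raise ValueError("this sketch only handles arguments below 24")
    if major == 6:  # a tag: ignore it and decode the wrapped item
        return decode_uint_ignoring_tags(buf, pos + 1)
    if major == 0:  # unsigned integer stored directly in the head
        return info
    raise ValueError("this sketch only handles unsigned integers")


plain = bytes.fromhex("01")     # the integer 1
tagged = bytes.fromhex("c101")  # tag 1 wrapped around the integer 1

# Both inputs decode to the same value once the tag is ignored...
same_value = decode_uint_ignoring_tags(plain) == decode_uint_ignoring_tags(tagged)
# ...but the bytes differ, so any hash-based id differs as well.
same_id = hashlib.sha256(plain).digest() == hashlib.sha256(tagged).digest()
```

Here `same_value` is true while `same_id` is false: one logical value ends up reachable under two ids, which is the situation strict decoding is meant to rule out.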
Deterministic map ordering is in this rough category too, discussed in https://github.com/ipld/team-mgmt/blob/8d612e3ce1222de9dd531ca14bdeeb662bd2a3a1/meeting-notes/2020/2020-01-20--ipld-sync.md#notes. We need to make sure our codecs document rules for these while we're documenting these other "strict" encoding/decoding details. |
Sounds like we should prepare to have a Data Model Spec Sprint Day or something...? I think we're starting to accumulate a decent number of these things and we should get em all on an agenda and knock em out. |
ya, we need another close-a-thon in the specs repo in general, but it would probably be most productive to have at least one session focused on Data Model only. |
@mikeal it's more a "getting-a-lot-done" rather than closing. Most open things are valid things that need some work (in the previous round we were able to close old/out of scope things). |
so, just to give my thoughts on this subject. In my use-case for the IPID DID spec that uses IPLD, it would be very nice to support tag 98 (COSE signing) of CID as tag 42. This provides me with all of the semantics and tooling necessary to accomplish my goal natively in CBOR and IPLD. I'm sure this could also be a method to help with data integrity for DAG sync. Also, @hsanjuan might find this approach interesting for shared clustering.
|
In the DagCBOR spec, be more precise on what happens if you parse CBOR and it contains tags.
I suggest that we just ignore those tags and when you write DagCBOR, the only tag you ever write will be the one for CID (tag 42).
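As a rough sketch of the write side, here is how an encoder that only ever emits tag 42 might serialize a link. The helper names are hypothetical, and the leading `0x00` byte before the CID bytes is the identity multibase prefix that DagCBOR uses for links.

```python
def encode_head(major, arg):
    """Encode a CBOR head using the shortest argument width."""
    if arg < 24:
        return bytes([major << 5 | arg])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < 1 << (8 * size):
            return bytes([major << 5 | info]) + arg.to_bytes(size, "big")
    raise ValueError("argument does not fit in 8 bytes")


def encode_cid_link(cid_bytes):
    """Wrap raw CID bytes as tag 42 around a byte string with a 0x00 prefix."""
    payload = b"\x00" + cid_bytes
    return encode_head(6, 42) + encode_head(2, len(payload)) + payload
```

For example, `encode_cid_link(b"\x01\x02")` yields the bytes `d8 2a 43 00 01 02`: the tag 42 head, a 3-byte byte-string head, then the prefixed payload. The two input bytes are placeholders, not a valid CID.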