Add a "HUGR envelope" binary format #1862

Open
doug-q opened this issue Jan 15, 2025 · 5 comments

@doug-q
Collaborator

doug-q commented Jan 15, 2025

We will at some point be transitioning to serialising via hugr-model by default.

One of the primary motivations for this change is reducing the size of serialised HUGRs.

To ease the transition, and to mitigate teething problems with hugr-model, I suggest we specify and implement a "HUGR envelope" binary format before switching to hugr-model. This envelope would initially be able to contain one of:

  • A HUGR package serialised as json
  • A zstd-compressed HUGR package serialised as json
  • A HUGR package serialised as hugr-model
  • A zstd-compressed HUGR package serialised as hugr-model

I suggest the envelope format be:

  • A 64 bit magic number
  • A 64 bit header
  • A variable length byte array payload

The magic number is used by tools to identify a HUGR envelope. It should be randomly generated, and then sanity checked by googling for its binary, decimal, and hexadecimal representations.

The 64 bit header specifies how the payload should be decoded. Initially it must be one of 4 values: 1, 2, 3, 4. These correspond to the encodings specified above. 0 is invalid because lots of mistakes involve writing 0. One could also include a version here, or leave it for later. We can easily expand the set of allowed headers as time goes on. If we find we need a larger header we should create a new magic number.
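A minimal sketch of the layout described above, to make the discussion concrete. Nothing here is decided in this issue: the `MAGIC` value is a placeholder rather than a generated number, the little-endian byte order is an assumption, and the names `Encoding`, `EnvelopeError`, `write_envelope` and `read_envelope` are hypothetical.

```rust
/// Placeholder only: the proposal is to use a randomly generated value.
const MAGIC: u64 = 0x0123_4567_89AB_CDEF;

/// Header values 1 to 4, in the order the encodings are listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Encoding {
    Json = 1,
    JsonZstd = 2,
    Model = 3,
    ModelZstd = 4,
}

#[derive(Debug, PartialEq, Eq)]
enum EnvelopeError {
    /// Fewer than 16 bytes, so there is no room for magic number + header.
    TooShort,
    BadMagic(u64),
    /// Includes 0, which is deliberately invalid.
    UnknownHeader(u64),
}

/// Read a little-endian u64 starting at `offset` (byte order is an assumption).
fn read_u64(bytes: &[u8], offset: usize) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[offset..offset + 8]);
    u64::from_le_bytes(buf)
}

/// Prefix `payload` with the 64 bit magic number and the 64 bit header.
fn write_envelope(encoding: Encoding, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(16 + payload.len());
    out.extend_from_slice(&MAGIC.to_le_bytes());
    out.extend_from_slice(&(encoding as u64).to_le_bytes());
    out.extend_from_slice(payload);
    out
}

/// Check the magic number and header, then return the still-encoded payload.
fn read_envelope(bytes: &[u8]) -> Result<(Encoding, &[u8]), EnvelopeError> {
    if bytes.len() < 16 {
        return Err(EnvelopeError::TooShort);
    }
    let magic = read_u64(bytes, 0);
    if magic != MAGIC {
        return Err(EnvelopeError::BadMagic(magic));
    }
    let encoding = match read_u64(bytes, 8) {
        1 => Encoding::Json,
        2 => Encoding::JsonZstd,
        3 => Encoding::Model,
        4 => Encoding::ModelZstd,
        other => return Err(EnvelopeError::UnknownHeader(other)),
    };
    Ok((encoding, &bytes[16..]))
}
```

In this sketch the envelope adds a fixed 16 bytes to every payload, and the reader never inspects the payload itself: decoding json or hugr-model, and any decompression, happens after the header has been accepted.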

Tools are not required to support every encoding, e.g. one imagines zstd support should be behind a cargo feature, and hugr-model currently is behind a cargo feature. Tools should fail gracefully with helpful errors when they see a known-but-unsupported encoding.
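One way the "fail gracefully" requirement could look, as a sketch only. The cargo feature names `zstd` and `model` are hypothetical stand-ins, as are `DecodeError`, `check_supported` and `check_supported_with`; the point is to distinguish a header the tool has never heard of from one it recognises but was built without.

```rust
use std::fmt;

#[derive(Debug, PartialEq, Eq)]
enum DecodeError {
    /// The header is not one of the known values.
    UnknownEncoding(u64),
    /// The header is known, but support was compiled out.
    UnsupportedEncoding {
        name: &'static str,
        feature: &'static str,
    },
}

impl fmt::Display for DecodeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DecodeError::UnknownEncoding(header) => {
                write!(f, "unknown HUGR envelope header value {}", header)
            }
            DecodeError::UnsupportedEncoding { name, feature } => write!(
                f,
                "payload is encoded as {}, but this tool was built without the `{}` feature",
                name, feature
            ),
        }
    }
}

/// Decide whether a header can be decoded, given which optional features
/// are available. Taking the flags as arguments keeps the logic testable.
fn check_supported_with(header: u64, zstd: bool, model: bool) -> Result<(), DecodeError> {
    let (name, needs_zstd, needs_model) = match header {
        1 => ("json", false, false),
        2 => ("zstd-compressed json", true, false),
        3 => ("hugr-model", false, true),
        4 => ("zstd-compressed hugr-model", true, true),
        other => return Err(DecodeError::UnknownEncoding(other)),
    };
    if needs_model && !model {
        return Err(DecodeError::UnsupportedEncoding { name, feature: "model" });
    }
    if needs_zstd && !zstd {
        return Err(DecodeError::UnsupportedEncoding { name, feature: "zstd" });
    }
    Ok(())
}

/// Same check, driven by the (hypothetical) cargo features of this build.
fn check_supported(header: u64) -> Result<(), DecodeError> {
    check_supported_with(header, cfg!(feature = "zstd"), cfg!(feature = "model"))
}
```

Because the error names the missing feature, a user who hits it knows to rebuild the tool or re-export the package in another encoding, rather than assuming the file is corrupt.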

As part of this ticket

Questions:

  • HUGR envelope? Any better names?
  • Is a 64 bit magic number + 64 bit header a good initial setting? It's hard to imagine a large enough number of small enough packages where this would be a problem, right?
  • Should we specify a conventional extension? .hugr?
@ss2165
Member

ss2165 commented Jan 15, 2025

I am pro starting to use the .hugr extension with this change

@ss2165
Member

ss2165 commented Jan 15, 2025

name: "HUGR bitcode" too misleading?

@aborgna-q
Collaborator

Looks nice!

Would the idea be to keep this around as the standard envelope indefinitely? That's good for future-proofing, but I would be wary of promising retro-compatibility of the format. The json definition is tightly tied to the rust struct definitions, so we will break it at some point (or rather drop it altogether).
It'd be nice to mark the json variant as "experimental" when queried with the HUGR cli, so users are aware of that.

The magic number and 64b header look fine. Check packed_struct for defining it cleanly on the rust side.

name: "HUGR bitcode" too misleading?

"hugr bitcode" would be the serialized hugr-model?

@doug-q
Collaborator Author

doug-q commented Jan 15, 2025

Looks nice!

Would the idea be to keep this around as the standard envelope indefinitely? That's good for future-proofing, but I would be wary of promising retro-compatibility of the format. The json definition is tightly tied to the rust struct definitions, so we will break it at some point (or rather drop it altogether). It'd be nice to mark the json variant as "experimental" when queried with the HUGR cli, so users are aware of that.

While it's useful, yes. When I say "Tools are not required to support every encoding", this is intended to include that we are allowed to drop json support.

Note that self-versioning serialisations (json + hugr-model) should continue to encode their versions in the payload.

The magic number and 64b header look fine. Check packed_struct for defining it cleanly on the rust side.

name: "HUGR bitcode" too misleading?

"hugr bitcode" would be the serialized hugr-model?

I think Seyon was suggesting "hugr bitcode" as an alternative to HUGR envelope. I don't love "hugr bitcode", because the payload may be textual (json or maybe hugr-model s-expressions in the future). But I don't feel strongly, happy for others to decide.

@zrho
Contributor

zrho commented Jan 15, 2025

Great idea. For the magic number we could use the ASCII of HUGR. I'd split up the header into a type field and a version field.
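A rough sketch of what that could look like. The comment above does not give field widths, so the even 32/32 split of the 64 bit header is an assumption, as are the names `MAGIC_ASCII`, `Header`, `kind` and `version`, and the choice to put the type in the low bits.

```rust
/// The ASCII bytes of "HUGR": 0x48 0x55 0x47 0x52.
const MAGIC_ASCII: [u8; 4] = *b"HUGR";

/// The 64 bit header split into a type field and a version field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Header {
    /// Payload encoding, e.g. the values 1 to 4 from the original proposal.
    kind: u32,
    /// Version of the envelope or encoding; its exact meaning is left open.
    version: u32,
}

impl Header {
    /// Pack into one u64: version in the high 32 bits, type in the low 32.
    fn pack(self) -> u64 {
        ((self.version as u64) << 32) | (self.kind as u64)
    }

    /// Inverse of `pack`.
    fn unpack(raw: u64) -> Self {
        Header {
            kind: raw as u32,
            version: (raw >> 32) as u32,
        }
    }
}
```

With this particular packing, a header with version 0 has the same numeric value as the unsplit header from the original proposal, so the two schemes would agree on the initial four encodings.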
