Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a different JSON library #47

Open
benjaminjkraft opened this issue May 20, 2021 · 8 comments
Open

Use a different JSON library #47

benjaminjkraft opened this issue May 20, 2021 · 8 comments
Labels
enhancement New feature or request help wanted Issues that anyone could pick up and implement if useful to them
Milestone

Comments

@benjaminjkraft
Copy link
Collaborator

benjaminjkraft commented May 20, 2021

There are a few potential ideas here, which may coordinate or conflict.

First, any code that's using genqlient should be a great match for one of the libraries that does JSON via codegen rather than reflection, which are theoretically way better for perf. @StevenACoffman and @csilvers pointed me to easyjson, ffjson, go-codec, jsonparser, and gojay, and Steve posted some more details about them below; I'd also want to run our own benchmarks on actual genqlient-generated code.

On the other hand, right now we do a lot of work to try to do what we want within the encoding/json API, so it might instead be better to use a library that has a different set of extension points more suitable to our needs. Steve pointed me to mapstructure which is definitely aimed at what we are trying to do in the case of abstract types and custom unmarshalers, which are the most complex case on the decoding side. Sadly it looks like it has somewhat limited support for encoding (see mitchellh/mapstructure#166), and while it has more flexible support for embedding than the standard library, it will only simplify our unmarshalers (by making it easier to merge embedded structs), not eliminate them.

Switching to mapstructure would also preclude switching to one of the other code-gen libraries, which tend to be compatible with encoding/json. Some do add additional options, so we could explore that. (Of those linked above, go-codec seems to have the most extension points, and would likely have similar benefits to mapstructure, although like mapstructure its support for embedding is likely not flexible enough to solve all our problems. It also has support for other encodings, which could be useful for e.g. putting your responses in a cache, or if you use some strange transport for your GraphQL (not likely). Note that it's possible golang/go#5901 will add some of the features we'd like to the standard library, although the timeline on that is very unclear to me.)

One potential future need comes from #30; which for fields will require special handling in (un)marshaling to (un)marshal several layers of JSON structure into/from one layer of Go structure. I didn't see support from any of the libraries for that, although mapstructure might at least provide better tools for us to do it ourselves, or one could imagine any of the libraries might add it (essentially what we want is support for a tag json:"a.b.c"). Another potential future need is more options for how to handle omitempty; right now we match JSON because I think that's the least-bad default (see #103) but it's got some problems for sure. (And now we probably do want to try to keep that behavior, at least as the default, for better or worse, which could be tricky for libraries that have different behavior.)

Finally, as mentioned in #120, another option is to roll our own, making use of what we know about our types to simplify things. I suspect it's not worth it yet, but if our (un)marshalers start having to get even more complicated, it may be; we don't need to implement nearly all of the behavior of the general-purpose libraries since we know what we want our types to look like.

For any of the third-party codegen options, we do need to make sure this won't cause trouble with custom scalars: it might be a little surprising if you have to rerun genqlient because you changed a json tag in a custom scalar, although it's not a huge problem in practice I expect (especially since probably most custom scalars will have UnmarshalJSON methods). In any case we could also put this behind a flag. A first-party implementation could call out to encoding/json for custom scalars, of course.

@benjaminjkraft benjaminjkraft added enhancement New feature or request help wanted Issues that anyone could pick up and implement if useful to them labels Sep 11, 2021
benjaminjkraft added a commit that referenced this issue Sep 29, 2021
When genqlient generates output types, it generates whatever code is
necessary to unmarshal them.  Conversely, when it generates input types,
it generates whatever code is necessary to marshal.  This is all that's
needed for genqlient itself: it never needs to marshal output types or
unmarshal input types.

But maybe you do!  (For example, to put the responses in a cache, which
is the use case that @csilvers hit at Khan, although there are others
one can imagine.)  While we can't support every serialization format you
might want (at least not without adding plugins or some such), it's not
unreasonable to expect that since genqlient can read JSON, it can write
it too.  Sadly, in the past this was not true for types requiring custom
unmarshaling logic, for several reasons.

In this commit I implement logic to always write both marshalers and
unmarshalers whenever they're needed to be able to correctly round-trip
the types, even though genqlient doesn't do so.  I wasn't starting from
scratch, since of course we already write both marshalers and
unmarshalers in some cases.  But this ended up requiring surprisingly
large changes on the marshaling side, mostly to correctly support
embedding (which we use for named fragments).

Specifically, as the comments in `types.go` discuss, the most difficult
issue is spreads with duplicate fields, which translate to Go embedded
fields which end up hidden from the json-marshaler.  Ultimately, I had
to do things quite differently from unmarshaling, and essentially
flatten the type when we write marshaler.  But in the end it's not so
ugly -- indeed arguably it's cleaner!  Mainly it's just different.

In general, I begin to wonder whether using `encoding/json` at all is
really right for genqlient: we're doing a lot of work to appease it,
despite knowing what our types look like.  I think it would still be a
significant increase in lines of code to roll our own, but that code
would perhaps be simpler, and would surely be faster (although if we
just want the speed gains we could use another JSON-generator library,
see also #47).  Anyway, something to think about in the future.

Test plan:
make tesc

Reviewers: marksandstrom, steve, adam, miguel, mahtab, jvoll
@benjaminjkraft
Copy link
Collaborator Author

As mentioned in #120, another option is to roll our own, making use of what we know about our types. I suspect it's not worth it yet, but if our (un)marshalers start having to get even more complicated, it may be, since we end up doing a lot of work to get encoding/json to do what we want.

@csilvers
Copy link
Member

go-codec also supports codegen for json (and other encoding formats as well) I believe, via codecgen.

@StevenACoffman
Copy link
Member

StevenACoffman commented Sep 29, 2021

ugorji/go-codec is worth adding to the list above. It is described here.
Supported Serialization formats are:

I hadn't mentioned it before, because I'm not sure that anything but json is relevant to us, and so it's possible part of ugorji/go popularity is only due to non-relevant formats. However, if it's well maintained, and has the right features, it's worth considering.

library order by page rank number of github stars
go-codec 145 1371
github.com/mailru/easyjson 185 3331
github.com/buger/jsonparser 338 4185
github.com/pquerna/ffjson 1281 2787
github.com/francoispqt/gojay 1335 1999

I like to use popularity as a proxy for package quality, but comparing other things like contribution rate is a little more work.

@benjaminjkraft benjaminjkraft changed the title Use a codegen-based JSON library Use a different JSON library Sep 29, 2021
@benjaminjkraft
Copy link
Collaborator Author

I've updated the issue description to add the wider range of considerations we've added here, and some notes from a quick look at a few of the options.

benjaminjkraft added a commit that referenced this issue Sep 29, 2021
## Summary:
When genqlient generates output types, it generates whatever code is
necessary to unmarshal them.  Conversely, when it generates input types,
it generates whatever code is necessary to marshal.  This is all that's
needed for genqlient itself: it never needs to marshal output types or
unmarshal input types.

But maybe you do!  (For example, to put the responses in a cache, which
is the use case that @csilvers hit at Khan, although there are others
one can imagine.)  While we can't support every serialization format you
might want (at least not without adding plugins or some such), it's not
unreasonable to expect that since genqlient can read JSON, it can write
it too.  Sadly, in the past this was not true for types requiring custom
unmarshaling logic, for several reasons.

In this commit I implement logic to always write both marshalers and
unmarshalers whenever they're needed to be able to correctly round-trip
the types, even though genqlient doesn't do so.  I wasn't starting from
scratch, since of course we already write both marshalers and
unmarshalers in some cases.  But this ended up requiring surprisingly
large changes on the marshaling side, mostly to correctly support
embedding (which we use for named fragments).

Specifically, as the comments in `types.go` discuss, the most difficult
issue is spreads with duplicate fields, which translate to Go embedded
fields which end up hidden from the json-marshaler.  Ultimately, I had
to do things quite differently from unmarshaling, and essentially
flatten the type when we write marshaler.  But in the end it's not so
ugly -- indeed arguably it's cleaner!  Mainly it's just different.

One thing to note is that we do marshal `__typename` based on
what we know about the types; users need not fill it in (and if they
do we'll ignore it).  This seemed to me to be a better UX, and
didn't add much complexity.

In general, I begin to wonder whether using `encoding/json` at all is
really right for genqlient: we're doing a lot of work to appease it,
despite knowing what our types look like.  I think it would still be a
significant increase in lines of code to roll our own, but that code
would perhaps be simpler, and would surely be faster (although if we
just want the speed gains we could use another JSON-generator library,
see also #47).  Anyway, something to think about in the future.

## Test plan:
make tesc


Author: benjaminjkraft

Reviewers: csilvers, StevenACoffman, benjaminjkraft, dnerdy, aberkan, jvoll, mahtabsabet, MiguelCastillo

Required Reviewers: 

Approved By: StevenACoffman, dnerdy

Checks: ✅ Test (1.17), ✅ Test (1.16), ✅ Test (1.15), ✅ Test (1.14), ✅ Lint, ✅ Test (1.17), ✅ Test (1.16), ✅ Test (1.15), ✅ Test (1.14), ✅ Lint

Pull Request URL: #120
@benjaminjkraft benjaminjkraft added this to the v1.0 milestone Aug 15, 2022
@benjaminjkraft
Copy link
Collaborator Author

To do: take a look at how encoding/json/v2 might change the calculus here, and whether we have any suggestions for it.

@x448
Copy link

x448 commented Feb 25, 2024

encoding/json/v2 has many improvements, such as omitempty working like it does in fxamacker/cbor (a CBOR codec that supports RFC 8949 and RFC 8742).

It's unfortunate that encoding/json doesn't support toarray struct tag and JSON doesn't support efficient binary encoding, which are useful for improving speed, memory use, and reducing encoded size.

If using CBOR becomes an option for you then also see Extended Diagnostic Notation (it represents binary CBOR data as JSON with extensions).

DISCLAIMER: I contribute docs & ci to fxamacker/cbor. It's secure, fast, memory efficient, and has deterministic encoding.

@benjaminjkraft
Copy link
Collaborator Author

@x448 it sounds like the library you describe implements some non-JSON encoding, no? We have to use JSON because it's what GraphQL servers use.

@benjaminjkraft
Copy link
Collaborator Author

Quick notes on reading the encoding/json/v2 proposal discussion:

  • if we want to go "roll our own", the new jsontext is for exactly that kind of thing.
  • the change to make json omitempty omit structs that would map to empty JSON objects will be painful; we'll have to decide whether to change behavior as well (and whether to switch to omitzero, support both, etc.).
  • the new [Un]Marshalers is interesting, in principle it might let us avoid generating quite so many marshal methods, although it may also be that in our context it's easier to continue doing so.
  • the benchmarks look to be comparable to the codegen-based libraries, although maybe v2 versions of those will be even faster.
    Anyway, since that's progressing it may be best to just wait until it stabilizes; we can't build on v2 just yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Issues that anyone could pick up and implement if useful to them
Projects
None yet
Development

No branches or pull requests

4 participants