-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow genqlient types to be marshaled safely #120
Conversation
When genqlient generates output types, it generates whatever code is necessary to unmarshal them. Conversely, when it generates input types, it generates whatever code is necessary to marshal. This is all that's needed for genqlient itself: it never needs to marshal output types or unmarshal input types. But maybe you do! (For example, to put the responses in a cache, which is the use case that @csilvers hit at Khan, although there are others one can imagine.) While we can't support every serialization format you might want (at least not without adding plugins or some such), it's not unreasonable to expect that since genqlient can read JSON, it can write it too. Sadly, in the past this was not true for types requiring custom unmarshaling logic, for several reasons. In this commit I implement logic to always write both marshalers and unmarshalers whenever they're needed to be able to correctly round-trip the types, even though genqlient doesn't do so. I wasn't starting from scratch, since of course we already write both marshalers and unmarshalers in some cases. But this ended up requiring surprisingly large changes on the marshaling side, mostly to correctly support embedding (which we use for named fragments). Specifically, as the comments in `types.go` discuss, the most difficult issue is spreads with duplicate fields, which translate to Go embedded fields which end up hidden from the json-marshaler. Ultimately, I had to do things quite differently from unmarshaling, and essentially flatten the type when we write marshaler. But in the end it's not so ugly -- indeed arguably it's cleaner! Mainly it's just different. In general, I begin to wonder whether using `encoding/json` at all is really right for genqlient: we're doing a lot of work to appease it, despite knowing what our types look like. I think it would still be a significant increase in lines of code to roll our own, but that code would perhaps be simpler, and would surely be faster (although if we just want the speed gains we could use another JSON-generator library, see also #47). Anyway, something to think about in the future. Test plan: make tesc Reviewers: marksandstrom, steve, adam, miguel, mahtab, jvoll
generate/types.go
Outdated
// Those rules don't work for us. When unmarshaling, we want to fill in all | ||
// the potentially-matching fields (QT.A.Id and QT.B.Id in this case), and when | ||
// marshaling, we want to always marshal exactly one potentially-conflicting | ||
// field; we're happy to use the Go visibility rules when they apply. such field, choosing an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the text got a bit garbled here.
if t == nil || t.IsZero() { | ||
return []byte("null"), nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you add this back in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The t == nil
should be unnecessary, I just put it there for completeness. But t.IsZero()
we do want, to avoid marshaling a time.Time
field to "0001-01-01"
, since genqlient matches encoding/json
in counting zero structs as nonempty for purposes of omitempty
.
I'm probably not the right person to review this, but it all seems plausible to me! It is indeed tricky stuff, but it's commented really well. I do feel like the code would be easier to understand in some ways, actually, if you codegen-ed the json marshal/unmarshal code; you're kinda doing that already for the hard cases. But I agree it would make the size of these generated files explode -- even for the common (?) use-cases where people don't have these fancy embeds and interfaces -- which is a consideration in itself. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This took me a while to read through, and I think it's great, and we should merge it. I did wonder if we could offload some of your heavy lifting to a library... but other than possibly the one I slacked about, I didn't casually find one that was an obvious candidate.
Yeah, that's my theory. I think there's a lot of stuff to write to handle things like
Yeah, I don't know how much of an issue that will be for folks. It's probably a nonzero perf/binary-size impact, too, in that I don't think the Go compiler/linker can be very smart about pruning methods since you might (and indeed
Yeah, thanks for raising that option. I was thinking about how I feel I am learning more and more places where the stdlib API leaves a lot to be desired, but I didn't think to actually look for what others have done. Sadly, as I wrote on #120, as far as I can tell none of the ones we've found will help a lot with the marshaling side, where embedding is the real challenge, rather than interfaces and custom marshalers. (I mean, both require nontrivial code, but the latter is just copied from unmarshaling, so if you went to the pains to wrap your head around what we do there, it's not so bad.) My feeling is that on the whole it doesn't make sense (at least not just for the code-complexity gains; if we used go-codec the perf benefits might be worth it). But I think in any case since the biggest benefits are on the unmarshaling side, I'd want to do it as a separate PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two-pass marshaling process is an interesting solution! A custom marshaler does seem like it could simplify the code, and that would be a good approach to explore.
If your type implements an interface, we add getter methods for the shared fields, so that those may be accessed via the interface. But it turns out occasionally it's useful to have these getter methods when they don't implement a GraphQL interface, so you can use two genqlient-generated types in the same function if they have the same fields. (This comes up most often when you have a GraphQL union that maybe should really be an interface, or if you don't yet support interfaces implementing other interfaces (indeed our parser doesn't either). But one can imagine other use cases.) We can't predict how you want to do that, so we can't generate the interface, but we can generate the methods, so you can define the interface and do a type assertion from there. Since these methods are pretty simple to generate, we just do it always. (As with #120, if binary size becomes an issue we could later add an option to only generate methods that are truly needed but including them seems like the better default.) This also fixes a subtle and rare bug, which would have become much more common (indeed existing tests caught it). Specifically, if you have a query like ```graphql fragment FragmentOne on T { id } fragment FragmentTwo on T { id } query Q { f { # interface type T ...FragmentOne ...FragmentTwo } } ``` since both `FragmentOne` and `FragmentTwo` request some common field, say `id`, we generate a method `GetId` on each one. But since `FragmentOne` and `FragmentTwo` are both on `T`, we also include their interfaces in the interface we generate for the type of `f`, `QFT`. So `QFT` includes a method `GetId`. But on the implementations, the two methods conflict, and neither gets promoted; this causes various code to fail to compile. With this change, this would have happened much more frequently -- even if only one of the two fragments is on `T`, as long as both request the field. Anyway, we now generate explicit methods on each struct for all of its recursively emebedded fields -- using the logic from #120 to compute them -- so that we don't need to rely on method-promotion. Test plan: make tesc Reviewers: steve, marksandstrom, csilvers, jvoll, adam, miguel, mahtab
## Summary: If your type implements an interface, we add getter methods for the shared fields, so that those may be accessed via the interface. But it turns out occasionally it's useful to have these getter methods when they don't implement a GraphQL interface, so you can use two genqlient-generated types in the same function if they have the same fields. (This comes up most often when you have a GraphQL union that maybe should really be an interface, or if you don't yet support interfaces implementing other interfaces (indeed our parser doesn't either). But one can imagine other use cases.) We can't predict how you want to do that, so we can't generate the interface, but we can generate the methods, so you can define the interface and do a type assertion from there. Since these methods are pretty simple to generate, we just do it always. (As with #120, if binary size becomes an issue we could later add an option to only generate methods that are truly needed but including them seems like the better default.) This also fixes a subtle and rare bug, which would have become much more common (indeed existing tests caught it). Specifically, if you have a query like ```graphql fragment FragmentOne on T { id } fragment FragmentTwo on T { id } query Q { f { # interface type T ...FragmentOne ...FragmentTwo } } ``` since both `FragmentOne` and `FragmentTwo` request some common field, say `id`, we generate a method `GetId` on each one. But since `FragmentOne` and `FragmentTwo` are both on `T`, we also include their interfaces in the interface we generate for the type of `f`, `QFT`. So `QFT` includes a method `GetId`. But on the implementations, the two methods conflict, and neither gets promoted; this causes various code to fail to compile. With this change, this would have happened much more frequently -- even if only one of the two fragments is on `T`, as long as both request the field. Anyway, we now generate explicit methods on each struct for all of its recursively emebedded fields -- using the logic from #120 to compute them -- so that we don't need to rely on method-promotion. ## Test plan: make tesc Author: benjaminjkraft Reviewers: csilvers, dnerdy, aberkan, jvoll, mahtabsabet, MiguelCastillo, StevenACoffman Required Reviewers: Approved By: csilvers, dnerdy Checks: ✅ Test (1.17), ✅ Test (1.16), ✅ Test (1.15), ✅ Test (1.14), ✅ Lint, ✅ Test (1.17), ✅ Test (1.16), ✅ Test (1.15), ✅ Test (1.14), ✅ Lint Pull Request URL: #126
Summary:
When genqlient generates output types, it generates whatever code is
necessary to unmarshal them. Conversely, when it generates input types,
it generates whatever code is necessary to marshal. This is all that's
needed for genqlient itself: it never needs to marshal output types or
unmarshal input types.
But maybe you do! (For example, to put the responses in a cache, which
is the use case that @csilvers hit at Khan, although there are others
one can imagine.) While we can't support every serialization format you
might want (at least not without adding plugins or some such), it's not
unreasonable to expect that since genqlient can read JSON, it can write
it too. Sadly, in the past this was not true for types requiring custom
unmarshaling logic, for several reasons.
In this commit I implement logic to always write both marshalers and
unmarshalers whenever they're needed to be able to correctly round-trip
the types, even though genqlient doesn't do so. I wasn't starting from
scratch, since of course we already write both marshalers and
unmarshalers in some cases. But this ended up requiring surprisingly
large changes on the marshaling side, mostly to correctly support
embedding (which we use for named fragments).
Specifically, as the comments in
types.go
discuss, the most difficultissue is spreads with duplicate fields, which translate to Go embedded
fields which end up hidden from the json-marshaler. Ultimately, I had
to do things quite differently from unmarshaling, and essentially
flatten the type when we write marshaler. But in the end it's not so
ugly -- indeed arguably it's cleaner! Mainly it's just different.
One thing to note is that we do marshal
__typename
based onwhat we know about the types; users need not fill it in (and if they
do we'll ignore it). This seemed to me to be a better UX, and
didn't add much complexity.
In general, I begin to wonder whether using
encoding/json
at all isreally right for genqlient: we're doing a lot of work to appease it,
despite knowing what our types look like. I think it would still be a
significant increase in lines of code to roll our own, but that code
would perhaps be simpler, and would surely be faster (although if we
just want the speed gains we could use another JSON-generator library,
see also #47). Anyway, something to think about in the future.
Test plan:
make tesc