Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate graphql-crunch #303

Closed
kachkaev opened this issue May 3, 2018 · 26 comments
Closed

Integrate graphql-crunch #303

kachkaev opened this issue May 3, 2018 · 26 comments

Comments

@kachkaev
Copy link
Contributor

kachkaev commented May 3, 2018

Just came across this tweet and could not resist from sharing:

screen shot 2018-05-03 at 07 29 45

I haven't tested graphql-crunch myself, but it appears to be a great tool doing a similar job to graphql-deduplicator, which is a part of graphql-yoga already.

Adding graphql-crunch to Apollo Server and Apollo Client is very similar to how this is done for graphql-deduplicator, according to the README. It can also become graphql-yoga's built-in feature that gets triggered by a flag and thus save gigabytes of bandwidth worldwide 🌍

WDYT of graphql-crunch? I'll be happy to submit a PR that adds it if there is interest 🙂

@jamesreggio
Copy link

jamesreggio commented May 3, 2018

Hi @kachkaev — thanks for your interest. Allow me to shed a little more light on graphql-crunch.

We actually considered using graphql-deduplicator before building graphql-crunch. However, the way that we use our API immediately uncovered a major problem with graphql-deduplicator. (I mean no disrespect in this explanation, since the existence of graphql-deduplicator inspired us to pursue this problem.)

graphql-deduplicator depends upon the identity-mapping/cache normalization behavior built into the Apollo InMemoryCache. (Relay has similar behavior, though I'm less familiar with it.) It assumes that objects with the same (__typename, id) pair are identical, and that only one needs to be included in the response. Thus, it prunes the extra copies, leaving only the (__typename, id) fields in the duplicates, which the cache then automatically links back to the fully-hydrated, canonical version.

We had two problems with this behavior:

  1. It takes a dependency upon configurable behavior within Apollo's InMemoryCache, which is fundamentally fragile, and doesn't generalize to other spec-compliant cache implementations, like apollo-cache-hermes (which we use in our product).

  2. It doesn't work properly for all cases, even when using InMemoryCache. graphql-deduplicator will only preserve the first object it encounters in the response payload for a given (__typename, id) pair, which means that if the same (__typename, id) pair occurs multiple times in your payload but has a different set of fields, you will lose the fields that were not present in the first copy encountered. (If my explanation isn't clear, let me know and I'll concoct a repro case.)

To fix graphql-deduplicator would've required a complete rewrite, so we built graphql-crunch.

graphql-crunch is a completely lossless, very simple JSON compression algorithm. In fact, there's nothing about it that's specific to GraphQL — you could just as easily use it to compress and inflate the responses in a REST API response. This means that whatever you roundtrip through graphql-crunch's crunch and uncrunch methods will be deeply-equal.

We figured that maintaining this invariant would be the safest way to provide a reliable tool, which still accomplishes the same level of compression.

In the tweet thread from above, some people raised concerns about how we benchmarked response sizes (uncompressed vs gzip), and whether we benchmarked the uncrunch time. The short answer is: yes, we benchmarked with gzip enabled, and we also included the uncrunch time in our timing benchmarks. graphql-crunch still came out ahead, especially for our app, which has a lot of nested, duplicate 'user account' objects throughout most of its payloads.

The graphql-crunch README shows that even with gzip compression enabled, you can still save a huge amount of bandwidth on certain responses. Take a look at the table at the end of that section.

The uncrunch algorithm is a linear time algorithm that uses the same reference for duplicated objects. It saves quite a bit of time and garbage by not copying the duplicated objects — a behavior which is totally fine for all GraphQL client implementations, since never mutate the response (and, in fact, typically go on to Object.freeze it in development mode).

We focused our benchmarking on React Native clients on iOS and Android. React Native runs on JavaScriptCore, which can be surprisingly slow for JSON.parse. It's much faster to parse a payload that is 75% smaller and then run the uncrunch algorithm than it is to just JSON.parse the full, original payload. (And this is to say nothing of the network savings.) I don't have the specific numbers at hand for the parsing benchmarks, but we'll try to get a blog post together that includes all of this information.

(In fact, we even microbenchmarked the performance of pre-allocating a fixed size array for this line of code. I tweeted out the benchmark here.)

@stevekrenzel and I think what you've built with graphql-yoga is great. The GraphQL tooling ecosystem can feel fairly intimidating to a newcomer, and we're glad to see 'batteries-included' bundles like this. I hope this info helps increase your confidence about including our library in your toolkit.

@kachkaev
Copy link
Contributor Author

kachkaev commented May 3, 2018

Thanks for your comment @jamesreggio. I’m totally with you about the performance gains and other benefits of your library and would love it to be a part of graphql-yoga. The only concern I have about graphql-crunch is that its name is not crunch :–D Just joking, I totally understand that this is justified by the problem you had originally ;–)

I read your source code and was shocked with how small, simple and smart the library is!

@lfades
Copy link
Contributor

lfades commented May 3, 2018

Interesting, @kachkaev the implementation of this would be exactly equal to the one I did for graphql-deduplicator (but using a different header), types for graphql-crunch will also be required. @jamesreggio would you like to have types in your package or in definitelyTyped ?

@kachkaev
Copy link
Contributor Author

kachkaev commented May 3, 2018

It’d be great if TypeScript types we stored directly in graphql-crunch to avoid a need to maintain @types/graphql-crunch, which is more likely to go out of sync. Doing this is just about adding one really small index.d.ts file to the package – that’s no big harm or effort.

@lfades WDYT of supporting ?deduplicate and ?crunch GET parameters in addition to HTTP headers? That looks like a more accessible way for simple testing scenarios. That’s what READMEs for both libraries recommend if I remember correctly (on mobile phone now).

@jamesreggio
Copy link

We can definitely add type definitions. You can track that here: banterfm/graphql-crunch#2

And yes, I definitely recommend the headers and query-param approach.

@lfades
Copy link
Contributor

lfades commented May 3, 2018

I used a header cause I found it more reasonable after reading this issue: gajus/graphql-deduplicator#5, I don't have any real preference there.

@jamesreggio I would like to add the types if you don't mind

@kachkaev
Copy link
Contributor Author

kachkaev commented May 3, 2018

I believe that having both GET and headers will be a useful thing. This will work for various scenarios and won’t hurt anyone :–)

@schickling
Copy link
Contributor

This sounds very interesting! Could you summarize the pros & cons for replacing graphql-deduplicator with graphql-crunch for Yoga?

@kachkaev
Copy link
Contributor Author

kachkaev commented May 10, 2018

@schickling I don't think there is a need to choose between the two, at least for backwards-compatibility reasons. Both compression libraries are pretty small and can be painlessly included in parallel like so:

  1. If a request has ?crunch or X-GraphQL-Crunch, use graphql-crunch.
  2. If a request has ?deduplicate or X-GraphQL-Deduplicate, use graphql-deduplicate.
  3. If both, throw an exception.
  4. If none, return the payload as is.

I'd wait a week or two for a reply in banterfm/graphql-crunch#1 and can submit a PR to graphql-yoga afterwards. It'd be great to see if the guys are willing to compress keys before enabling graphql-crunch to avoid breaking changes or a need for something like ?crunch-with-keys in Yoga.

@orefalo
Copy link

orefalo commented May 10, 2018

Maybe bundling the two capabilities is a bad idea from the get go, think about express.js and how elegant and thin it is.

These customizations should be part of the examples, the core should only provide hooks to implement them.

@jamesreggio
Copy link

@schickling — I explained the differences between graphql-crunch and graphql-deduplicator in my comment here. It's primarily a matter of correctness: graphql-crunch is a simple, lossless JSON compressor, whereas graphql-deduplicator can result in data loss due to its naive first-object-wins approach to deduplication.

@orefalo — the whole idea around graphql-yoga is to be a 'batteries included' framework that doesn't require npm installing a dozen packages to get started. If you want to avoid the very minor bloat of including both of these libraries, you can obtain total control by 'ejecting'.

@jamesreggio
Copy link

@kachkaev — may I suggest dropping the X- prefix from the headers. The IETF deprecated that practice a couple years ago.

@kachkaev
Copy link
Contributor Author

@jamesreggio I don't have a strong opinion here, I'm just suggesting X-GraphQL-Crunch because X-GraphQL-Deduplicate is already included. I think I'll go for ?crunch in my projects, but I'm happy to add support for other query strings or headers if that's convenient to someone else. Checking against a couple of options is super-cheap anyway and I also don't see how this can be abused¹.

As of now, I still guess it's a right thing to support X-GraphQL-Crunch, just for the symmetry with X-GraphQL-Duplicate that we can't remove.


¹ Of course, if we don't count that someone would want to DDOS a server a bit more efficiently by turning compression on 😅 In any case, we'll have a feature flag in options to disable graphql-crunch, similar to deduplicator: false.

@orefalo
Copy link

orefalo commented May 10, 2018

Just thinking forward, what about adding capabilities around

  • query complexity
  • additional scalar types
  • validation directives
  • formatting directives
  • mocking

full stack vs lean frameworks - won't argue, its a matter of opinions. ;-/

@kachkaev
Copy link
Contributor Author

kachkaev commented May 10, 2018

@orefalo feel free to suggest any of those as a PR! I agree with @jamesreggio that graphql-yoga is a framework with all batteries included – the more commonly used features the better.

Have a look at create-react-app that has inspired this project: they started small about two years ago, but now they support nearly any perk that a front-end developer would ever need! Despite all that, CRA is the best choice for writing Hello world in React and it's pretty fast. graphql-yoga serves a similar purpose, it's just focused on ‘GraphQL server’ rather than ‘React client’ at its core.

@lfades
Copy link
Contributor

lfades commented May 10, 2018

@jamesreggio I didn't know anything about that deprecation, maybe we should deprecate that option here and only use a query param 🤔.

@kachkaev I added some tests for graphql-deduplicator in this PR. The tests for graphql-crunch will be very much the same 💃

@kachkaev
Copy link
Contributor Author

kachkaev commented May 10, 2018

@lfades we can deprecate X-GraphQL-Deduplicate by simply hiding it from the docs, but we still have to support it at least until [email protected] to avoid breaking changes. If ?crunch is the only trigger for graphql-crunch middleware, I'm totally happy with this. What I'd like though is to see some ’symmetry’ between crunch and deduplicator when it comes to docs and graphql-yoga options.

@jamesreggio
Copy link

Gotcha — I didn't realize that X-GraphQL-Deduplicate is already supported. With that in mind, I think symmetry is a greater virtue (because it reduces the amount of developer surprise), and would be fine with X-GraphQL-Crunch.

My colleague is working on the per-key crunching which will be a breaking change (hopefully the one and only), so I definitely recommend waiting until it's merged. Should be very soon.

@kachkaev
Copy link
Contributor Author

kachkaev commented May 10, 2018

@jamesreggio speaking of breaking changes, WDYT of renaming graphql-crunch into something like data-crunch (or whatever else available) to emphasise that it's not GraphQL-specific and works for any JSON? Seeing GraphQL in the package name gave me a slightly misleading initial impression about what your module does, but digging a bit deeper was a real opening!

@jamesreggio
Copy link

@kachkaev — hah. If we did that, we'd spin out a different repo and have graphql-crunch depend upon it. There's a lot of value in having the name simultaneously communicate value prop (compression) and a concrete use case (for GraphQL).

In fact, I'm sure there are plenty of other lossless JSON compression libraries out there that none of us have considered because we wouldn't want to do the legwork to verify that they produce consistently superior results within our domain of concern (GraphQL payloads).

@kachkaev
Copy link
Contributor Author

Hey @jamesreggio! Could you share your plans on refactoring graphql-crunch? Are you still planning this? I mean banterfm/graphql-crunch#1

@abelovic
Copy link

@jamesreggio - not to discount anything you have done (because it's very cool) but is there a reason that you did not choose something like Apache Avro? In the big data world this is already a standard protocol for this: https://en.wikipedia.org/wiki/Apache_Avro

Here is a nodejs implementation of the spec: https://github.com/mtth/avsc

It does need a schema but we should be able to get that from our info object in graphql. This might be a really good fit or at least worth investigating. I have been watching this thread and its progress and just wanted to put in my 2 cents :)

@jamesreggio
Copy link

jamesreggio commented Aug 10, 2018

Hi @abelovic — yes, we didn't use other serialization solutions because they introduce a dramatic complexity overhead that we didn't want or need. Our goal was to create a drop-in, zero-configuration, application-level compression library that is narrowly tailored to the domain.

The library that we built satisfies these requirements. It's trivial to adopt, its source takes fewer than three minutes to read, and it benchmarks extremely well (above and beyond the gains from gzipping) for GraphQL responses that have significant subtree overlap.

Its implementation even has a beneficial side-effect wherein equivalent subtrees are referentially equal at deserialization time, which certain GraphQL clients (e.g., apollo-cache-hermes) are able to utilize to avoid repeating work.

I'm sure it's possible to map GraphQL schemas to interface definitions in a number of binary exchange formats, but I'm not convinced you'll see meaningful gains beyond what graphql-crunch + gzipping provides. (In fact, I'm not convinced that Avro would perform well with massive duplication. I believe it would still include repetitive entries, but just in a more compact form.) Furthermore, you'd have to consider the impacts of these formats on forward compatibility (i.e., if you add a field to a type, will older apps — which don't query for nor know about said field — still be able to deserialize the response)?

@kachkaev, we haven't had time to focus on deduplicating keys. Short spans of consecutive bytes (for example, JSON object keys) are exactly what gzip and other Huffman-encoded algorithms excel at compressing, so I suspect the marginal gains would be minimal. @stevekrenzel can chime in, though, because I think he implemented a strawman and saw some benchmark regressions.

@stale
Copy link

stale bot commented Nov 23, 2018

Due to inactivity of this issue we have marked it stale. It will be closed if no further activity occurs.

@stale stale bot added the stale label Nov 23, 2018
@stale
Copy link

stale bot commented Nov 30, 2018

Hey 👋, It seems like this issue has been inactive for some time. In need for maintaining clear overview of the issues concerning the latest version of graphql-yoga we'll close it.
Feel free to reopen it at any time if you believe we should futher discuss its content. 🙂

@Urigo
Copy link
Collaborator

Urigo commented Mar 29, 2022

Hey, @Urigo from The Guild here!

You might know us from projects such as graphql-code-generator, envelop or graphql-tools.

For a long time we thought that the Javascript ecosystem is still missing a lightweight cross-platform, but still highly customizable GraphQL Server.

In the past the awesome Prisma team took on that great challenge and now we are happy to announce that we are continuing them and just released GraphQL Yoga 2.0 - Build fast, extensible, and batteries-included (Subscriptions, Serverless, File uploads support) GraphQL APIs in Node.js 🚀

We have been working a long time on version 2.0 and have been using it in our clients projects for a few months now and shared a couple of alpha cycles here.
Thank you all for your feedback and suggestions, you made this release possible!

Please try Yoga out again, give us feedback and help us spread the word on the new release!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants