Skip to content
This repository has been archived by the owner on Oct 26, 2020. It is now read-only.

Faster version of the field selection merging validation #12

Merged

Conversation

acceleratesage
Copy link

@acceleratesage acceleratesage commented Nov 6, 2019

The current OverlappingFieldsCanBeMerged validation has performance problems with a high number of fragments or selection sets. This implements a faster version of OverlappingFieldsCanBeMerged, that can deal with a high number of fragments or selection sets without a blowup in runtime. This PR does not remove the old validation yet but instead adds the new one as an additional option to use.

It also includes JMH for benchmarks.

Fixes: sangria-graphql#296

The algorithm is described in:
https://tech.xing.com/graphql-overlapping-fields-can-be-merged-fast-ea6e92e0a01

Below is an example benchmark graph showing the performance improvement for multiple fragments in the same selection set.
2019-09-16 NoOverlapFrag

The PR has been migrated from the one in the original sangria repository.

To use the new validation rule you have to construct a custom query validator that includes it:

import sangria.validation.rules
import sangria.validation.{QueryValidator, RuleBasedQueryValidator, ValidationRule}
import sangria.execution.Executor

// Define the list of validation rules you want to use and replace
// OverlappingFieldsCanBeMerged with the new implementation.
val rules: List[ValidationRule] = {
  QueryValidator.allRules.collect {
    case _: rules.OverlappingFieldsCanBeMerged => new rules.experimental.OverlappingFieldsCanBeMerged
    case rule                                  => rule
  }
}

// Build a query validator based on the rules
val validator: QueryValidator = new RuleBasedQueryValidator(rules)

// Then pass the custom QueryValidator when executing the query
Executor.execute( ... queryValidator = queryValidator ...)

@acceleratesage
Copy link
Author

The build is only failing for scala 2.11, probably due to a difference in type inference. Is scala 2.11 still required to be supported and shall we update this PR?

@yanns
Copy link

yanns commented Nov 7, 2019

The build can also be flaky. I triggered it again.

@yanns
Copy link

yanns commented Nov 7, 2019

Nope, we have a compilation on scala 2.11.
For the moment, we have not mentioned removing support for scala 2.11.
If you can fix that, it'd a great! ❤️

By the way, great article!
I have not yet token time to review your PR, but I very like having this article to understand your approach.

@paulpdaniels
Copy link

Yes removing 2.11 is a much bigger conversation. We have a large legacy system still on 2.11, so we still need it.

@acceleratesage
Copy link
Author

This compiles on all the supported versions now and is ready for review.

@acceleratesage
Copy link
Author

The latest build error seems to be unrelated to this PR, could you please retrigger the build?

@acceleratesage
Copy link
Author

Unfortunately according to our benchmarks the version with the scala collections is quite a bit slower in the non-asymptotic case than the original one with mutable Java collections. What is your stance regarding this? I would prefer to go back to the Java collections and possibly use anonymous class callbacks instead of lambdas, as scala-2.11 doesn't support lambda SAM syntax yet.

@yanns
Copy link

yanns commented Nov 12, 2019

Unfortunately according to our benchmarks the version with the scala collections is quite a bit slower in the non-asymptotic case than the original one with mutable Java collections. What is your stance regarding this? I would prefer to go back to the Java collections and possibly use anonymous class callbacks instead of lambdas, as scala-2.11 doesn't support lambda SAM syntax yet.

I trust your decision here.

@acceleratesage
Copy link
Author

It seems to be mostly related to LinkedHashMap / LinkedHashSet performance in scala. I see there is also an open issue regarding their implementation here: scala/bug#11369

If I find the time, I will revert / refactor this tomorrow. In the meantime, is there anything else you would need to get this merged?

@yanns
Copy link

yanns commented Nov 12, 2019

If I find the time, I will revert / refactor this tomorrow. In the meantime, is there anything else you would need to get this merged?

time... 😉

Implements a faster version of OverlappingFieldsCanBeMerged,
that can deal with a high number of fragments or selection sets
without a blowup in runtime.

Also includes JMH for benchmarks.

The implementation uses Java collections as:
- The scala LinkedHashSet implementation is slow
- Java util.ArrayList does not preallocate the array

The anonymous class syntax for consumers and functions is
necessary for compatiblity with scala 2.11, which does not
support SAM lambda syntax yet. It can be refactored to use
lambda, when scala 2.11 support is dropped.

Refs:
https://tech.xing.com/graphql-overlapping-fields-can-be-merged-fast-ea6e92e0a01
@acceleratesage acceleratesage force-pushed the fast-overlapping-fields-validation branch from d4fa9b9 to b3691b1 Compare November 13, 2019 13:16
@acceleratesage
Copy link
Author

acceleratesage commented Nov 13, 2019

I would again ask you to restart the build, as it is an ArrayOutOfBounds exception that does not seem related to the code. Regarding the changes, I reverted back to using Java collections and for scala 2.11 compatibility use anonymous classes instead of lambdas in the higher order functions.

Java collections seem currently the better choice for this, as:

  • The scala LinkedHashSet / Map implementation is quite a bit slower than the Java one
  • The java util.ArrayList implementation does not preallocate the array upon construction, while the scala one does, so when no elements are added there is less allocation with the Java version.

Thanks for taking the time to review this rather big PR!

Copy link

@yanns yanns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General question: how can people use this new implementation?
Should we find an alternative name to OverlappingFieldsCanBeMergedFast?
Does it differ from OverlappingFieldsCanBeMerged "just" by its speed, and does it have a different behavior?

build.sbt Outdated
classDirectory in Jmh := (classDirectory in Test).value
dependencyClasspath in Jmh := (dependencyClasspath in Test).value
compile in Jmh := (compile in Jmh).dependsOn(compile in Test).value
run in Jmh := (run in Jmh).dependsOn(compile in Jmh).evaluated
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to put the benchmarks into a separated module

And then the JMH plugin could only be activated for the benchmarks.

Ex: https://github.com/circe/circe/blob/6d2e2dddc42c7b38f5ed35527d5caad8b33bff35/build.sbt#L476-L487

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I will have a look at this.


class OverlappingFieldsCanBeMergedFastSpec extends WordSpec with ValidationSupport {

override val defaultRule = Some(new OverlappingFieldsCanBeMergedFast)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General question:
if I try this tests with the current implementation (override val defaultRule = Some(new OverlappingFieldsCanBeMerged), I have some errors.
Is it expected?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the rules do implement the same validation, they differ in how they report the errors, so they cannot use the same set of tests. See my answer to your general questions below for more on this.

@acceleratesage
Copy link
Author

Hi Yann, these questions all make a lot of sense. I'll answer the general ones here:

  • The implementation does the same validation: everything that was invalid before should be invalid now and everything that was valid before should be valid now.
  • It differs in how it reports the validation errors. It cannot fully reproduce the level of detail that the original implementation has in its reporting with regard to the origin of the conflicting fields as it looses some of the original nesting/fragment information as a result of the optimization.
  • Coming up with another better name is fine :)
  • I did not replace the original implementation, as I would leave this decision to the maintainers and the community and did not want it to be an issue in getting it accepted. Currently this implementation can be used by configuring a custom set of validation rules and including this instead of the original one. The original one is still the default and I would consider replacing it a separate step.

@yanns
Copy link

yanns commented Nov 14, 2019

Hi Yann, these questions all make a lot of sense. I'll answer the general ones here:

* The implementation does the same validation: everything that was invalid before should be invalid now and everything that was valid before should be valid now.

ok 👍

* It differs in how it reports the validation errors. It cannot fully reproduce the level of detail that the original implementation has in its reporting with regard to the origin of the conflicting fields as it looses some of the original nesting/fragment information as a result of the optimization.

* Coming up with another better name is fine :)

* I did not replace the original implementation, as I would leave this decision to the maintainers and the community and did not want it to be an issue in getting it accepted. Currently this implementation can be used by configuring a custom set of validation rules and including this instead of the original one. The original one is still the default and I would consider replacing it a separate step.

I very appreciate this.
Maybe you could add an example in the description of the PR about how to use it, to ease adoption. Then we could link that in the release notes.

@acceleratesage
Copy link
Author

I added an example of how to use the new validation rule to the PR description.

@acceleratesage
Copy link
Author

@yanns I have changed the sangria project setup to a multi-module build in a separate PR: #14 I think this way we will avoid merge problems, when master changes before this PR gets in, as changing the build setup moves all source files.

@acceleratesage
Copy link
Author

@yanns Thanks for merging the other PR. I plan to update this one to the new structure next week.

@acceleratesage acceleratesage requested a review from yanns November 27, 2019 10:24
@acceleratesage
Copy link
Author

I updated the PR to the new project structure. Please have another look.

Copy link

@yanns yanns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR is looking good to me. Thx for all your efforts.
I just have a question about introducing an annotation for api that may change in the future.

*
* The algorithm is described in [[CachedCheck]].
*/
class OverlappingFieldsCanBeMergedFast extends ValidationRule {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is your opinion on adding an annotation like akka is doing with@ApiMayChange so that we can rename it in the future?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be fine with that. Is there already an established convention how to signify that in sangria?

Ultimately I hope this can replace the previous rule, so the option to choose explicitly could also disappear at some point.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this, I would probably prefer using packages for this type of information, as an annotation can go unnoticed by the client, while a package like "internal" or "experimental" has to be referenced in the import by a client. So anyone importing something from an "experimental" package states that he knows what he is doing.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be fine with that. Is there already an established convention how to signify that in sangria?

I don't think there is an existing convention.
I'd recommend to "copy" what others are doing, like in the akka project, unless someone has a better idea.

Ultimately I hope this can replace the previous rule, so the option to choose explicitly could also disappear at some point.

👍

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the expectation when using @ApiMayChange? Does it still obey semantic versioning?

Copy link
Author

@acceleratesage acceleratesage Nov 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd leave the decision here to you and the other maintainers. Maybe there is a way that we could circumvent the problem and come up with a better name? @ApiMayChange is something I am not particularly fond of, due to how it allows invisible exceptions to semantic versioning. A separately versioned contrib module from which code then is migrated to core could be a clearer solution. But to be honest I'd prefer to get this rule in without that overhead.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd leave the decision here to you and the other maintainers. Maybe there is a way that we could circumvent the problem and come up with a better name? @ApiMayChange is something I am not particularly fond of, due to how it allows invisible exceptions to semantic versioning. A separately versioned contrib module from which code then is migrated to core could be a clearer solution. But to be honest I'd prefer to get this rule in without that overhead.

I'm not the "maintainer", just giving some of my time to help.
I also think a separately versioned contrib module would be cleaner.
But giving the time we all can give, I'd suggest to go with the @ApiMayChange annotation for now so that we can merge this.

- introduce an experimental package and move the new rule there
- rename OverlappingFieldsCanBeMergedFast to OverlappingFieldsCanBeMerged
  as the naming conflict is resolved through the new experimental package
- add an @ApiMayChange annotation
@acceleratesage acceleratesage requested a review from yanns December 3, 2019 10:53
Copy link

@yanns yanns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx a lot!! 🏆

Can you please update the PR comment with the new package name?

@acceleratesage
Copy link
Author

@yanns Done

@yanns yanns merged commit 491ef41 into sangria-graphql-org:master Dec 3, 2019
@yanns
Copy link

yanns commented Dec 3, 2019

What a great work (technically & also explaining with the blog post) 🥇

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize OverlappingFieldsCanBeMerged
3 participants