OpenSearch and GraphQL multivalue facets #551

JPrevost · 2022-08-01T13:36:24Z

Why are these changes being introduced:

Our spec requires that many facets can be applied multiple times,
which allows us to filter records to find just those that have
contentTypes of both dataset and still image for example
Source aggregations remain unique as we apply them as OR instead of
AND
literaryForm is the only aggregation that accepts a single value in
the data model

Relevant ticket(s):

https://mitlibraries.atlassian.net/browse/RDI-247

How does this address that need:

Refactors how facets are applied
If multiple values are allowed, calls the new logic multiple times
(once per value)
If a single value is allowed, calls the logic once
Leaves source as custom logic separate from this refactor

Document any side effects to this change:

there was an inconsistency in the aggregation type for format. We
aggregated under contentFormat but accepted queries under format.
I have normalized both sides of that to use format.
Unit tests were created for all facet types, but only one integration
test was created. contentType was chosen as that was the value the
timdex-ui needs to support for RDI. Unit testing all of the fields and
integration testing just one felt okay as it avoids a lot of
additional semi-fragile tests (see below) while ensuring we are
constructing the expected query structure.
a new semi-fragile test was added. Even taking care to not look for
specific values in the data the new test is semi-fragile because we at
least need data that can be filtered twice to test for the expected
data change. As we stabilize the timdex pipelines, we should create a
specific test set of data rather than using what is available. This
would allow us to extract records from each source, while also pulling
some with specific attributes we want to test... including some
potentially constructed records to test values we want to support in
code but can't find appropriate examples. This test set should be
stored in a way that the new data loader can easily load it locally,
but we should also consider having a specific test alias loaded
with this data in dev1 or stage to allow us to access it from PR
builds when appropriate to demonstrate specific problems/solutions to
stakeholders without having to manipulate the "real" aliases/indexes.
OpenSearch#filter was updated to accept a params argument rather than reading from an instance variable to allow for easier unit testing

Developer

All new ENV is documented in README
All new ENV has been added to Heroku Pipeline, Staging and Prod
ANDI or Wave has been run in accordance to
our guide and
all issues introduced by these changes have been resolved or opened as new
issues (link to those issues in the Pull Request details above)
Stakeholder approval has been confirmed (or is not needed)

Code Reviewer

The commit message is clear and follows our guidelines
(not just this pull request message)
There are appropriate tests covering any new functionality
The documentation has been updated or is unnecessary
The changes have been verified
New dependencies are appropriate or there were no changes

Requires database migrations?

NO

Includes new or updated dependencies?

NO

Why are these changes being introduced: * Our spec requires that many facets can be applied multiple times, which allows us to filter records to find just those that have `contentTypes` of _both_ `dataset` and `still image` for example * Source aggregations remain unique as we apply them as `OR` instead of `AND` * `literaryForm` is the only aggregation that accepts a single value in the data model Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/RDI-247 How does this address that need: * Refactors how facets are applied * If multiple values are allowed, calls the new logic multiple times (once per value) * If a single value is allowed, calls the logic once * Leaves `source` as custom logic separate from this refactor Document any side effects to this change: - there was an inconsistency in the aggregation type for `format`. We aggregated under `contentFormat` but accepted queries under `format`. I have normalized both sides of that to use `format`. - Unit tests were created for all facet types, but only one integration test was created. `contentType` was chosen as that was the value the timdex-ui needs to support for RDI. Unit testing all of the fields and integration testing just one felt okay as it avoids a lot of additional semi-fragile tests (see below) while ensuring we are constructing the expected query structure. - a new semi-fragile test was added. Even taking care to not look for specific values in the data the new test is semi-fragile because we at least need data that can be filtered twice to test for the expected data change. As we stabilize the timdex pipelines, we should create a specific test set of data rather than using what is available. This would allow us to extract records from each source, while also pulling some with specific attributes we want to test... including some potentially constructed records to test values we want to support in code but can't find appropriate examples. This test set should be stored in a way that the new data loader can easily load it locally, but we should also consider having a specific `test` alias loaded with this data in dev1 or stage to allow us to access it from PR builds when appropriate to demonstrate specific problems/solutions to stakeholders without having to manipulate the "real" aliases/indexes. - `OpenSearch#filter` was updated to accept a `params` argument rather than reading from an instance variable to allow for easier unit testing

JPrevost · 2022-08-01T14:11:26Z

This example query might be useful to see the change in the PR build.

{
  search(contentTypeFacet: ["dataset", "still image"], index: "rdi*") {
    hits
    aggregations {
      contentType {
        docCount
        key
      }
    }
  }
}

jazairi

Good call on noting the fragile test. I'm open to the test alias idea as a future improvement, but I think that documenting it is fine for now. I'll be curious to see how often that comes up in the near term; it may not be as fragile as we think.

Thanks for this!

mitlib temporarily deployed to timdex-pr-551 August 1, 2022 13:39 Inactive

JPrevost requested a review from jazairi August 1, 2022 14:08

jazairi self-assigned this Aug 1, 2022

jazairi approved these changes Aug 1, 2022

View reviewed changes

JPrevost merged commit 647292a into main Aug 1, 2022

JPrevost deleted the rdi-247-support-for-multivalue-facets branch August 1, 2022 19:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

OpenSearch and GraphQL multivalue facets #551

OpenSearch and GraphQL multivalue facets #551

JPrevost commented Aug 1, 2022 •

edited by jazairi

Loading

JPrevost commented Aug 1, 2022

jazairi left a comment

OpenSearch and GraphQL multivalue facets #551

OpenSearch and GraphQL multivalue facets #551

Conversation

JPrevost commented Aug 1, 2022 • edited by jazairi Loading

Developer

Code Reviewer

Requires database migrations?

Includes new or updated dependencies?

JPrevost commented Aug 1, 2022

jazairi left a comment

Choose a reason for hiding this comment

JPrevost commented Aug 1, 2022 •

edited by jazairi

Loading