-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds support for querying specific indexes #540
Conversation
Why are these changes being introduced: * We need to be able to provide a UI that only queries a specific set of data, so we need to enable that in the API Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/RDI-208 How does this address that need: * Adds `index` to both Search and Retrieve GraphQL endpoints and documents them as being best left out * Updates Opensearch and Retrieve models to use any index value provided and default to the value provided in ENV if none is provided Document any side effects to this change: * The Retrieve model had previously been the same for v1 and v2 but this change required a v2 feature flag for Retrieve to ensure the current behavior continued to work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll leave the code review part to Matt, but the query functionality makes sense to me and I've (re)confirmed that it works with the dev OS data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a specific blocking comment - there are some questions where I'm trying to understand what the limits of the ticketed change are, and I think this change duplicates an existing test (but not in a way that I think it should be removed).
Nothing is blocking - merge away - but I do want to ask whether we should have a test suite for the retrieve model? I see it getting covered by the controller tests, but we also have a specific model test for opensearch but not for retrieve.
|
||
if client.instance_of?(OpenSearch::Client) | ||
raise OpenSearch::Transport::Transport::Errors::NotFound if record['hits']['total']['value'].zero? | ||
else | ||
raise Elasticsearch::Transport::Transport::Errors::NotFound if record['hits']['total'].zero? | ||
elsif record['hits']['total'].zero? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't the point of the PR, but I want to note that I like this a lot better than the former syntax. Good catch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was a rubocop fix that I included because I touched the file with this changes, but I agree!
body: build_query(from)) | ||
end | ||
|
||
def default_index | ||
ENV.fetch('ELASTICSEARCH_INDEX', nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be something we want to update when we fully convert to opensearch and remove the older elasticsearch implementation - but I got really confused for a moment wondering why this doesn't also make changes to search.rb
when it also deals with the same env. I get it now, we're using the same name in both underlying systems so it makes sense to have one env - but it tripped me up for a few minutes.
I'm guessing that this will eventually become OPENSEARCH_INDEX
or even just SEARCH_INDEX
or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah good point, the ELASTICSEARCH_INDEX
is a legacy name that should go away.
assert_equal(expected_sources, actual_sources) | ||
end | ||
end | ||
|
||
test 'graphqlv2 can retrieve a record from a default index' do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't object to this test, but wonder how it differs from the graphqlv2 retrieve
test on line 291? Other than asking for a different record, I'm not understanding the distinction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't disagree, but I think the difference is intent. In this case, the distinction is the name of the test declares what we intend to test... the actual test logic is the same. I'm waffly on how I feel about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok - I don't think we should remove this test to be clear. Reading through the file, I see this test as paired with the adjacent one that changes the index value, while the earlier test is removed enough that it isn't clear that we're testing both conditions without a higher cognitive load. While this may be a small amount of over-testing, I don't think that's particularly an issue. The suite runs pretty quickly, so there's no downside IMO.
@@ -153,7 +153,7 @@ def setup | |||
}' } | |||
assert_equal(200, response.status) | |||
json = JSON.parse(response.body) | |||
assert json['data']['search']['records'].first['contributors'].any? { |c| c.has_value? 'Moon, Intae' } | |||
assert json['data']['search']['records'].first['contributors'].any? { |c| c.value? 'Moon, Intae' } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming that changes like this are about rubocop flagging .has_value
as not preferred, rather than some aspect of the index value? I don't object to the change, just wanting to make sure I'm not missing some ramfication to the ticketed change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this was just rubocop fixes. I ran it on any file I touched and it fixed a few things out of scope of my changes but it seemed fine to include them. I should have made a note about that though in side effects to clarify the things that were not really part of this work that slid in.
@matt-bernhardt I think there are a few tweaks to the testing approach we want to discuss as a team soon. Adam had also noted our over reliance on "controller" tests recently. I think we'll take a good look at our practices soon and make some adjustments. |
UPDATE 2: Helen has ingested data from the 3 sources into dev opensearch with
rdi*
-prefixes and I've configured this PR to point at that opensearch. The older RDI indexes remain on that server promoted totimdex-prod
and are the counts may not match exactly as they were ingested from potentially different data sets. Example query for all 3 RDI sources:UPDATE: much simpler way to check this, use the note below to confirm searching multiple indexes with
rdi*
if you'd like, but just checking the ability to search specific indexes can be done in the PR build with the following GraphQL queries:Searching a specific index:
No index provided defaults to
timdex-prod
sources [default is defined in ENV astimdex-prod
(currently all 4 indexes are promoted in OpenSearch):You can search a specific alias (yes, this is the same as the default but it shows we can provide it explicitly)
NOTE: the indexes in the opensearch version this app is connected to have not been updated to meet the expectations of this code. To test locally, the following set of commands (modified to point to json files you have downloaded) might be useful:
At that point, I'd recommend looking at some of the new tests for syntax examples along with the playground.
Why are these changes being introduced:
of data, so we need to enable that in the API
Relevant ticket(s):
How does this address that need:
index
to both Search and Retrieve GraphQL endpoints anddocuments them as being best left out
and default to the value provided in ENV if none is provided
Document any side effects to this change:
change required a v2 feature flag for Retrieve to ensure the current
behavior continued to work
Developer
our guide and
all issues introduced by these changes have been resolved or opened as new
issues (link to those issues in the Pull Request details above)
Code Reviewer
(not just this pull request message)
Requires database migrations?
NO
Includes new or updated dependencies?
NO