Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds support for querying specific indexes #540

Merged
merged 1 commit into from
Jul 15, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
25 changes: 18 additions & 7 deletions app/graphql/types/query_type.rb
Original file line number Diff line number Diff line change
Expand Up @@ -9,9 +9,18 @@ def ping
'Pong!'
end

field :record_id, RecordType, null: false,
description: 'Retrieve one timdex record' do
argument :id, String, required: true
if Flipflop.v2?
field :record_id, RecordType, null: false,
description: 'Retrieve one timdex record' do
argument :id, String, required: true
argument :index, String, required: false, default_value: nil,
description: 'It is not recommended to provide an index value unless we have provided you with one for your specific use case'
end
else
field :record_id, RecordType, null: false,
description: 'Retrieve one timdex record' do
argument :id, String, required: true
end
end

if Flipflop.v2?
Expand All @@ -26,8 +35,8 @@ def info
end

if Flipflop.v2?
def record_id(id:)
result = Retrieve.new.fetch(id, Timdex::OSClient)
def record_id(id:, index:)
result = Retrieve.new.fetch(id, Timdex::OSClient, index)
result['hits']['hits'].first['_source']
rescue Elasticsearch::Transport::Transport::Errors::NotFound
raise GraphQL::ExecutionError, "Record '#{id}' not found"
Expand All @@ -44,6 +53,8 @@ def record_id(id:)
argument :subjects, String, required: false, default_value: nil
argument :title, String, required: false, default_value: nil
argument :from, String, required: false, default_value: '0'
argument :index, String, required: false, default_value: nil,
description: 'It is not recommended to provide an index value unless we have provided you with one for your specific use case'

# applied facets
argument :collection_facet, [String], required: false, default_value: nil
Expand Down Expand Up @@ -81,11 +92,11 @@ def record_id(id:)

if Flipflop.v2?
def search(searchterm:, citation:, contributors:, funding_information:, identifiers:, locations:, subjects:,
title:, from:, **facets)
title:, index:, from:, **facets)
query = construct_query(searchterm, citation, contributors, funding_information, identifiers, locations,
subjects, title, facets)

results = Opensearch.new.search(from, query, Timdex::OSClient)
results = Opensearch.new.search(from, query, Timdex::OSClient, index)

response = {}
response[:hits] = results['hits']['total']['value']
Expand Down
9 changes: 7 additions & 2 deletions app/models/opensearch.rb
Original file line number Diff line number Diff line change
Expand Up @@ -2,12 +2,17 @@ class Opensearch
SIZE = 20
MAX_PAGE = 200

def search(from, params, client)
def search(from, params, client, index = nil)
@params = params
client.search(index: ENV.fetch('ELASTICSEARCH_INDEX', nil),
index = default_index unless index.present?
client.search(index: index,
body: build_query(from))
end

def default_index
ENV.fetch('ELASTICSEARCH_INDEX', nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be something we want to update when we fully convert to opensearch and remove the older elasticsearch implementation - but I got really confused for a moment wondering why this doesn't also make changes to search.rb when it also deals with the same env. I get it now, we're using the same name in both underlying systems so it makes sense to have one env - but it tripped me up for a few minutes.

I'm guessing that this will eventually become OPENSEARCH_INDEX or even just SEARCH_INDEX or something.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah good point, the ELASTICSEARCH_INDEX is a legacy name that should go away.

end

# Construct the json query to send to elasticsearch
def build_query(from)
{
Expand Down
15 changes: 11 additions & 4 deletions app/models/retrieve.rb
Original file line number Diff line number Diff line change
@@ -1,17 +1,24 @@
class Retrieve
def fetch(id, client)
def fetch(id, client, index = nil)
f = to_filter(id)
record = client.search(index: ENV['ELASTICSEARCH_INDEX'], body: f)

index = default_index unless index.present?

record = client.search(index: index, body: f)

if client.instance_of?(OpenSearch::Client)
raise OpenSearch::Transport::Transport::Errors::NotFound if record['hits']['total']['value'].zero?
else
raise Elasticsearch::Transport::Transport::Errors::NotFound if record['hits']['total'].zero?
elsif record['hits']['total'].zero?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't the point of the PR, but I want to note that I like this a lot better than the former syntax. Good catch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was a rubocop fix that I included because I touched the file with this changes, but I agree!

raise Elasticsearch::Transport::Transport::Errors::NotFound
end

record
end

def default_index
ENV.fetch('ELASTICSEARCH_INDEX', nil)
end

def to_filter(id)
{
query: {
Expand Down
56 changes: 41 additions & 15 deletions test/controllers/graphql_controller_v2_test.rb
Original file line number Diff line number Diff line change
Expand Up @@ -153,7 +153,7 @@ def setup
}' }
assert_equal(200, response.status)
json = JSON.parse(response.body)
assert json['data']['search']['records'].first['contributors'].any? { |c| c.has_value? 'Moon, Intae' }
assert json['data']['search']['records'].first['contributors'].any? { |c| c.value? 'Moon, Intae' }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming that changes like this are about rubocop flagging .has_value as not preferred, rather than some aspect of the index value? I don't object to the change, just wanting to make sure I'm not missing some ramfication to the ticketed change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this was just rubocop fixes. I ran it on any file I touched and it fixed a few things out of scope of my changes but it seemed fine to include them. I should have made a note about that though in side effects to clarify the things that were not really part of this work that slid in.

end
end

Expand All @@ -177,9 +177,9 @@ def setup
assert_equal 'A common table : 80 recipes and stories from my shared cultures /',
json['data']['search']['records'].first['title']
assert json['data']['search']['records'].first['contributors'].any? { |c|
c.has_value? 'McTernan, Cynthia Chen, author.'
c.value? 'McTernan, Cynthia Chen, author.'
}
assert json['data']['search']['records'].first['identifiers'].any? { |i| i.has_value? '163565002X (hardback)' }
assert json['data']['search']['records'].first['identifiers'].any? { |i| i.value? '163565002X (hardback)' }
end
end

Expand Down Expand Up @@ -317,7 +317,6 @@ def setup

test 'graphqlv2 filter multiple sources' do
VCR.use_cassette('graphql v2 filter multiple sources') do

# no filters to return all sources. used later to test filters return less than the total.
post '/graphql', params: { query:
'{
Expand All @@ -330,8 +329,7 @@ def setup
}
}
}
}'
}
}' }

json = JSON.parse(response.body)
initial_source_array = json['data']['search']['aggregations']['source']
Expand All @@ -348,8 +346,7 @@ def setup
}
}
}
}'
}
}' }
assert_equal(200, response.status)

json = JSON.parse(response.body)
Expand All @@ -359,14 +356,13 @@ def setup
assert_equal(2, filtered_source_array.count)

expected_sources = ['zenodo', 'dspace@mit']
actual_sources = filtered_source_array.map{|source| source["key"]}
actual_sources = filtered_source_array.map { |source| source['key'] }
assert_equal(expected_sources, actual_sources)
end
end

test 'graphqlv2 filter single source' do
VCR.use_cassette('graphql v2 filter single source') do

# no filters to return all sources. used later to test filters return less than the total.
post '/graphql', params: { query:
'{
Expand All @@ -379,8 +375,7 @@ def setup
}
}
}
}'
}
}' }

json = JSON.parse(response.body)
initial_source_array = json['data']['search']['aggregations']['source']
Expand All @@ -397,8 +392,7 @@ def setup
}
}
}
}'
}
}' }
assert_equal(200, response.status)

json = JSON.parse(response.body)
Expand All @@ -408,8 +402,40 @@ def setup
assert_equal(1, filtered_source_array.count)

expected_sources = ['dspace@mit']
actual_sources = filtered_source_array.map{|source| source["key"]}
actual_sources = filtered_source_array.map { |source| source['key'] }
assert_equal(expected_sources, actual_sources)
end
end

test 'graphqlv2 can retrieve a record from a default index' do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't object to this test, but wonder how it differs from the graphqlv2 retrieve test on line 291? Other than asking for a different record, I'm not understanding the distinction.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't disagree, but I think the difference is intent. In this case, the distinction is the name of the test declares what we intend to test... the actual test logic is the same. I'm waffly on how I feel about it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok - I don't think we should remove this test to be clear. Reading through the file, I see this test as paired with the adjacent one that changes the index value, while the earlier test is removed enough that it isn't clear that we're testing both conditions without a higher cognitive load. While this may be a small amount of over-testing, I don't think that's particularly an issue. The suite runs pretty quickly, so there's no downside IMO.

# fragile test: specific item expected in default index
VCR.use_cassette('graphql v2 retrieve from default index') do
post '/graphql', params: { query:
'{
recordId(id: "dspace:1721.1-44968") {
timdexRecordId
title
}
}' }

json = JSON.parse(response.body)
assert_equal('dspace:1721.1-44968', json['data']['recordId']['timdexRecordId'])
end
end

test 'graphqlv2 can retrive a record from a specified index' do
# fragile test: specific item expected in specified index
VCR.use_cassette('graphql v2 retrieve from rdi* index') do
post '/graphql', params: { query:
'{
recordId(id: "zenodo:5728409", index: "rdi*") {
timdexRecordId
title
}
}' }

json = JSON.parse(response.body)
assert_equal('zenodo:5728409', json['data']['recordId']['timdexRecordId'])
end
end
end
20 changes: 20 additions & 0 deletions test/models/opensearch_test.rb
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,26 @@ class OpensearchTest < ActiveSupport::TestCase
assert matches.select { |m| m['subjects.value'] == 'assured' }
end

test 'can override index' do
# fragile test: assumes opensearch instance with at least one index prefixed with `rdi`
VCR.use_cassette('opensearch non-default index') do
params = { title: 'data' }
results = Opensearch.new.search(0, params, Timdex::OSClient, 'rdi*')
assert results['hits']['hits'].map { |hit| hit['_index'] }.uniq.map { |index| index.start_with?('rdi') }.any?
end
end

test 'default index' do
# fragile test: assumes opensearch instance with at least one index promoted to timdex-prod and no promoted indexes
# that start with rdi*
VCR.use_cassette('opensearch default index') do
params = { title: 'data' }
results = Opensearch.new.search(0, params, Timdex::OSClient)
refute results['hits']['hits'].map { |hit| hit['_index'] }.uniq.map { |index| index.start_with?('rdi') }.any?
assert results['hits']['hits'].map { |hit| hit['_index'] }.uniq.any?
end
end

test 'searches a single field' do
VCR.use_cassette('opensearch single field') do
params = { title: 'spice' }
Expand Down
96 changes: 96 additions & 0 deletions test/vcr_cassettes/graphql_v2_retrieve_from_default_index.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading