Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add elasticsearch _score and highlights to ElasticsearchQueryResult #16909

Closed

Conversation

denispetrische
Copy link
Contributor

@denispetrische denispetrische commented Oct 18, 2024

I have a task to get low level elasticsearch information such as _score of a hit and highlights. And for now it is impossible to get it.

So this fix:

  1. Added support of highlights in the queries
  2. Added return of elasticsearch _score and highlights in graphQL using object return scheme

results.Add(new JsonObject(document.Select(x =>
KeyValuePair.Create(x.Key, (JsonNode)JsonValue.Create(x.Value.ToString())))));
var keyValuePairs = document.Source.Select(x =>
KeyValuePair.Create(x.Key, (JsonNode)JsonValue.Create(x.Value.ToString()))).ToList();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
KeyValuePair.Create(x.Key, (JsonNode)JsonValue.Create(x.Value.ToString()))).ToList();
KeyValuePair.Create(x.Key, (JsonNode)JsonValue.Create(x.Value.ToString())));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need ToList() here to further add two more KeyValuePairs to it with _score and highlight
image

@@ -78,10 +81,17 @@ public async Task<IQueryResults> ExecuteQueryAsync(Query query, IDictionary<stri
{
var results = new List<JsonObject>();

foreach (var document in docs.TopDocs)
foreach (var document in docs.Hits)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why Hits here not TopDocs?

Copy link
Contributor

@Skrypt Skrypt Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not good. This is the old way of returning results. You should use TopDocs now. This will alter all the search results.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hits contain all the data I need (including _score and highlights), topDocs is the information that contains in Hits.Source.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should be able to use return _score and highlights in the TopDocs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should be able to use return _score and highlights in the TopDocs

For now, we get topDocs from elasticSearch searchResponse.Documents.

searchResponse.Documents == searchResponse.Hit.Source so it doesn't contain _score and highlights.

image
image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@denispetrische I think you should modify the ElasticQueryService class, before the line "hits.Add(row);" you could add

row["_source"] = hit.Source;
row["highlight"] = hit.Highlight;

Then in the source you'll have that info available to be returned.

But, if you add highlight and score, you'll need to probably also use them when returning ContentItems. I am wondering if we can return a new object instead of returning content item directly something like

public class ElasticsearchQueryResult
{
    public string Key { get; set; }

    public double? Score { get; set; }

    public string Highlight { get; set; }

    public IEnumerable<ContentItem> ContentItems { get; set; }
}

But I am not sure about the extend of adding ElasticsearchQueryResult instead of returning ContentItems because it likely to require changes to IQueryResults

Copy link
Contributor Author

@denispetrische denispetrische Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should modify the ElasticQueryService class, before the line "hits.Add(row);" you could add

image
But this will work only if we specify Fields

image

Copy link
Contributor Author

@denispetrische denispetrische Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, if you add highlight and score, you'll need to probably also use them when returning ContentItems.

I am not sure about the extend of adding ElasticsearchQueryResult instead of returning ContentItems because it likely to require changes to IQueryResults

Yeah, It's easier to add _score and highlight only for object return type). Will it be okay if I leave contentItems as that, without score and highlight?

I think there is no problem to get all the info I need with object return type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Skrypt would be better as answering these questions as he is more involved in the implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Skrypt What do you think?)

@Skrypt
Copy link
Contributor

Skrypt commented Oct 29, 2024

Ok, I'm trying to understand what you are trying to achieve or fix here in this PR. I will need to run this PR locally and it will take some time so that I can review everything and see what's happening when debugging. I don't remember everything related with ElasticSearch. These proposed changes may be appropriate or not depending on what we did with Lucene implementation and we need to make sure that both implementations are similar.

And the part involving GraphQL is probably a Query Schema that you expose for GraphQL simply right?

@denispetrische
Copy link
Contributor Author

Ok, I'm trying to understand what you are trying to achieve or fix here in this PR. I will need to run this PR locally and it will take some time so that I can review everything and see what's happening when debugging. I don't remember everything related with ElasticSearch. These proposed changes may be appropriate or not depending on what we did with Lucene implementation and we need to make sure that both implementations are similar.

And the part involving GraphQL is probably a Query Schema that you expose for GraphQL simply right?

Right)

@Skrypt Skrypt changed the title Add elasticsearch _score and highlights to graphql Add elasticsearch _score and highlights Oct 30, 2024
@Skrypt Skrypt changed the title Add elasticsearch _score and highlights Add elasticsearch _score and highlights to ElasticQueryResult Oct 30, 2024
@Skrypt Skrypt changed the title Add elasticsearch _score and highlights to ElasticQueryResult Add elasticsearch _score and highlights to ElasticsearchQueryResult Oct 30, 2024
@MikeAlhayek
Copy link
Member

@denispetrische I have a PR to upgrade Elasticsearch library. In the process, I implemented _score and Highlights. If possible, please test drive PR #17027 . Hopefully, it gives you the support you are looking for.

@denispetrische
Copy link
Contributor Author

@denispetrische I have a PR to upgrade Elasticsearch library. In the process, I implemented _score and Highlights. If possible, please test drive PR #17027 . Hopefully, it gives you the support you are looking for.

Okay, I'll check it

@denispetrische
Copy link
Contributor Author

@denispetrische I have a PR to upgrade Elasticsearch library. In the process, I implemented _score and Highlights. If possible, please test drive PR #17027 . Hopefully, it gives you the support you are looking for.

image

It doesn't work for me now, throws exception.

I used an object return type for the query.

image
image
image

@MikeAlhayek
Copy link
Member

@denispetrische if you encountered that issue after testing the other PR, please comment directly on that PR so we can fix the issue there.

Copy link
Contributor

This pull request has merge conflicts. Please resolve those before requesting a review.

@MikeAlhayek
Copy link
Member

Using the latest in main branch will provide you access to Highligh and score info by default. If you are able to, try out the latest 3.0.0-preview from cloud smith tomorrow or after 7 hours from now so you can get the latest changes from Elasticsearch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants