-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(search): Add SearchScore annotation to use fields for search ranking #4596
feat(search): Add SearchScore annotation to use fields for search ranking #4596
Conversation
entity-registry/src/main/java/com/linkedin/metadata/models/EntitySpecBuilder.java
Outdated
Show resolved
Hide resolved
// Modifier to apply to the value. None if empty. | ||
Optional<Modifier> modifier; | ||
|
||
public enum Modifier { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question - why not expect that the modifier had been applied prior to ingesting the search score?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The good thing about doing modifiers on the read path is that you can experiment with using various modifiers without having to reingest the aspects with the specific modifiers. Also counts in general is more objective by nature. Hard to reason about whether or not the feature has been modified on the write path or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ohhh yes!! makes sense
return entitySpec.getSearchableFieldSpecs() | ||
private static FunctionScoreQueryBuilder.FilterFunctionBuilder[] buildScoreFunctions(@Nonnull EntitySpec entitySpec) { | ||
List<FunctionScoreQueryBuilder.FilterFunctionBuilder> finalScoreFunctions = new ArrayList<>(); | ||
// Add a default weight of 1.0 to make sure the score function is larger than 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why tho?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is in case all features are missing, in which case the functino score will be 0. Since we multiply functino score with the query score, if the function score is 0, it will make all the scores the same regardless of how well the query matches the field values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay nice!
private static FunctionScoreQueryBuilder.FilterFunctionBuilder buildScoreFunctionFromSearchScoreAnnotation( | ||
@Nonnull SearchScoreAnnotation annotation) { | ||
FieldValueFactorFunctionBuilder scoreFunction = | ||
ScoreFunctionBuilders.fieldValueFactorFunction(annotation.getFieldName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is a factor function...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...c/main/java/com/linkedin/metadata/search/elasticsearch/query/request/SearchQueryBuilder.java
Outdated
Show resolved
Hide resolved
@@ -32,7 +33,12 @@ | |||
* Rank the input list of entities | |||
*/ | |||
public List<SearchEntity> rank(List<SearchEntity> originalList) { | |||
return Streams.zip(originalList.stream(), fetchFeatures(originalList).stream(), this::updateFeatures) | |||
List<SearchEntity> entitiesToRank = originalList; | |||
if (!getFeatureExtractors().isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we still cn invoke feature extractors? I believe these are useful in search snippets
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. this is for oss where I am removing all feature extractors since they are never used. There is no need to update the entities when there are no feature extractors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
private SearchEntity updateFeatures(SearchEntity originalEntity, Features features) { | ||
return new SearchEntity().setEntity(originalEntity.getEntity()) | ||
.setMatchedFields(originalEntity.getMatchedFields()) | ||
return originalEntity.clone() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fancy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got mad having this build fail every time I have to add a new field :(
@@ -28,7 +28,7 @@ public SimpleRanker() { | |||
@Override | |||
public Pair<Double, Double> score(SearchEntity searchEntity) { | |||
Features features = Features.from(searchEntity.getFeatures()); | |||
return Pair.of(-features.getNumericFeature(Features.Name.RANK_WITHIN_TYPE, 0.0), | |||
features.getNumericFeature(Features.Name.NUM_ENTITIES_PER_TYPE, 0.0)); | |||
return Pair.of(Optional.ofNullable(searchEntity.getScore()).orElse(0.0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mind explaining what impact this has? We are now using the search score (if it exists) to rank instead of the number of entities per type?
Remind me why we have a Pair<Double, Double> here? I don't quite get the data structure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pair is simple. it means that it will rank in order. The first element is used to order first. If the first element is the exact same, it will use the second element.
I mean here, we may as well just use the score bc the probability of that being exactly the same is very low.
So overall impact is that we will stop mixing entity types in an artificial way. Before we would do dataset - chart - dashboard - dataset - chart -dashboard. Now, it will be based on the match score, which from trying out various queries seemed reasonable. I don't think there is a compelling reason to artificially mix in different entity types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense to me - thanks
final Boolean forDelete | ||
) { | ||
final Map<SearchableFieldSpec, List<Object>> extractedFields = | ||
public Optional<String> transformSnapshot(final RecordTemplate snapshot, final EntitySpec entitySpec, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question - does this function really accept snapshots?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah legacy. It's why we have two functions here. one for snapshot based entities and one for not. Not sure where it is still used tho.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it - ty
@@ -119,6 +119,40 @@ public void setValue(final SearchableFieldSpec fieldSpec, final List<Object> fie | |||
} | |||
} | |||
|
|||
public void setSearchScoreValue(final SearchScoreFieldSpec fieldSpec, final List<Object> fieldValues, | |||
final ObjectNode searchDocument, final Boolean forDelete) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "forDelete" mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is something gabe added before. It is to support deleting aspects, so it sends over empty jsons for the fields that are being deleted so it gets deleted on elasticsearch side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah... okay
...ata-io/src/main/java/com/linkedin/metadata/search/transformer/SearchDocumentTransformer.java
Show resolved
Hide resolved
@@ -27,4 +27,6 @@ record SearchEntity { | |||
}] = [] | |||
|
|||
features: optional map[string, double] | |||
|
|||
score: optional double |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if we have this new score, it doesn't mean we won't have features right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This score is the score from elasticsearch. we still have features in this pr for future usage when doing a second pass ranking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay nice!!
…king (datahub-project#4596) * Add SearchScore annotation * Add back test-model * Remove search features * Fix to John's comments * simplify ranker * Fix checkstyle
Add a search score annotation which supports the following fields
By adding these annotations, it adds the numeric value into the index, and on query time constructs a function_score query combining the score from the query matching (how close is the field match with the query) with the score from these annotated values.
Currently, we won't have any direct support for SearchScores, but we will add them soon.
Checklist