This RfC discusses how Querqy can be improved with regard to traditional (keyword-based) search approaches, and whether Querqy can help to release vector search.
1. Current State of Retail Search
1.1 Limitations of implementations using traditional search (keyword-based)
Traditional search implementations often provide features such as spelling correction, (de-)compounding, relaxation and expansion only in an isolated and poorly integrated form, or not at all. Most search applications implement them in some way, but each feature only covers a limited share of queries. In sum, many long-tail queries remain unhandled.
One reason for these shortcomings is that the features are not implemented as part of a proper rewriting process. Many solutions make absolute decisions, e.g. replacing terms for spelling correction or deleting terms for relaxation, which makes the impact of false positives quite severe. As a result, the features are often only used as a fallback, e.g. when a query returns zero results.
Many solutions (e.g. spell correction) are based on comparing query terms with product data. However, comparing query terms with previous query data is much more reliable. This requires support for query lists in rewriting.
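To illustrate the idea of correcting against previous query data without taking an absolute decision, here is a minimal sketch in plain Java. It is not Querqy's API: the class name `QueryLogSpellExpander`, the method names, and the `maxDistance` / `minCount` parameters are assumptions made for this example, and the query log is a hypothetical map from term to frequency. The original term is always kept, and frequent query-log terms within a small edit distance are added as alternatives.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class QueryLogSpellExpander {

    /** Levenshtein edit distance, computed with two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Returns the original term followed by query-log terms that are frequent
     * enough (minCount) and close enough (maxDistance). The original term is
     * never replaced, so a false positive only adds an alternative.
     */
    static List<String> expand(String term, Map<String, Integer> queryLog,
                               int maxDistance, int minCount) {
        List<String> alternatives = new ArrayList<>();
        alternatives.add(term);
        // A term that is itself frequent in the query log is treated as correct.
        if (queryLog.getOrDefault(term, 0) >= minCount) {
            return alternatives;
        }
        List<Map.Entry<String, Integer>> candidates = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : queryLog.entrySet()) {
            if (entry.getValue() >= minCount
                    && editDistance(term, entry.getKey()) <= maxDistance) {
                candidates.add(entry);
            }
        }
        // Most frequent candidates first; ties are broken alphabetically.
        candidates.sort(Comparator
                .comparing((Map.Entry<String, Integer> e) -> -e.getValue())
                .thenComparing(Map.Entry::getKey));
        for (Map.Entry<String, Integer> entry : candidates) {
            alternatives.add(entry.getKey());
        }
        return alternatives;
    }
}
```

With a hypothetical log containing "grill" (1200 occurrences), "grille" (3) and "holz" (800), calling `expand("gril", log, 1, 10)` keeps "gril" and adds "grill" as an alternative, while the rare "grille" is ignored. In a Lucene-based implementation the linear scan would presumably be replaced by an index or automaton lookup.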
1.2 Obstacles to releasing vector search
Vector search can compensate for many of the deficits of traditional implementations, but so far it fails to fully replace traditional search because it lacks important features. Especially for retail search, it is widely accepted that hybrid search is the best way to go as of now.
Many obstacles must be overcome on a logical and technical level to implement, integrate and release vector search. The core reason is that vector search is a fundamentally new approach and difficult to integrate in a way that delivers quality and value. There are only a few options to quickly integrate a vector space as part of a single feature.
1.3 Conclusions
Traditional search is still required and still leaves much room for improvement.
Integrating vector search into existing search solutions is difficult.
2. Current State of Querqy (Unplugged)
2.1 Limitations of Querqy
Querqy is still the best known way to integrate many recall- (and partially also relevance-) related search features, but it is very difficult to customize it for individual demands.
Several features needed to achieve good search quality quickly are not supported out of the box (e.g. spelling, relaxation, expansion; (de-)compounding for Querqy Unplugged).
Even though Lucene provides many of the components required for the mentioned rewriting features (e.g. lookup, fuzzy matching, input preprocessing), Querqy relies on its own implementations, which makes progress cumbersome and slow.
Java supports several ways to make use of vector spaces, e.g. ONNX or Deep Java Library (DJL). These should be used for rewriting purposes (e.g. relaxing the term with the largest distance to the others, expanding subcategories, etc.).
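The relaxation idea mentioned above (relax the term that is furthest from the others) can be sketched as follows. This is only an illustration under stated assumptions: the class `VectorRelaxation` and its methods are hypothetical names, the term vectors are assumed to have been computed beforehand (for example by an embedding model loaded through ONNX or DJL, which is not shown here), and "distance" is interpreted as low mean cosine similarity to the other terms.

```java
import java.util.List;

public class VectorRelaxation {

    /** Cosine similarity; returns 0 if either vector has zero length. */
    static double cosine(double[] a, double[] b) {
        double dot = 0;
        double normA = 0;
        double normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0;
        }
        return dot / Math.sqrt(normA * normB);
    }

    /**
     * Returns the index of the term vector with the lowest mean cosine
     * similarity to all other term vectors, or -1 if there are fewer than
     * three terms (with two terms neither one stands out).
     * All vectors are assumed to have the same dimension.
     */
    static int relaxationCandidate(List<double[]> termVectors) {
        int n = termVectors.size();
        if (n < 3) {
            return -1;
        }
        int candidate = -1;
        double lowestMean = Double.MAX_VALUE;
        for (int i = 0; i < n; i++) {
            double sum = 0;
            for (int j = 0; j < n; j++) {
                if (i != j) {
                    sum += cosine(termVectors.get(i), termVectors.get(j));
                }
            }
            double mean = sum / (n - 1);
            if (mean < lowestMean) {
                lowestMean = mean;
                candidate = i;
            }
        }
        return candidate;
    }
}
```

A rewriter built on this would probably also apply a similarity threshold, so that a term is only made optional when it is clearly unrelated rather than merely the least related one.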
2.2 Conclusions
Querqy needs layers to facilitate implementing rewriters.
For the efficient implementation of rewriters using traditional approaches, Querqy needs to make use of Lucene for rewriting logic.
Querqy should be enhanced to facilitate the use of vector spaces for rewriting purposes.
3. Integration of features
A solution for fully integrated search features could look as follows:
We have a given vocabulary including ["holz", "holzkohlegrill", "grill"]
Query input: "hozl kohlegrill gh567"
The rewriter for wordbreak / spell correction first identifies the corresponding words in the vocabulary and expands the query accordingly. Subsequently, a suffix expansion rewriter identifies that the word "grill" is the super-category for "holzkohlegrill" and expands the query accordingly. Finally, a relaxation rewriter identifies that "gh567" is quite unrelated to the rest of the query and makes it optional (should-clause instead of must-clause).
All these steps can be implemented using the Lucene library and/or a vector space.
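The three steps of the example can be sketched end to end in plain Java. This is a simplified illustration, not Querqy or Lucene code: the class `IntegratedRewriteSketch`, its `Result` type and all method names are hypothetical. Fuzzy matching uses a hand-written edit distance with transpositions where a real implementation would likely use Lucene, the super-category is detected by a plain suffix check, and "unrelated to the rest of the query" is approximated by "no vocabulary match", which is an assumption standing in for a proper relatedness measure such as a vector space.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class IntegratedRewriteSketch {

    /** Expansions to add as alternatives, and original tokens to make optional. */
    static final class Result {
        final List<String> expansions;
        final List<String> optional;
        Result(List<String> expansions, List<String> optional) {
            this.expansions = expansions;
            this.optional = optional;
        }
    }

    /** Edit distance that also counts an adjacent transposition as one edit. */
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
                if (i > 1 && j > 1 && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[a.length()][b.length()];
    }

    /** Vocabulary words within one edit of the input, exact matches included. */
    static List<String> near(String s, List<String> vocabulary) {
        List<String> hits = new ArrayList<>();
        for (String word : vocabulary) {
            if (distance(s, word) <= 1) hits.add(word);
        }
        return hits;
    }

    static Result rewrite(List<String> tokens, List<String> vocabulary) {
        Set<String> expansions = new LinkedHashSet<>();
        boolean[] covered = new boolean[tokens.size()];
        for (int i = 0; i < tokens.size(); i++) {
            // Step 1a: spell correction of the single token.
            List<String> single = near(tokens.get(i), vocabulary);
            if (!single.isEmpty()) covered[i] = true;
            expansions.addAll(single);
            // Step 1b: wordbreak handling, joining the token with its successor.
            if (i + 1 < tokens.size()) {
                List<String> joined = near(tokens.get(i) + tokens.get(i + 1), vocabulary);
                if (!joined.isEmpty()) {
                    covered[i] = true;
                    covered[i + 1] = true;
                }
                expansions.addAll(joined);
            }
        }
        // Step 2: suffix expansion, e.g. "holzkohlegrill" ends with "grill".
        for (String expansion : new ArrayList<>(expansions)) {
            for (String word : vocabulary) {
                if (!expansion.equals(word) && expansion.endsWith(word)) expansions.add(word);
            }
        }
        // Step 3: relaxation, tokens without vocabulary support become optional.
        List<String> optional = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (!covered[i]) optional.add(tokens.get(i));
        }
        return new Result(new ArrayList<>(expansions), optional);
    }
}
```

For the query "hozl kohlegrill gh567" and the vocabulary above, this sketch yields the expansions "holz", "holzkohlegrill" and "grill" and marks "gh567" as optional. The original tokens are kept, so every step only adds alternatives or loosens a clause instead of replacing input.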