feat: Qdrant - Add group_by and group_size optional parameters to Retrievers #1054
Related Issues
Proposed Changes:
Adding the group_by and group_size parameters to the Qdrant Retrievers. See: https://qdrant.tech/documentation/concepts/hybrid-queries/#grouping and https://api.qdrant.tech/master/api-reference/search/query-points-groups
This allows specifying a maximum number of chunks to retrieve for each value of a given metadata field. It is useful for forcing the Retriever to retrieve from many documents instead of focusing on one.
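To illustrate the intended effect, here is a minimal pure-Python sketch of the grouping semantics. It is not the integration's code and does not call Qdrant; `group_hits`, the `(score, metadata)` hit format, and the `file` metadata field are hypothetical names chosen for the example, and dropping hits that lack the field is an assumption.

```python
from collections import defaultdict


def group_hits(hits, group_by, top_k, group_size):
    """Keep at most `group_size` hits per `group_by` value and at most `top_k` groups.

    `hits` is a list of (score, metadata) pairs; higher scores rank first.
    """
    groups = defaultdict(list)  # dicts keep insertion order: groups appear by best hit
    for score, meta in sorted(hits, key=lambda h: h[0], reverse=True):
        key = meta.get(group_by)
        if key is None:
            continue  # assumption: hits without the field are left out
        if key not in groups and len(groups) >= top_k:
            continue  # already holding top_k groups, ignore new ones
        if len(groups[key]) < group_size:
            groups[key].append((score, meta))
    return dict(groups)


# Three chunks come from a.pdf, but group_size=2 caps it at two,
# which leaves room for a chunk from b.pdf. c.pdf is a third group and is dropped.
hits = [
    (0.95, {"file": "a.pdf", "chunk": 1}),
    (0.93, {"file": "a.pdf", "chunk": 2}),
    (0.90, {"file": "a.pdf", "chunk": 3}),
    (0.80, {"file": "b.pdf", "chunk": 1}),
    (0.70, {"file": "c.pdf", "chunk": 1}),
]
grouped = group_hits(hits, "file", top_k=2, group_size=2)
total = sum(len(chunks) for chunks in grouped.values())
```

Without grouping, a plain top-3 search over these hits would return only chunks from a.pdf; with grouping, the result spans two files.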
How did you test it?
Added a new test for each retriever and fixed existing tests, such as those covering conversion to dict.
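Because the exact number of returned documents can vary, one option is to assert bounds rather than an exact count. The sketch below is only an illustration of that idea, not the tests added in this PR; `check_grouped_result` is a hypothetical helper, and documents are represented as plain dicts with a `meta` key standing in for real Document objects.

```python
from collections import Counter


def check_grouped_result(docs, group_by, top_k, group_size):
    """Assert bounds that hold for a grouped result, without pinning an exact count."""
    per_group = Counter(doc["meta"][group_by] for doc in docs)
    assert len(per_group) <= top_k  # top_k caps the number of groups
    assert all(n <= group_size for n in per_group.values())  # cap per group
    assert len(docs) <= top_k * group_size  # overall upper bound


# Hypothetical result: two chunks from a.pdf and one from b.pdf.
docs = [
    {"meta": {"file": "a.pdf"}},
    {"meta": {"file": "a.pdf"}},
    {"meta": {"file": "b.pdf"}},
]
check_grouped_result(docs, "file", top_k=2, group_size=2)
```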
Notes for the reviewer
The tests are sometimes flaky in the number of documents retrieved. Technically the result can never contain more than top_k * group_size documents, but sometimes it contains fewer. (I guess it can even be fewer than top_k.)
In the context of a group_by search, top_k represents the maximum number of groups; the docs have been updated to reflect this.
This code is not optimal with the big IF/ELIF, but I didn't want to implement yet another Retriever for this integration.
Checklist
fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.