BlockMax WAND #6176

Merged
amourao merged 114 commits into main from poc-blockmax-wand
Dec 3, 2024

@amourao (Contributor) commented Nov 4, 2024

What's being changed:

  • New inverted format that:
    • Writes doc ids and term frequencies in blocks
    • Compresses doc ids and term frequencies
    • Writes tombstones to a separate area of the file
  • Uses BlockMax WAND to make search faster
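
To make the block layout concrete, here is a minimal Go sketch; it is not the code from this PR. It assumes delta plus varint encoding for doc ids (the description above does not spell out the compression scheme), a hypothetical block size of 4, and it uses the per-block maximum term frequency as a stand-in for the per-block maximum score that a BlockMax-style search consults to skip blocks. All type and function names are invented for the example.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Posting is one (doc id, term frequency) pair in a posting list.
type Posting struct {
	DocID uint64
	TF    uint64
}

// Block is a compressed run of postings plus the metadata needed to
// decide whether the block can be skipped without decoding it.
type Block struct {
	MaxDocID uint64 // largest doc id in the block
	MaxTF    uint64 // largest term frequency (stand-in for the max score)
	Count    int    // number of postings in the block
	Data     []byte // per posting: uvarint doc id delta, then uvarint tf
}

// blockSize is a hypothetical value chosen to keep the example small.
const blockSize = 4

// EncodeBlocks splits postings (sorted by DocID) into blocks. Deltas
// restart at zero in every block, so each block decodes on its own.
func EncodeBlocks(postings []Posting) []Block {
	var blocks []Block
	for start := 0; start < len(postings); start += blockSize {
		end := start + blockSize
		if end > len(postings) {
			end = len(postings)
		}
		b := Block{Count: end - start}
		prev := uint64(0)
		for _, p := range postings[start:end] {
			b.Data = binary.AppendUvarint(b.Data, p.DocID-prev)
			b.Data = binary.AppendUvarint(b.Data, p.TF)
			prev = p.DocID
			if p.TF > b.MaxTF {
				b.MaxTF = p.TF
			}
		}
		b.MaxDocID = prev
		blocks = append(blocks, b)
	}
	return blocks
}

// DecodeBlock reverses the encoding of a single block.
func DecodeBlock(b Block) []Posting {
	out := make([]Posting, 0, b.Count)
	prev := uint64(0)
	data := b.Data
	for i := 0; i < b.Count; i++ {
		delta, n := binary.Uvarint(data)
		data = data[n:]
		tf, m := binary.Uvarint(data)
		data = data[m:]
		prev += delta
		out = append(out, Posting{DocID: prev, TF: tf})
	}
	return out
}

// BlocksWorthDecoding returns the indexes of blocks whose upper bound
// reaches minTF; every other block is skipped without being decoded.
func BlocksWorthDecoding(blocks []Block, minTF uint64) []int {
	var keep []int
	for i, b := range blocks {
		if b.MaxTF >= minTF {
			keep = append(keep, i)
		}
	}
	return keep
}

func main() {
	postings := []Posting{{3, 1}, {7, 2}, {8, 1}, {20, 1}, {21, 9}, {40, 2}}
	blocks := EncodeBlocks(postings)
	fmt.Println(len(blocks), BlocksWorthDecoding(blocks, 5))
}
```

In a real BlockMax WAND search the threshold would be the score of the current k-th best result and the block bound would be a score rather than a raw term frequency; the skipping decision has the same shape.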

Review checklist

  • Documentation has been updated, if necessary. Link to changed documentation:
  • Chaos pipeline run or not necessary. Link to pipeline:
  • All new code is covered by tests where it is reasonable.
  • Performance tests have been run or not necessary.

Cross-functional impact

  • This change requires public documentation (weaviate-io) to be updated. Check the box to automatically create a corresponding issue.
    • Score values will change slightly when searching on multiple properties, because scoring now uses per-property statistics instead of global statistics.
  • Does it require a change in the client libraries? If yes, please check the boxes for the affected client libraries.
    • Python (weaviate-python-client)
    • JavaScript/TypeScript (typescript-client)
    • Go (weaviate-go-client)
    • Java (java-client)
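
The note above about score changes on multi-property searches can be illustrated with a small Go sketch. It assumes the standard Okapi BM25 formula with a Lucene-style idf and the conventional k1 = 1.2 and b = 0.75; the exact formula and parameters used by the code in this PR are not shown here. The `Stats` type and all numbers are made up, and they are chosen to make the difference easy to see rather than to be realistic.

```go
package main

import (
	"fmt"
	"math"
)

// Stats holds the collection statistics BM25 needs for one term.
type Stats struct {
	DocCount   float64 // number of documents considered
	DocFreq    float64 // number of those documents containing the term
	AvgPropLen float64 // average property length in tokens
}

// BM25 scores one term in one property using the given statistics.
func BM25(tf, propLen float64, s Stats, k1, b float64) float64 {
	idf := math.Log(1 + (s.DocCount-s.DocFreq+0.5)/(s.DocFreq+0.5))
	norm := tf + k1*(1-b+b*propLen/s.AvgPropLen)
	return idf * tf * (k1 + 1) / norm
}

func main() {
	const k1, b = 1.2, 0.75

	// Hypothetical statistics for the same term in two properties,
	// and for both properties pooled together.
	title := Stats{DocCount: 1000, DocFreq: 10, AvgPropLen: 8}
	body := Stats{DocCount: 1000, DocFreq: 200, AvgPropLen: 120}
	global := Stats{DocCount: 1000, DocFreq: 205, AvgPropLen: 128}

	// One document: the term occurs once in an 8-token title and
	// three times in a 120-token body.
	perProp := BM25(1, 8, title, k1, b) + BM25(3, 120, body, k1, b)
	pooled := BM25(1, 8, global, k1, b) + BM25(3, 120, global, k1, b)

	fmt.Printf("per-property stats: %.3f\n", perProp)
	fmt.Printf("global stats:       %.3f\n", pooled)
}
```

The same document gets a different score depending on which statistics feed the idf and length normalization, which is why results ranked across several properties can shift after this change.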

Refactored code and added tests to improve the bitpacked entry and data encoding
…n they are small

When a posting list has fewer than roughly 4-6 docs, it may be worth writing it uncompressed because of the compression overhead
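
A minimal sketch of that size-based choice, in Go. It is an illustration only: the one-byte format flag, the cut-off of 5 (picked from the 4-6 range mentioned above), and the function names are all assumptions, and the sketch uses delta plus varint as the "compressed" path even though the commits also mention bit packing.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	formatRaw    byte = 0 // fixed-width 8-byte doc ids
	formatVarint byte = 1 // delta-encoded uvarint doc ids

	// smallListThreshold is a hypothetical cut-off for this example.
	smallListThreshold = 5
)

// EncodeDocIDs writes a format flag followed by the doc ids. Lists
// shorter than the threshold skip compression entirely.
func EncodeDocIDs(ids []uint64) []byte {
	if len(ids) < smallListThreshold {
		out := []byte{formatRaw}
		for _, id := range ids {
			out = binary.LittleEndian.AppendUint64(out, id)
		}
		return out
	}
	out := []byte{formatVarint}
	prev := uint64(0)
	for _, id := range ids {
		out = binary.AppendUvarint(out, id-prev)
		prev = id
	}
	return out
}

// DecodeDocIDs reads the flag and dispatches to the matching decoder.
func DecodeDocIDs(buf []byte) []uint64 {
	if len(buf) == 0 {
		return nil
	}
	format, data := buf[0], buf[1:]
	var ids []uint64
	switch format {
	case formatRaw:
		for len(data) >= 8 {
			ids = append(ids, binary.LittleEndian.Uint64(data))
			data = data[8:]
		}
	case formatVarint:
		prev := uint64(0)
		for len(data) > 0 {
			delta, n := binary.Uvarint(data)
			if n <= 0 {
				break // malformed input; stop decoding
			}
			prev += delta
			ids = append(ids, prev)
			data = data[n:]
		}
	}
	return ids
}

func main() {
	small := EncodeDocIDs([]uint64{4, 9, 12})
	large := EncodeDocIDs([]uint64{4, 9, 12, 30, 31, 90})
	fmt.Println(small[0], DecodeDocIDs(small))
	fmt.Println(large[0], DecodeDocIDs(large))
}
```

The sketch only shows the branching. Whether the overhead being avoided is CPU time, per-block headers, or both is not stated in the commit message.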
…mbination

Make BlockMax search work for multiple segments and properties by combining the results
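
The commit message does not say how the results are combined. One plausible reading, sketched below in Go under that assumption, is that each segment and property search returns partial scores per doc id, the partial scores are summed per document, and the k best documents are kept. `Hit` and `CombineTopK` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// Hit is a scored document from one segment or property search.
type Hit struct {
	DocID uint64
	Score float64
}

// CombineTopK sums the partial scores for each doc id across all
// result lists and returns the k highest-scoring documents.
// Ties are broken by ascending doc id to keep the output stable.
func CombineTopK(lists [][]Hit, k int) []Hit {
	sums := make(map[uint64]float64)
	for _, list := range lists {
		for _, h := range list {
			sums[h.DocID] += h.Score
		}
	}
	merged := make([]Hit, 0, len(sums))
	for id, score := range sums {
		merged = append(merged, Hit{DocID: id, Score: score})
	}
	sort.Slice(merged, func(i, j int) bool {
		if merged[i].Score != merged[j].Score {
			return merged[i].Score > merged[j].Score
		}
		return merged[i].DocID < merged[j].DocID
	})
	if len(merged) > k {
		merged = merged[:k]
	}
	return merged
}

func main() {
	titleHits := []Hit{{DocID: 1, Score: 1.5}, {DocID: 2, Score: 3.0}}
	bodyHits := []Hit{{DocID: 1, Score: 2.0}, {DocID: 3, Score: 0.5}}
	fmt.Println(CombineTopK([][]Hit{titleHits, bodyHits}, 2))
}
```

Summing lists that were each truncated to their own top k can miss a document that ranks low in every list but high overall, so a real implementation has to account for that; the sketch ignores it.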
@amourao requested a review from jeroiraz on November 27, 2024 14:02
…o loadAllTombstones method for improved clarity
…ilter for improved clarity and consistency in handling tombstones and filters
… varint encoding and decoding for better readability"

This reverts commit c6c0041.
@amourao (Contributor, Author) commented Dec 3, 2024

Quality Gate failed

Failed conditions: 14.3% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

The duplication here mostly comes from tests:

  • compactor_inverted_integration_test.go shares code with compactor_map_integration_test.go, but making it reusable would make it less readable;
  • bm25f_block_test.go shares code with bm25f_test.go. Here it is even harder, as the index setup is the same, but the scores change slightly (by design) across tests.

@amourao requested review from parkerduckworth and removed the request for parkerduckworth on December 3, 2024 17:29

sonarcloud bot commented Dec 3, 2024

Quality Gate failed

Failed conditions: 14.4% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

@amourao merged commit 9e88d74 into main on Dec 3, 2024
45 of 47 checks passed
@amourao deleted the poc-blockmax-wand branch on December 3, 2024 20:02