
Add BaseKnnVectorsFormatTestCase.testRecall() and fix old codecs #13910

Merged (5 commits) Oct 17, 2024

Conversation

msokolov (Contributor):

While exploring some recall-related failures in another PR, I went looking for a unit test that checks HNSW/KNN recall and couldn't find any. I think we used to have one, but maybe we removed it because it was flaky? We really do need such a test, since it is possible to make changes that preserve all the formal properties of the codecs and queries yet destroy recall. I thought a test with known data and vectors would be more predictable than one using random data, so I made one, and it uncovered a couple of bugs:

In Lucene90HnswVectorsReader we messed up (removed) the ord-to-doc mapping, so we were returning vector ords instead of docids in search results. I guess this would have totally borked back-compat for Lucene90 indexes. Probably there are none in the wild, and this was never noticed?

In Lucene91RWFormat (used only for back-compat testing) we messed up the diversity check, so we were producing bad graphs.

This PR fixes these things and adds the new test.
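For illustration, the first bug is the classic ord-vs-docid mixup: in a sparse vector field, ordinals are assigned densely (0..numVectors-1) while docids can have gaps, so graph search results must be mapped through an ord-to-doc table before collection. A minimal sketch with hypothetical names (not the actual Lucene90 reader code):

```java
import java.util.Arrays;

public class OrdToDocDemo {
    // In a sparse vector field, only some documents carry a vector.
    // Ordinals are assigned densely in docid order, so ord i maps to the
    // docid of the i-th document that has a vector. Table is hypothetical.
    static final int[] ORD_TO_DOC = {2, 5, 9};

    static int ordToDoc(int ord) {
        return ORD_TO_DOC[ord];
    }

    public static void main(String[] args) {
        // HNSW search returns ordinals; collecting them directly (the bug)
        // would report docids 0,1,2 instead of the real 2,5,9.
        int[] searchResults = {0, 1, 2}; // ordinals from graph search
        int[] docids =
            Arrays.stream(searchResults).map(OrdToDocDemo::ordToDoc).toArray();
        System.out.println(Arrays.toString(docids)); // [2, 5, 9]
    }
}
```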

msokolov (Contributor Author) commented Oct 14, 2024:

I wonder whether we should backport the fixes to Lucene90HnswVectorsReader? I tend to think we ought to; even though usage might be tiny to nonexistent, we could get a fix out in 10.1.

benwtrent self-requested a review October 16, 2024 13:03

benwtrent (Member) left a comment:

I think we should have a test like this. I wonder if it's better to just grab 100-500 sift vectors and put them in a file, though.

@@ -260,7 +260,7 @@ public void search(String field, float[] target, KnnCollector knnCollector, Bits
      int node = results.topNode();
      float minSimilarity = results.topScore();
      results.pop();
-     knnCollector.collect(node, minSimilarity);
+     knnCollector.collect(vectorValues.ordToDoc(node), minSimilarity);
Member:

Since Lucene90 didn't support sparse vector values, I am not sure this is strictly necessary. But I can understand it from a consistency standpoint.

msokolov (Contributor Author):

Oh, that's a relief! I couldn't remember if we had that or not. At any rate it is possible to create a sparse 90 index in tests now.

Member:

> Oh, that's a relief! I couldn't remember if we had that or not. At any rate it is possible to create a sparse 90 index in tests now.

This is odd? When a Codec format is replaced, we move the "read only" part to backwards-codecs module, and the full read/write original codec to test-framework so that unit tests can produce a 9.0 index and run modern tests (even new unit tests added since 9.0) against such "old" indices.

But that read/write codec format should not be able to produce an index that the original 9.0 (core) format was not able to produce. They should be the same original code ... so maybe sparse vector values were in fact writable in 9.0 (by accident?)?

Comment on lines +106 to +108
public void testRecall() {
// ignore this test since this class always returns no results from search
}
Member:

I still think that the underlying flat format should allow search to be called and it simply iterates all matching vectors. But we can adjust that in a later PR.

@@ -562,7 +562,7 @@ private void add(
      String idString = Integer.toString(id);
      doc.add(new StringField("id", idString, Field.Store.YES));
      doc.add(new SortedDocValuesField("id", new BytesRef(idString)));
-     // XSSystem.out.println("add " + idString + " " + Arrays.toString(vector));
+     // System.out.println("add " + idString + " " + Arrays.toString(vector));
Member:

Just delete the line?

++recalled;
}
}
assertTrue("recall should be at least 5/10, got " + recalled, recalled >= 5);
Member:

I would hope recall should be within some known parameter. It would be good to know if recall improved or worsened. Either case could show an unexpected change.

msokolov (Contributor Author):

I think it could be tricky to be very precise given the range of codec options. I guess we could specialize per codec?

Member:

I think having an assertAvgRecall in the base class that can be overridden would be really nice.
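An overridable per-codec floor could look something like this sketch (class and method names here are hypothetical, not the actual BaseKnnVectorsFormatTestCase API):

```java
// Sketch: a base test class exposing a recall floor that lossy or
// quantized formats can relax by overriding. Names are illustrative.
public class RecallThresholdSketch {
    static class BaseKnnFormatTest {
        // Default floor: at least half of the exact top-k must be recovered.
        protected double minRecall() { return 0.5; }

        void assertRecall(double observed) {
            if (observed < minRecall()) {
                throw new AssertionError(
                    "recall " + observed + " below floor " + minRecall());
            }
        }
    }

    // A quantized format tolerates lower recall, so it relaxes the floor.
    static class QuantizedFormatTest extends BaseKnnFormatTest {
        @Override protected double minRecall() { return 0.3; }
    }

    public static void main(String[] args) {
        new BaseKnnFormatTest().assertRecall(0.7);   // passes under default floor
        new QuantizedFormatTest().assertRecall(0.4); // passes under relaxed floor
        System.out.println("ok");
    }
}
```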

Comment on lines +1937 to +1938
// indexed 421 lines from LICENSE.txt
// indexed 157 lines from NOTICE.txt
Member:

If we are adding two files like this, I wonder if we should simply take real vectors from a real embedding model and put them in the resources folder.

Maybe we can use glove or sift? Those are pretty small vectors, though only euclidean is expected (we would have to normalize for dot-product).

My concern is that having such a simplistic vector with so few dimensions might not actually be useful.
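The normalization mentioned above is cheap to do when preparing the resource file: dot-product similarity assumes unit-length vectors, so a euclidean dataset like sift would need each vector scaled to unit norm first. A minimal sketch:

```java
public class NormalizeDemo {
    // Scale a vector to unit length so that dot product between two
    // normalized vectors equals their cosine similarity.
    static float[] l2normalize(float[] v) {
        double norm = 0;
        for (float x : v) norm += x * x;
        norm = Math.sqrt(norm);
        float[] out = new float[v.length];
        for (int i = 0; i < v.length; i++) out[i] = (float) (v[i] / norm);
        return out;
    }

    public static void main(String[] args) {
        float[] u = l2normalize(new float[] {3f, 4f});
        System.out.println(u[0] + " " + u[1]); // 0.6 0.8
    }
}
```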

msokolov (Contributor Author):

lemme see if I can find bugs with the "dictionary" PR? We can always beef this up with more realistic data

* gross failures only, not to represent the true expected recall.
*/
public void testRecall() throws IOException {
VectorSimilarityFunction vectorSimilarityFunction = VectorSimilarityFunction.EUCLIDEAN;
Member:

It would be really neat if this went through each one. Quantization will do special things for different similarity functions and exercising each of those paths would be good.

msokolov (Contributor Author):

agreed, we want to test all the things

msokolov (Contributor Author) left a comment:

thanks for review!


Comment on lines 1935 to 1936
KnnFloatVectorQuery exactQuery =
new KnnFloatVectorQuery("field", queryEmbedding, 1000, new MatchAllDocsQuery());
Member:

Also, I think for more consistent runs, we may want to have multiple query embeddings that we test with and gather min max and avg recalls. But this can be a further refinement on this work.

I just think having a single query might be very flaky in the long run.
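Gathering min/avg/max recall over several queries could look like this sketch, where recall is the overlap between the approximate and exact top-k result sets (the data here is illustrative):

```java
import java.util.HashSet;
import java.util.Set;

public class RecallStats {
    // Fraction of the exact top-k that the approximate search also returned.
    static double recall(int[] approx, int[] exact) {
        Set<Integer> found = new HashSet<>();
        for (int doc : approx) found.add(doc);
        int hits = 0;
        for (int doc : exact) if (found.contains(doc)) hits++;
        return (double) hits / exact.length;
    }

    public static void main(String[] args) {
        // Per-query {approximate, exact} top-3 docid lists for three queries.
        int[][][] queries = {
            {{1, 2, 3}, {1, 2, 3}},  // recall 1.0
            {{1, 4, 5}, {1, 2, 3}},  // recall ~0.33
            {{2, 3, 9}, {1, 2, 3}},  // recall ~0.67
        };
        double min = 1, max = 0, sum = 0;
        for (int[][] q : queries) {
            double r = recall(q[0], q[1]);
            min = Math.min(min, r);
            max = Math.max(max, r);
            sum += r;
        }
        System.out.printf("min=%.2f avg=%.2f max=%.2f%n",
            min, sum / queries.length, max);
    }
}
```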

msokolov (Contributor Author):

@benwtrent I think I addressed all your comments except adding binary vectors. As long as the vectors are not too degenerate and always the same, the test's purpose is satisfied, but I'm not averse to replacing these janky ones with more "scientific" ones either; I just don't have them handy.

This test seems to pass on multiple runs, and it is exposing problems with the dictionary refactor I'm working on, so I think it's helpful.

benwtrent (Member) left a comment:

LGTM! This is a nice sanity check for abhorrent behavior :)

msokolov merged commit 3983fa2 into apache:main Oct 17, 2024 (3 checks passed), and deleted the test_knn_recall branch October 17, 2024 15:50.
msokolov (Contributor Author) commented Oct 21, 2024:

> Since Lucene90 didn't support sparse vector values, I am not sure this is strictly necessary. But I can understand it from a consistency standpoint.

After reflection, I don't think this is true. We always supported sparse vector indexes - I don't see how we could have avoided it, really. It seems to me this bug was introduced here and it was released as part of 9.8, meaning that releases 9.8 and later in the 9.x series will be able to read indexes produced in 9.0 but will give meaningless results for HNSW searches over those indexes. This seems like something we maybe ought to make the user community aware of.

I'm just going to leave this here for comment for a bit, but soon I'll draft an actual email describing the situation and any mitigations (upgrade your software to 10? Rewrite your index with a 9.1-9.7 release?)

msokolov (Contributor Author):

ok something like this:

Dear Lucene user community,

We recently uncovered a backwards compatibility bug that affects indexes created with version 9.0 containing KNN vector fields. Versions 9.8 - 9.12 are unable to search vectors in such indexes correctly and will return incorrect results without raising any error. We think it's likely very few if any of you are using 9.0 indexes, but if you are, possible mitigation steps are:

  • Upgrade to 10.0 or later, or
  • Do not upgrade past 9.7, or
  • If you must use an affected Lucene version (9.8-9.12) and you have 9.0-written indexes including KNN vector fields, you must recreate those indexes from source with your current Lucene version.

benwtrent (Member):

@msokolov could we do a simpler patch for 9.12.1?

msokolov (Contributor Author):

Yes, maybe we should -- I think it would be a one-liner.

msokolov (Contributor Author):

There is another upgrade path: if you started with 9.0 and then "upgraded" your index by rewriting it via merge (e.g. with the IndexUpgrader tool) on a 9.1-9.7 release, you could subsequently read the index with later versions. But this seemed kind of complex to explain for a case that probably doesn't exist.

mikemccand (Member):

> This seems like something we maybe ought to make the user community aware of.

+1 thanks @msokolov.

> @msokolov could we do a simpler patch for 9.12.1?

+1. 9.12.1 would also contain the fix for the other (horrible, caused by me!) bwc break specifically for LPVQ quantized vector indices.

Backwards compatibility is hard!!

benwtrent (Member):

@mikemccand I have a PR open for this bug fix for 9.12. Will merge soon.

Could you add a CHANGES entry in 9.12 for your bug fix for 9.12.1?

mikemccand (Member):

> Could you add a CHANGES entry in 9.12 for your bug fix for 9.12.1?

Ahh yes, sorry, I will do that today!
