Add BaseKnnVectorsFormatTestCase.testRecall() and fix old codecs #13910
Conversation
(commit "…in Lucene90HnswVectorsReader" updated from 9d8cd9b to b581275)
I wonder whether we should backport the fixes to the
I think we should have a test like this. I wonder if it's better to just grab 100-500 SIFT vectors and put them in a file, though.
```diff
@@ -260,7 +260,7 @@ public void search(String field, float[] target, KnnCollector knnCollector, Bits
       int node = results.topNode();
       float minSimilarity = results.topScore();
       results.pop();
-      knnCollector.collect(node, minSimilarity);
+      knnCollector.collect(vectorValues.ordToDoc(node), minSimilarity);
```
Since Lucene90 didn't support sparse vector values, I am not sure this is strictly necessary. But I can understand it from a consistency standpoint.
Oh, that's a relief! I couldn't remember if we had that or not. At any rate it is possible to create a sparse 90 index in tests now.
> Oh, that's a relief! I couldn't remember if we had that or not. At any rate it is possible to create a sparse 90 index in tests now.
This is odd? When a Codec format is replaced, we move the "read only" part to the backwards-codecs module, and the full read/write original codec to test-framework, so that unit tests can produce a 9.0 index and run modern tests (even new unit tests added since 9.0) against such "old" indices.

But that read/write codec format should not be able to produce an index that the original 9.0 (core) format was not able to produce. They should be the same original code ... so maybe sparse vector values were in fact writable in 9.0 (by accident?)?
```java
  public void testRecall() {
    // ignore this test since this class always returns no results from search
  }
```
I still think that the underlying flat format should allow search to be called and simply iterate all matching vectors. But we can adjust that in a later PR.
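To illustrate what "iterate all matching vectors" could look like, here is a minimal standalone sketch of exact (brute-force) top-k search. All names here (`FlatSearchSketch`, `Hit`, `searchExact`) are invented for illustration; this is not the actual Lucene flat-format code, just the shape of the idea: score every vector and keep the best k in a min-heap.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of exact search over a "flat" vector store:
// score every vector, keep top-k. Names are invented for illustration.
public class FlatSearchSketch {
  public record Hit(int doc, float score) {}

  // Bounded euclidean similarity in the style Lucene uses: 1 / (1 + d^2).
  static float euclideanScore(float[] a, float[] b) {
    float sum = 0;
    for (int i = 0; i < a.length; i++) {
      float d = a[i] - b[i];
      sum += d * d;
    }
    return 1f / (1f + sum);
  }

  // Iterate all vectors; a min-heap keyed on score keeps the best k seen so far.
  public static List<Hit> searchExact(float[][] vectors, float[] query, int k) {
    PriorityQueue<Hit> heap = new PriorityQueue<>(Comparator.comparingDouble(Hit::score));
    for (int doc = 0; doc < vectors.length; doc++) {
      float score = euclideanScore(vectors[doc], query);
      if (heap.size() < k) {
        heap.add(new Hit(doc, score));
      } else if (heap.peek().score() < score) {
        heap.poll();
        heap.add(new Hit(doc, score));
      }
    }
    List<Hit> hits = new ArrayList<>(heap);
    hits.sort(Comparator.comparingDouble(Hit::score).reversed()); // best first
    return hits;
  }
}
```

An exact search like this is also what a recall test needs as its ground truth to compare an HNSW result against.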
```diff
@@ -562,7 +562,7 @@ private void add(
       String idString = Integer.toString(id);
       doc.add(new StringField("id", idString, Field.Store.YES));
       doc.add(new SortedDocValuesField("id", new BytesRef(idString)));
-      // XSSystem.out.println("add " + idString + " " + Arrays.toString(vector));
+      // System.out.println("add " + idString + " " + Arrays.toString(vector));
```
Just delete the line?
```java
          ++recalled;
        }
      }
      assertTrue("recall should be at least 5/10, got " + recalled, recalled >= 5);
```
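For context, recall here is just the overlap between the approximate (HNSW) top-k doc ids and the exact top-k. A self-contained sketch of that computation (the class and method names are invented for illustration, not part of the PR):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: recall@k = fraction of the exact top-k doc ids
// that also appear in the approximate result.
public class RecallCheck {
  public static double recall(int[] approxDocs, int[] exactDocs) {
    Set<Integer> exact = new HashSet<>();
    for (int doc : exactDocs) {
      exact.add(doc);
    }
    int recalled = 0;
    for (int doc : approxDocs) {
      if (exact.contains(doc)) {
        ++recalled;
      }
    }
    return (double) recalled / exactDocs.length;
  }
}
```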
I would hope recall should be within some known parameter. It would be good to know if recall improved or worsened. Either case could show an unexpected change.
I think it could be tricky to be very precise given the range of codec options. I guess we could specialize per codec?
I think having an assertAvgRecall in the base class that can be overridden would be really nice.
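One possible shape for that suggestion, sketched outside the actual Lucene test-framework (all names here are invented for illustration; a real version would live in BaseKnnVectorsFormatTestCase):

```java
// Hypothetical sketch: a base-class recall assertion with an overridable
// threshold, so codec-specific subclasses (e.g. a heavily quantized format)
// can loosen or tighten the bound.
public class RecallAssertSketch {
  // Default lower bound; a quantized-codec subclass might return a smaller value.
  public double expectedMinRecall() {
    return 0.5;
  }

  public void assertAvgRecall(double avgRecall) {
    if (avgRecall < expectedMinRecall()) {
      throw new AssertionError(
          "avg recall " + avgRecall + " below expected minimum " + expectedMinRecall());
    }
  }
}
```

A subclass would override only `expectedMinRecall()`, keeping the assertion logic in one place.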
```java
  // indexed 421 lines from LICENSE.txt
  // indexed 157 lines from NOTICE.txt
```
If we are adding two files like this, I wonder if we should simply take real vectors from a real embedding model and put them in the resources folder.
Maybe we can use GloVe or SIFT? Those are pretty small vectors, though only euclidean is expected (we would have to normalize for dot-product).
My concern is that having such a simplistic vector with so few dimensions might not actually be useful.
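The normalization mentioned above is just L2-normalizing each vector so that dot product coincides with cosine similarity, letting a euclidean-style dataset like SIFT or GloVe also exercise dot-product code paths. A minimal sketch (the helper name is invented for illustration):

```java
// Hypothetical helper: return an L2-normalized copy of a vector, so that
// dot product over the outputs equals cosine similarity over the inputs.
public class VectorNorm {
  public static float[] l2Normalize(float[] v) {
    double norm = 0;
    for (float x : v) {
      norm += (double) x * x;
    }
    double len = Math.sqrt(norm);
    if (len == 0) {
      return v.clone(); // leave the zero vector alone; it cannot be normalized
    }
    float[] out = new float[v.length];
    for (int i = 0; i < v.length; i++) {
      out[i] = (float) (v[i] / len);
    }
    return out;
  }
}
```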
lemme see if I can find bugs with the "dictionary" PR? We can always beef this up with more realistic data
```java
   * gross failures only, not to represent the true expected recall.
   */
  public void testRecall() throws IOException {
    VectorSimilarityFunction vectorSimilarityFunction = VectorSimilarityFunction.EUCLIDEAN;
```
It would be really neat if this went through each one. Quantization will do special things for different similarity functions and exercising each of those paths would be good.
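The suggestion is essentially to loop the test body over every similarity function rather than hard-coding EUCLIDEAN. Sketched standalone below with a local enum that mimics (does not reuse) the semantics of Lucene's VectorSimilarityFunction; all names are invented for illustration:

```java
// Hypothetical sketch: run a scoring pass once per similarity function,
// so each code path (and each quantization special case) gets exercised.
public class SimilaritySweep {
  public enum Sim {
    EUCLIDEAN {
      public float score(float[] a, float[] b) {
        float sum = 0;
        for (int i = 0; i < a.length; i++) {
          float d = a[i] - b[i];
          sum += d * d;
        }
        return 1f / (1f + sum);
      }
    },
    DOT_PRODUCT {
      public float score(float[] a, float[] b) {
        float dot = 0;
        for (int i = 0; i < a.length; i++) {
          dot += a[i] * b[i];
        }
        return (1f + dot) / 2f; // assumes unit vectors, as Lucene does
      }
    },
    COSINE {
      public float score(float[] a, float[] b) {
        float dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
          dot += a[i] * b[i];
          na += a[i] * a[i];
          nb += b[i] * b[i];
        }
        return (float) ((1 + dot / Math.sqrt((double) na * nb)) / 2);
      }
    };

    public abstract float score(float[] a, float[] b);
  }

  // A testRecall-style loop would run the whole index/search cycle per Sim;
  // here we just count that every scoring path produced a finite score.
  public static int sweep(float[] a, float[] b) {
    int exercised = 0;
    for (Sim sim : Sim.values()) {
      if (Float.isFinite(sim.score(a, b))) {
        exercised++;
      }
    }
    return exercised;
  }
}
```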
agreed, we want to test all the things
thanks for review!
```java
      KnnFloatVectorQuery exactQuery =
          new KnnFloatVectorQuery("field", queryEmbedding, 1000, new MatchAllDocsQuery());
```
Also, I think for more consistent runs, we may want to have multiple query embeddings that we test with and gather min, max, and avg recalls. But this can be a further refinement on this work. I just think having a single query might be very flaky in the long run.
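The min/max/avg aggregation being suggested is small enough to sketch directly (names invented for illustration; a real version would sit next to the recall test):

```java
// Hypothetical helper: aggregate recall over several query embeddings so a
// single unlucky query cannot flake the test on its own.
public class RecallStats {
  public record Stats(double min, double max, double avg) {}

  public static Stats summarize(double[] recalls) {
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;
    double sum = 0;
    for (double r : recalls) {
      min = Math.min(min, r);
      max = Math.max(max, r);
      sum += r;
    }
    return new Stats(min, max, sum / recalls.length);
  }
}
```

The test could then assert on `avg` (stable) while logging `min`/`max` to catch a regression on any single query.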
@benwtrent I think I addressed all your comments except adding binary vectors. I think as long as the vectors are not too degenerate and always the same, the test purpose is satisfied, but I'm not averse to replacing these janky ones with more "scientific" ones either; I just don't have them handy. This test seems to pass on multiple runs and it is exposing problems with the dictionary refactor I'm working on, so I think it's helpful.
LGTM! This is a nice sanity check for abhorrent behavior :)
After reflection, I don't think this is true. We always supported sparse vector indexes - I don't see how we could have avoided it, really. It seems to me this bug was introduced here and it was released as part of 9.8, meaning that releases 9.8 and later in the 9.x series will be able to read indexes produced in 9.0 but will give meaningless results for HNSW searches over those indexes. This seems like something we maybe ought to make the user community aware of. I'm just going to leave this here for comment for a bit, but soon I'll draft an actual email describing the situation and any mitigations (upgrade your software to 10? Rewrite your index with a 9.1-9.7 release?)
ok, something like this:

> Dear Lucene user community,
>
> We recently uncovered a backwards compatibility bug that affects indexes created with version 9.0 containing KNN vector fields. Versions 9.8 - 9.12 are unable to search vectors in such indexes correctly and will return incorrect results without raising any error. We think it's likely very few if any of you are using 9.0 indexes, but if you are, possible mitigation steps are:
@msokolov could we do a simpler patch for 9.12.1?
Yes, maybe we should -- I think it would be a one-liner
There is another upgrade path -- if you started with 9.0 and then "upgraded" your index by rewriting it (eg with IndexUpdater tool) via merge to 9.1-9.7, you could subsequently read the index with later versions. But this seemed kind of complex to explain for a case that probably doesn't exist.
+1 thanks @msokolov.
+1. 9.12.1 would also contain the fix for the other (horrible, caused by me!) bwc break specifically for LPVQ quantized vector indices. Backwards compatibility is hard!!
@mikemccand I have a PR open for this bug fix for 9.12. Will merge soon. Could you add a CHANGES entry in 9.12 for your bug fix for 9.12.1?
Ahh yes sorry I will do that today!
While exploring some recall-related failures in another PR, I went looking for a unit test that checks HNSW/KNN recall and couldn't find any. I think we used to have one, but maybe we removed it because it was flaky? We really do need such a test, since it is possible to make changes that preserve all the formal properties of the codecs and queries yet destroy recall. I thought that a test built on known data and vectors would be more predictable than one using random data, so I made one, and it uncovered a couple of bugs:

1. In Lucene90HnswVectorsReader we messed up (removed) ord-to-doc mappings, so we were returning vector ords instead of docids in search results. I guess this would have totally borked back-compat for Lucene90 indexes. Probably there are none in the wild, and this was never noticed?
2. In Lucene91RWFormat (used only for back-compat testing) we messed up the diversity check, so we were producing bad graphs.

This PR fixes these things and adds the new test.