Can't load a "euclidean" index from hnsw_build #21

Open
jlmelville opened this issue Mar 11, 2024 · 0 comments
jlmelville commented Mar 11, 2024

Build a Euclidean index via hnsw_build:

library(RcppHNSW)
irism <- as.matrix(iris[, -5])
ann <- hnsw_build(irism, distance = "euclidean")
iris_nn <- hnsw_search(irism, ann, k = 5)
head(iris_nn$dist)
     [,1]      [,2]      [,3]      [,4]      [,5]
[1,]    0 0.1000000 0.1414212 0.1414212 0.1414213
[2,]    0 0.1414213 0.1414213 0.1414213 0.1732050
[3,]    0 0.1414213 0.2449490 0.2645751 0.2645753
[4,]    0 0.1414215 0.1732051 0.2236071 0.2449490
[5,]    0 0.1414212 0.1414213 0.1732050 0.1732050
[6,]    0 0.3316623 0.3464102 0.3605552 0.3741659

So far so good. Now save it:

ann$save("iris.hnsw")

The class of ann is:

class(ann)
[1] "Rcpp_HnswL2"
attr(,"package")
[1] "RcppHNSW"

so we should be able to load it with:

# 4 is the dimensionality of the data, i.e. ncol(irism)
ann2 <- methods::new(RcppHNSW::HnswL2, 4, "iris.hnsw")

Now search again:

iris_nn2 <- hnsw_search(irism, ann2, k = 5)
head(iris_nn2$dist)
     [,1]       [,2]       [,3]       [,4]       [,5]
[1,]    0 0.01000000 0.01999996 0.01999996 0.01999998
[2,]    0 0.01999998 0.01999998 0.01999998 0.02999999
[3,]    0 0.01999998 0.06000003 0.07000001 0.07000010
[4,]    0 0.02000003 0.03000002 0.05000012 0.06000003
[5,]    0 0.01999996 0.01999998 0.02999996 0.02999999
[6,]    0 0.10999985 0.12000003 0.13000003 0.14000012

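These are the raw L2 distances (as the class name suggests), i.e. squared Euclidean distances: each value is the square of the corresponding distance in the first output.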
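As a quick sanity check on that reading, the sketch below squares the first row of distances from the freshly built index and compares it with the first row returned by the reloaded index. The numbers are copied from the two outputs above; the variable names and the `1e-6` tolerance are illustrative choices, not part of RcppHNSW.

```r
# Distances copied from row 1 of each output in this issue.
dist_built    <- c(0, 0.1000000, 0.1414212, 0.1414212, 0.1414213)     # from hnsw_build index
dist_reloaded <- c(0, 0.01000000, 0.01999996, 0.01999996, 0.01999998) # from reloaded HnswL2 index

# If the reloaded index returns squared Euclidean distances,
# squaring the original distances should reproduce them
# (up to the rounding of the printed values).
max_diff <- max(abs(dist_built^2 - dist_reloaded))
squared_match <- max_diff < 1e-6
print(squared_match)
```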

So after saving and reloading a formerly Euclidean index, you must take the square root of the returned L2 distances yourself to get Euclidean distances back.
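Until a fix lands, a minimal workaround sketch is to post-process the search result. `sqrt_dist` below is a hypothetical helper, not part of RcppHNSW; it assumes the result is a list with a `dist` matrix, as in the outputs above, and it is demonstrated on the first row of the reloaded output rather than on a live index.

```r
# Hypothetical helper: convert squared L2 distances in a search
# result back to Euclidean distances. Other list elements are kept.
sqrt_dist <- function(nn) {
  nn$dist <- sqrt(nn$dist)
  nn
}

# Stand-in for a result from the reloaded index (row 1 of the output above).
reloaded <- list(
  dist = matrix(c(0, 0.01000000, 0.01999996, 0.01999996, 0.01999998), nrow = 1)
)

fixed <- sqrt_dist(reloaded)
print(fixed$dist)
```

With a real reloaded index this would be applied as `sqrt_dist(hnsw_search(irism, ann2, k = 5))`. It should only be applied to indexes loaded via `HnswL2`, since an index returned directly by `hnsw_build` already reports Euclidean distances.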

The fix for this will probably be to introduce a dedicated RcppHNSW::HnswEuclidean class, which will take the square root for you inside its methods. This class will be returned from hnsw_build when distance = "euclidean".

@jlmelville jlmelville added the bug Something isn't working label Mar 11, 2024
@jlmelville jlmelville self-assigned this Mar 11, 2024
jlmelville added a commit that referenced this issue Mar 11, 2024