umap_transform uses a different distance metric if loaded in #117
Comments
Thanks for the report. There might be a bug here, but I'll need to do some checking. Just to follow up on your last question now: if you use the correlation distance, then the underlying Annoy calculation uses the cosine distance. This is because the correlation distance is equivalent to the cosine distance after mean-centering each row. So the
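The equivalence mentioned above can be checked numerically. The following is a minimal base-R sketch, not code from uwot: the helper names `cosine_dist` and `correlation_dist` and the example vectors are made up for illustration. It shows that the correlation distance between two rows matches the cosine distance computed after mean-centering each row, and that this does not hold for the uncentered rows.

```r
# Cosine distance between two numeric vectors: 1 - cos(angle between them).
# (Hypothetical helper for illustration; not a uwot function.)
cosine_dist <- function(x, y) {
  1 - sum(x * y) / sqrt(sum(x^2) * sum(y^2))
}

# Correlation distance: 1 - Pearson correlation.
# (Hypothetical helper for illustration; not a uwot function.)
correlation_dist <- function(x, y) {
  1 - stats::cor(x, y)
}

# Two made-up "rows" of a data matrix.
x <- c(1, 4, 2, 8, 5)
y <- c(3, 1, 7, 6, 2)

# Mean-center each row.
xc <- x - mean(x)
yc <- y - mean(y)

d_cor          <- correlation_dist(x, y)  # correlation distance on the raw rows
d_cos_centered <- cosine_dist(xc, yc)     # cosine distance on the centered rows
d_cos_raw      <- cosine_dist(x, y)       # cosine distance on the raw rows

# TRUE: the two distances agree up to floating-point tolerance.
print(isTRUE(all.equal(d_cor, d_cos_centered)))
# FALSE: without centering, cosine and correlation distances differ.
print(isTRUE(all.equal(d_cor, d_cos_raw)))
```

This only illustrates the mathematical identity; how uwot centers the data internally before passing it to Annoy is not shown here.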
Thanks for the swift reply and clarification! If there is anything I can do to assist, let me know. I just double-checked the NN idx and dist output of the transform function using the fresh and loaded-in versions of the same model, and I get the same indexed neighbours but different distances. The results should be the same and this is not a problem for the embeddings, but I was hoping to utilise the NN correlations. The fresh model:
and the loaded in version:
if we look at the first few lines of the verbose output from the transform function with the former:
but with the latter:
For what it's worth: if I change the loaded-in model's nn_index metric to 'correlation', I restore the behaviour of a fresh model, and the transform function returns correlation values.
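The workaround described above could be wrapped in a small helper. This is only a sketch under assumptions: `restore_nn_metric` is a hypothetical function (not part of uwot), and it relies solely on the `model$nn_index$metric` field shown in this report. A plain list stands in for a real model here so the snippet runs without uwot or a saved file.

```r
# Hypothetical helper (not part of uwot): reset the metric label stored on the
# nearest-neighbour index of a reloaded model. Assumes the model exposes
# model$nn_index$metric, as shown in this report.
restore_nn_metric <- function(model, metric = "correlation") {
  if (!identical(model$nn_index$metric, metric)) {
    model$nn_index$metric <- metric
  }
  model
}

# Stand-in list mimicking only the two fields quoted in this report for a
# model that has been saved and loaded again. A real uwot model has many more.
loaded <- list(
  metric   = "correlation",
  nn_index = list(metric = "cosine")
)

fixed <- restore_nn_metric(loaded)
print(fixed$nn_index$metric)
```

With a real model, the helper would be applied to the result of load_uwot before calling umap_transform with ret_extra = "nn". Models whose index is stored differently may need a different approach.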
@mdrnao yes, this is definitely a bug and I just pushed a fix, so it will be fixed in the next release of uwot. Although I don't know if this is feasible for your workflow, doing what you did by changing the Thank you for the assistance in tracking down what was happening and apologies for the oversight. |
Ah, thank you so much! The workaround is absolutely fine for me. Good luck with the new version, and thanks again for your support.
Hi, firstly thanks for an excellent package!

I am currently using umap with correlation as the distance metric, then saving it for future use. However, when I use umap_transform with the saved umap model and ret_extra = "nn", I find that it reports cosine as the distance metric. When I use the transform function on the umap model without saving and reloading in between, correlation is the reported NN metric.

For the "fresh" umap model:
model$metric 'correlation'
model$nn_index$metric 'correlation'
for the loaded in version:
model$metric 'correlation'
model$nn_index$metric 'cosine'
I noticed that in the load_uwot function you've hard-coded if (metric == "correlation") {annoy_metric <- "cosine"} and I was curious why?

Thanks,
Holly