uwot with distance matrix impossible to retain embedding #62

Open
luciat-92 opened this issue Jul 2, 2020 · 4 comments
Labels
enhancement New feature or request

Comments

@luciat-92

Hello James,
thanks a lot for the extremely useful implementation. I am interested in calling the umap function with a distance matrix as the direct input. Would it be possible to extend the ret_model = T option to this kind of input, or is that not feasible from an implementation point of view?
Thanks!

@jlmelville
Owner

ret_model isn’t supported with a distance matrix because it’s primarily intended for transforming new data, and to do that you need the actual original input data in order to find the distances between it and the new data to be transformed.

There is a use case where I think it would be feasible (though it is not currently implemented): you could use the model to transform new data if you also provided a distance matrix between the original data and the new points. Even so, it seems unusual to have full distance matrices available but not the underlying data: I’d be curious to know what domain the data comes from, if you can say.
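
To make the idea of a distance matrix between the original data and the new points concrete, here is a minimal sketch in plain Python. It is not uwot code and does not reflect how uwot works internally. The function name `knn_from_cross_distances` and the toy matrix are hypothetical, chosen only for illustration. It shows how the k nearest original points of each new point could be read off such a matrix without access to the underlying data.

```python
def knn_from_cross_distances(cross_dist, k):
    """Return (indices, distances) of the k nearest original points.

    cross_dist[i][j] is assumed to hold the distance from new point i
    to original point j (hypothetical layout for this sketch).
    """
    idx, dist = [], []
    for row in cross_dist:
        if k > len(row):
            raise ValueError("k cannot exceed the number of original points")
        # Sort original-point indices by their distance to this new point.
        order = sorted(range(len(row)), key=lambda j: row[j])[:k]
        idx.append(order)
        dist.append([row[j] for j in order])
    return idx, dist


# Two new points, four original points (made-up distances).
cross = [
    [0.9, 0.1, 0.5, 0.7],
    [0.2, 0.8, 0.3, 0.6],
]
nn_idx, nn_dist = knn_from_cross_distances(cross, k=2)
```

Only the new-to-original distances are needed here, which is why the original input data would not have to be stored alongside the model in this scenario.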

@luciat-92
Author

I am training a machine learning model on a fixed dataset, in a fixed embedding space computed with UMAP. To compute the UMAP embedding I use distances rather than the original data, because the distances are a non-linear combination of different input formats. I then want to add new data points as a test set without recomputing the embedding space, since it has to remain fixed for the machine learning model to be applicable to external data.
For the test set, I would have the distances from each test point to the training points.
For this reason, that implementation would be really useful in my case. Do you think it would be possible?
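
As a rough illustration of the workflow described here, the sketch below places test points into an already computed training embedding using only test-to-train distances, leaving the training coordinates untouched. It is a simplification under stated assumptions, not UMAP's transform and not uwot's implementation: each test point is simply set to the inverse-distance-weighted average of its k nearest training points, with no further optimization. The function name `place_new_points`, the `eps` parameter, and the toy data are all hypothetical.

```python
def place_new_points(train_embedding, cross_dist, k, eps=1e-12):
    """Place each test point at a weighted average of its nearest
    training points' embedding coordinates.

    train_embedding[j] holds the fixed coordinates of training point j.
    cross_dist[i][j] is the distance from test point i to training point j.
    """
    dims = len(train_embedding[0])
    placed = []
    for row in cross_dist:
        # Indices of the k training points closest to this test point.
        nearest = sorted(range(len(row)), key=lambda j: row[j])[:k]
        # Closer neighbors get larger weights; eps avoids division by zero.
        weights = [1.0 / (row[j] + eps) for j in nearest]
        total = sum(weights)
        coords = [
            sum(w * train_embedding[j][d] for w, j in zip(weights, nearest)) / total
            for d in range(dims)
        ]
        placed.append(coords)
    return placed


# Fixed 2-D training embedding and one test point (made-up values).
train_embedding = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
cross = [[1.0, 1.0, 5.0]]
test_embedding = place_new_points(train_embedding, cross, k=2)
```

Because `train_embedding` is only read and never modified, the embedding space seen by the downstream model stays fixed, which is the property the test-set use case depends on.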

@jlmelville jlmelville added the enhancement New feature or request label Jul 3, 2020
@jlmelville
Owner

jlmelville commented Jul 3, 2020

Ok, that sounds like it would be possible but I can't say when (or if) it will get done.

@jlmelville
Owner

@luciat-92 does #64 cover your use case?

2 participants