UMAP fit/transform approach #9

Open
amarrerod opened this issue Oct 25, 2022 · 3 comments

@amarrerod

Hi! @LTLA

I've been looking through the examples for your UMAP library and was wondering whether it can be used in a fit/transform way, similar to the UMAP library in Python.

That is, loading a dataset to compute the embedding, then using a transform method to map new samples into the existing embedded space and get the transformed output.

Thank you so much!

@LTLA
Collaborator

LTLA commented Oct 25, 2022

Probably not; I don't remember adding this. I would need to have a look at how uwot does it. It might be pretty simple if it's just a weighted average of neighbors; a PR would be welcome.

@jlmelville

Transforming a new point involves:

  • finding the nearest neighbors among the old points, so you need to store the index that was built during the initial construction.
  • constructing the fuzzy set membership values with respect to those neighbors, i.e. the similarities. You also adjust the local connectivity constraint here: in practice this just means you don't shift the exponential in the similarity calculation, so rho always equals zero.
  • initializing the coordinates of the new point in the low-dimensional space as an average of the coordinates of its nearest neighbors (so the low-dimensional coordinates of the original points must also be stored). Or maybe a weighted average using the similarities? uwot can do both: one of them is the Python UMAP way and the other is something I added to see if it made a difference, but I can't remember which is which. I don't think it has turned out to be important.
  • optimizing the coordinates. This is the same gradient descent as in the usual layout optimization, but you DO NOT want the original coordinates to change. Therefore the edge list used in the optimization needs a scheme for keeping track of which nodes are new and which are original. The learning rate is also scaled down.
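
A minimal sketch of the first three steps above, in plain Python and for illustration only. It is not code from uwot or umappp: the function names are hypothetical, the neighbor search is brute force rather than a stored index, and calibrating sigma so that the memberships sum to log2(k) is an assumption borrowed from the usual smooth kNN step.

```python
import math


def nearest_neighbors(query, reference, k):
    """Brute-force stand-in for querying the index kept from the original fit."""
    ranked = sorted((math.dist(query, p), i) for i, p in enumerate(reference))[:k]
    return [i for _, i in ranked], [d for d, _ in ranked]


def memberships(distances, n_iter=64, tol=1e-5):
    """Weights exp(-d / sigma), with rho fixed at zero (no shift of the exponential).

    sigma is calibrated by bisection so the weights sum to log2(k); this
    target is an assumption of the sketch and needs k >= 2.
    """
    target = math.log2(len(distances))
    lo, hi, sigma = 0.0, math.inf, 1.0
    for _ in range(n_iter):
        total = sum(math.exp(-d / sigma) for d in distances)
        if abs(total - target) < tol:
            break
        if total > target:  # sigma too large: shrink it
            hi = sigma
            sigma = (lo + hi) / 2.0
        else:  # sigma too small: grow it
            lo = sigma
            sigma = sigma * 2.0 if math.isinf(hi) else (lo + hi) / 2.0
    return [math.exp(-d / sigma) for d in distances]


def initial_position(neighbor_idx, weights, embedding, weighted=True):
    """Average (optionally similarity-weighted) of the neighbors' stored coordinates."""
    if not weighted:
        weights = [1.0] * len(neighbor_idx)
    total = sum(weights)
    dim = len(embedding[0])
    return [
        sum(w * embedding[i][d] for i, w in zip(neighbor_idx, weights)) / total
        for d in range(dim)
    ]


# Toy data: original points and their stored low-dimensional coordinates.
reference = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
embedding = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [10.0, 10.0]]

idx, dist = nearest_neighbors((0.1, 0.1), reference, k=3)
start = initial_position(idx, memberships(dist), embedding)
```

Passing `weighted=False` gives the plain average, so both initialization variants mentioned above can be compared on the same neighbors.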

transform.R in uwot is a bit of a disaster, but mainly because it tries to maintain backwards compatibility and to allow for an ever-increasing number of ways to provide input data. If you ignore all that, then apart from setting up the edge list appropriately there isn't a lot of special casing for transforming, and the structure of the smooth knn and optimization C++ code works as-is.
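
One way to picture the edge list setup, as a rough Python sketch rather than what uwot actually does: every edge runs from a new point (head) to an original point (tail), and only head coordinates are ever written. The function name, the fixed a = b = 1 curve parameters, sampling each edge with probability equal to its weight, and the default learning rate of 0.25 are all simplifying assumptions; real implementations fit a and b from min_dist and spread and schedule edges by epochs per sample.

```python
import random


def clip(value, limit=4.0):
    """Clamp a gradient component to [-limit, limit]."""
    return max(-limit, min(limit, value))


def optimize_new_points(new_coords, ref_coords, edges, n_epochs=100,
                        learning_rate=0.25, a=1.0, b=1.0,
                        n_negative=2, seed=0):
    """Move only the new points; ref_coords is read but never modified.

    edges is a list of (new_index, ref_index, weight) with weight in [0, 1].
    """
    rng = random.Random(seed)
    coords = [list(p) for p in new_coords]  # work on a copy of the new points
    dim = len(ref_coords[0])

    def squared_distance(p, q):
        return sum((p[i] - q[i]) ** 2 for i in range(dim))

    for epoch in range(n_epochs):
        alpha = learning_rate * (1.0 - epoch / n_epochs)  # linear decay
        for head, tail, weight in edges:
            if rng.random() > weight:  # weaker edges are sampled less often
                continue
            y = coords[head]

            # Attractive update towards the fixed original neighbor.
            ref = ref_coords[tail]
            d2 = squared_distance(y, ref)
            if d2 > 0.0:
                coef = -2.0 * a * b * d2 ** (b - 1.0) / (1.0 + a * d2 ** b)
                for i in range(dim):
                    y[i] += alpha * clip(coef * (y[i] - ref[i]))

            # Repulsive updates away from randomly sampled original points.
            for _ in range(n_negative):
                other = ref_coords[rng.randrange(len(ref_coords))]
                d2 = squared_distance(y, other)
                if d2 > 0.0:
                    coef = 2.0 * b / ((0.001 + d2) * (1.0 + a * d2 ** b))
                    for i in range(dim):
                        y[i] += alpha * clip(coef * (y[i] - other[i]))
    return coords


ref_coords = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # stored original embedding
moved = optimize_new_points([[5.0, 5.0]], ref_coords, [(0, 0, 1.0)])
```

Because the tails are only ever read, the original embedding comes out unchanged no matter how many new points are optimized against it.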

@jlmelville

Oh, but also be aware of jlmelville/uwot#103, which I have been unable to reproduce but which may indicate a bug somewhere.
