Supervised UMAP using already projected points as regression #712
There are two ways you can view this. The 2d coordinates could be viewed merely as target values, like labels but numeric instead of categorical. This can actually be handled just fine by setting [...]

Instead, it sounds more like you want to fix some of the points to given locations in the embedding and then fit the rest of the points around that. This is actually being worked on right now. See #606 and #620 for discussion and work on that. There are some catches in exactly how to do this, so I think your input into this would be most welcome.
Thanks for the response, and sorry for my delayed post. You're right that fixing points is closer to what I need. But I can play around with the first suggestion, using a euclidean target_metric. Does that support semi-supervised cases? In the documentation (https://umap-learn.readthedocs.io/en/latest/supervised.html) you describe using -1 as a 'masked' value. Makes sense for categorical data, but how does that work for numerical data?
The semi-supervised case is going to be an issue for other target_metrics since they won't specially handle "masked" values. In principle you can write your own custom metric that has special handling for masked values. I suspect in the long run this is going to really be doing what you want however.
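To make the "custom metric with special handling for masked values" idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration, not something taken from UMAP itself: the names `masked_euclidean`, `is_masked` and `UNKNOWN_DIST` are hypothetical, NaN is used as the mask because -1 is a perfectly valid numeric coordinate, and the constant returned for masked pairs is an arbitrary choice. As far as I understand the UMAP docs, a custom metric handed to UMAP also has to be a numba-compiled function, which this sketch does not do.

```python
import math

# Hypothetical distance returned whenever a target is unknown.
# This value is an assumption for illustration, not a UMAP default.
UNKNOWN_DIST = 1.0


def is_masked(target):
    """A target counts as masked if any of its coordinates is NaN."""
    return any(math.isnan(v) for v in target)


def masked_euclidean(a, b, unknown_dist=UNKNOWN_DIST):
    """Euclidean distance between two targets, with a fixed fallback
    distance when either target is masked."""
    if is_masked(a) or is_masked(b):
        return unknown_dist
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


known = (0.0, 0.0)
other = (3.0, 4.0)
missing = (math.nan, math.nan)  # point without provided 2d coordinates

print(masked_euclidean(known, other))    # 5.0
print(masked_euclidean(known, missing))  # 1.0
```

Note that a function like this is not a true metric once masked values are involved (two masked points get a nonzero distance to themselves, for example), and I have not verified how a constant fallback affects the resulting embedding, so treat it as a starting point for experimentation only.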
If checks pass on #620, is that sufficient to merge into the library? Or do you feel there is more to be done there? I had a look over the code in that PR, though admittedly a lot of the math flew over my head.
There are some slightly more philosophical and API issues that need to get worked out before it gets merged, but you can certainly just check out the branch from the PR and use it -- it works; it just requires a little care from the user or unexpected results may occur.
To expand on @lmcinnes's API/philosophy comment: I started using #620, but am now favoring a more flexible API style, inspired by the PyMDE constraint objects. Passing UMAP constraint objects can also avoid some of the jit-related code duplication in #620.

So one cleaner API might just add an optional constraint parameter, where constraint objects have a few standard [...]

In a first example, I've used a [...]

BTW, another issue (even in #620) is that UMAP [...]

Now, when constraints come into play, [...]
Hey Leland,
Thanks for this great library, and for being so responsive with issues.
Question: Is it possible to train a (semi-)supervised UMAP model where some of the projections are already known/provided, as a regression task? This is in contrast to using labels for supervision, which are categorical.
To provide an example, imagine I had a set of 100 embeddings. 50 of those embeddings have 2d coordinates associated with them; the other 50 do not. I want to be able to train a UMAP model on the 50 embeddings with 2d coordinates, and then run inference on the other 50 (or do a semi-supervised training and run on all of them at the same time).
If this isn't feasible with UMAP, do you know of any other algos/models that might be a good fit for what I'm suggesting (besides deep learning of course)?
Thanks! Amol