umap_transform can give odd results with dens_scale #103
Comments
That's definitely weird. Does the same thing happen if you remove the first 100 items rather than the last 100 items? Unfortunately I am unlikely to be able to investigate this for at least a couple of weeks, but I will try to take a look when I can.
Yes. It doesn't matter which samples I remove, as long as the projected dataset ends up the same size as, or smaller than, the dataset whose space it is being projected into.
@FemkeSmit sorry for the delay in getting back to this. I am having trouble reproducing the problem with the datasets I have. Can you tell me what version of uwot you are running? Also, if you are able to install packages from GitHub, would you be able to run the code below and let me know if you see the ring structure?
devtools::install_github("jlmelville/snedata")
devtools::install_github("jlmelville/vizier")
mnist <- snedata::download_mnist()
mnist_train <- head(mnist, 60000)
mnist_test <- tail(mnist, 10000)
mnist_umap_test <- umap(mnist_test, ret_model = TRUE)
mnist_umap_train_transform <- umap_transform(mnist_train, mnist_umap_test)
vizier::embed_plot(mnist_umap_test$embedding, mnist_test, cex = 0.1, alpha_scale = 0.1, title = "10 000 model points")
vizier::embed_plot(mnist_umap_train_transform, mnist_train, cex = 0.1, alpha_scale = 0.1, title = "60 000 transformed points")
These are the images I get, where the 60,000 MNIST training images in the second image are transformed using a model built with the smaller (10,000 image) test set. So it seems like there must be something else going on other than when the original dataset is smaller than the dataset passed to
@jlmelville I'm running version 0.1.14 of uwot. A few coworkers of mine also ran into this issue (the ring being formed) with their dataset, and also managed to solve it by reducing the size of the dataset that was being transformed, so it's not an issue unique to me. Still, when I run your code the ring doesn't form, so I don't know what might be different between these cases.
Another update: I just used your code for creating the UMAP object on my data, and now no ring formed.
Edit: I just tried using my original umap settings on the MNIST dataset, but no ring forms there either. I don't know why.
Ok, so the ring seems to be due to one or a combination of parameters. If you are able to continue helping me, can you try your umap parameters, but turn off the following parameters one at a time (i.e. re-run 4 times, each time with one of these removed):
I don't want to prejudge matters, but this is in decreasing order of suspicion (so I suspect it's
The
I need to update the documentation around this. Also, in If using
Anyway, as for the ring structure, I am still struggling a bit to generate something that looks like what you get. I do see a ring structure when embedding two overlapping Gaussians, where one has a much larger standard deviation than the other (but they have the same center). When
So, could it be that the dataset you are using, when stratified by sex, results in data where the male subset has features with a substantially smaller variance (on average) than the female subset? I would expect that to be reflected in the
I would like to understand a bit more about the data you are using: can you say how many columns the data has? At any rate, now that I know
I edited the title of the issue to reflect my current understanding of what is going on.
My data consists of 10 numeric variables with about 1500 female samples and 1600 male samples. All variables have been normalized to have mean 0 and sd 1. Most variables follow a normal distribution, though some have a long tail, and one is vaguely binomial. All distributions are very similar between the male and female samples.
I tried running the UMAP again with dens_scale = 1, but this time removing a bunch of samples from the dataset so there were now 1500 female samples and 1400 male samples, and then transforming into each other's space again. This time a ring formed around the female UMAP projection instead of the male UMAP projection. Removing both the last 100 samples:
Thank you for the information. It seems that there isn't anything in your data that should cause the problem, so I went back to the MNIST example and I can now reproduce the ring structure, now I know
Not sure exactly what is going on: depending on the seed, the session tends to die more often than not, which unfortunately means debugging the C++. But this is sort of reproducible, so I will try to fix it.
@FemkeSmit I found the error: arrays for the original and new data were swapped. The current development version of uwot has a fix. There will be some other pushes to document and test the fix but what is currently there should now work correctly. I'm sorry I failed to test this code path appropriately, and I appreciate the extensive help in hunting this down.
No problem, I'm happy to help. Glad you managed to find and solve the issue!
Hopefully there isn't much more to say on this, but I have also fixed two other issues that have arisen from this discussion:
This seems fixed.
I have a dataset consisting of ~1500 female samples and ~1600 male samples, both with the exact same variables and similar distributions between the two sexes. I've stratified this dataset based on sex and made a separate UMAP for them both. I then attempted to project the male dataset into the female UMAP space and vice versa using umap_transform, which worked fine for the female samples, and worked fine for most of the male samples, except that about 100 male samples got projected onto a ring surrounding the other datapoints, far away from them. I then reduced my male dataset to be the same size as the female dataset by removing the last 100 samples (the order of the samples is completely random) and this ring disappeared.
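A minimal sketch of the workflow described above, using simulated data in place of the real dataset. The matrices `female` and `male` and the choice of standard-normal columns are assumptions made for illustration; `dens_scale = 1` is the setting mentioned later in this thread, and the other UMAP settings of the original run are not known. With a uwot version that predates the fix discussed in this thread, the transform step may misbehave.

```r
# Simulated stand-in for the real data (an assumption): 10 standardized
# numeric variables, about 1500 female and 1600 male samples.
set.seed(1)
n_female <- 1500
n_male <- 1600
female <- matrix(rnorm(n_female * 10), ncol = 10)
male <- matrix(rnorm(n_male * 10), ncol = 10)

if (requireNamespace("uwot", quietly = TRUE)) {
  # Build one UMAP model per group; ret_model = TRUE keeps what
  # umap_transform needs to embed new points later.
  female_model <- uwot::umap(female, dens_scale = 1, ret_model = TRUE)
  male_model <- uwot::umap(male, dens_scale = 1, ret_model = TRUE)

  # Project each group into the other group's embedding space.
  male_in_female <- uwot::umap_transform(male, female_model)
  female_in_male <- uwot::umap_transform(female, male_model)
}
```

Plotting `female_model$embedding` together with `male_in_female` is one way to check whether transformed points land on a ring around the original embedding.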