Sample weights #40

Open
ilyakorsunsky opened this issue Dec 10, 2019 · 1 comment
Labels
enhancement New feature or request

Comments

@ilyakorsunsky

I am curious whether there is a way to give individual observations different weights in the UMAP objective function. For instance, I have data from 2 conditions, one with 100 observations and one with 1000. I would like both conditions to contribute equally to the embedding. Perhaps naively, I would expect observations from each condition to take up the same amount of real estate in this balanced analysis. I appreciate any thoughts on how feasible this would be. Thanks in advance!
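To make the request concrete, here is a minimal sketch of the kind of per-observation weights described above, where every condition ends up with the same total weight. It only computes the weights. It does not show how to pass them to UMAP, and nothing in this thread indicates that such an option exists. The function name `balanced_weights` and the label strings are made up for illustration.

```python
import numpy as np

def balanced_weights(labels):
    """Hypothetical helper: weights such that each condition has equal total weight."""
    labels = np.asarray(labels)
    conditions, inverse, counts = np.unique(
        labels, return_inverse=True, return_counts=True
    )
    # Each observation gets 1 / (size of its condition), so each condition sums to 1.
    weights = 1.0 / counts[inverse]
    # Rescale so that the weights average to 1 over all observations.
    return weights * (len(labels) / len(conditions))

# The example from the question: 100 observations vs. 1000 observations.
labels = np.array(["condition100"] * 100 + ["condition1000"] * 1000)
w = balanced_weights(labels)
```

With these weights an observation from the smaller condition counts ten times as much as one from the larger condition, which is what "contribute equally" would mean at the level of conditions.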

@jlmelville
Owner

@ilyakorsunsky, it's an interesting question. Assuming your example is loosely based on real data you're working with, is the issue that the data from the smaller subset (let's call it condition100) is being poorly clustered, or is it forming a much smaller, denser cluster than condition1000? Assuming the underlying data has some cluster structure, although UMAP is insensitive to differences in data densities, it is affected by differences in the number of data points in each cluster -- the more data there is, the larger the cluster will be.

In general, most parts of UMAP are driven by interactions between pairs of points, so it's hard to intervene using a point-wise weighting. There is an (arbitrary, as far as I know) normalization factor used when the smoothed knn distances are calculated, where it might be possible to do something, but I am wildly speculating at this point and would have to experiment.
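For context, a rough sketch of the calibration step in question, under the assumption that the normalization factor meant here is the target sum used in the smoothed knn distance search: in the reference UMAP implementation, each point's bandwidth sigma is chosen so that the exponentiated, shifted distances to its k neighbors sum to log2(k). The code below is a simplified, single-point reimplementation for illustration, not code from uwot or umap-learn, and the `target` argument is a hypothetical knob. Whether changing it per point would balance the conditions is exactly the untested part.

```python
import numpy as np

def calibrate_sigma(knn_dists, target=None, n_iter=64, tol=1e-5):
    """Find sigma so that sum_j exp(-max(d_j - rho, 0) / sigma) is close to target.

    knn_dists: distances from one point to its k nearest neighbors
    (assumed here to exclude the point itself).
    target: hypothetical per-point knob; defaults to log2(k).
    """
    d = np.asarray(knn_dists, dtype=float)
    if target is None:
        target = np.log2(d.size)
    # rho is the distance to the closest neighbor at a non-zero distance.
    positive = d[d > 0]
    rho = positive.min() if positive.size else 0.0
    lo, hi, sigma = 0.0, np.inf, 1.0
    for _ in range(n_iter):
        total = np.exp(-np.maximum(d - rho, 0.0) / sigma).sum()
        if abs(total - target) < tol:
            break
        if total > target:
            # Sum too large: shrink sigma.
            hi = sigma
            sigma = (lo + hi) / 2.0
        else:
            # Sum too small: grow sigma (double until an upper bound is known).
            lo = sigma
            sigma = sigma * 2.0 if np.isinf(hi) else (lo + hi) / 2.0
    return rho, sigma

dists = np.linspace(0.5, 2.0, 15)             # made-up distances to 15 neighbors
rho, sigma = calibrate_sigma(dists)           # default target log2(15)
_, sigma_big = calibrate_sigma(dists, target=2 * np.log2(15))
```

A larger target gives a larger sigma, and therefore larger edge weights for that point's neighbors. That is the only lever this step offers, and it still acts on pairs of points rather than on individual observations.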

For the general case, I would recommend opening an issue at https://github.com/lmcinnes/umap as Leland and other UMAP users may have already had thoughts on this.

@jlmelville jlmelville added the enhancement New feature or request label Feb 15, 2020