How to input similarity_graph back into umap parameters? #98

Closed
Chengwei94 opened this issue Aug 26, 2022 · 3 comments
Labels
enhancement New feature or request

Comments

@Chengwei94

Hi there,

I am trying out similarity_graph to compute the connectivities graph, which I then use for clustering (similar to the scanpy workflow). How do I pass this connectivities information into umap so that I can skip the recomputation? Or is there a way to retrieve the similarity graph when running umap?

@Chengwei94
Author

Looks like I can get the connectivity matrix through umap(mnist, ret_extra = c("fgraph"))

@jlmelville added the enhancement (New feature or request) label on Aug 26, 2022
@jlmelville
Owner

You are correct that the output of similarity_graph is the same as running umap with ret_extra = c("fgraph").
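A quick way to check this equivalence yourself is sketched below. It assumes the default settings of both functions and uses the built-in iris data rather than mnist. One caveat, as I understand the uwot documentation: umap can prune low-weight edges from fgraph depending on n_epochs, so the sketch sets n_epochs = 0 to skip the optimization and keep the graph unpruned.

```r
library(uwot)

# Build the fuzzy graph directly (non-numeric columns of iris are ignored)
sg <- similarity_graph(iris)

# Ask umap to return its fuzzy graph as well; n_epochs = 0 skips the
# layout optimization so that no edges are pruned from the graph
um <- umap(iris, n_epochs = 0, ret_extra = c("fgraph"))

# Both should be sparse matrices with one row and column per observation
dim(sg)
dim(um$fgraph)

# Compare the edge weights; a mismatch may indicate differing parameters
# or approximate nearest neighbors being used on larger data sets
all.equal(as.matrix(sg), as.matrix(um$fgraph))
```

The final all.equal call is left without an expected result on purpose: whether the two graphs match exactly depends on the parameters and the nearest neighbor method in use.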

But the use case of calling similarity_graph and then passing it to umap and skipping all the computation is not something you can do at the moment. A workaround would be to use the k-nearest neighbors output:

sg_res <- similarity_graph(iris, ret_extra = "nn")
umap_res <- umap(X = NULL, nn_method = sg_res$nn)

This incurs the cost of similarity calculation and symmetrization, but that is quick compared to the nearest neighbor calculation itself.
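For the clustering use case in the original question, the workaround could look roughly like the sketch below. The igraph package and Louvain clustering are my own choices for illustration (the thread does not name a clustering method), and the element name similarity_graph in the returned list follows my reading of the uwot documentation for ret_extra.

```r
library(uwot)
library(igraph)

# One nearest neighbor search, returning both the graph and the neighbor data
sg_res <- similarity_graph(iris, ret_extra = "nn")

# Cluster on the fuzzy graph: treat it as a weighted undirected graph
# (mode = "max" is a no-op for an already symmetric matrix)
g <- graph_from_adjacency_matrix(sg_res$similarity_graph,
                                 mode = "max", weighted = TRUE)
clusters <- cluster_louvain(g)
table(membership(clusters))

# Reuse the neighbors for the embedding, skipping the neighbor search
umap_res <- umap(X = NULL, nn_method = sg_res$nn)
```

The graph is still rebuilt inside umap from the stored neighbors, which is the similarity calculation and symmetrization cost mentioned above.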

Passing the result of similarity_graph back into umap seems like something that ought to be supported now that similarity_graph exists. It would let users supply a modified version of the fuzzy simplicial set, or even a sparse similarity matrix created by an entirely different method outside of uwot, and then use uwot only to optimize the coordinates in the lower dimension. So @Chengwei94, if you don't mind, I would like to leave this issue open as a reminder to support this in the next version of uwot.

This is not hard to implement, but the interface requires some thought. Some questions to myself (or anyone with an interest in this): How should the user pass this to umap? The X parameter already assumes that a sparse matrix passed to it is a distance matrix. X in combination with an is_similarity_graph parameter? Use nn_method instead? An entirely new parameter (and with it the need for ever more complex validation of which parameters are allowed together and which ones get ignored if both are set)? An entirely new function (probably safest)? While we're here, should the type of symmetrization also be specified by the user (e.g. fuzzy set union for UMAP vs. averaging in LargeVis)?
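To make the symmetrization question concrete, here is a small base-R sketch of the two options on a toy directed affinity matrix. The matrix A and its values are made up for illustration. The fuzzy set union is written as A + t(A) - A * t(A), the form usually given for UMAP; the averaging is a plain (A + t(A)) / 2 and leaves out any further normalization that LargeVis applies.

```r
# Toy directed affinities: A[i, j] is the weight of the edge from i to j
A <- matrix(0, nrow = 3, ncol = 3)
A[1, 2] <- 0.5
A[2, 1] <- 0.4
A[2, 3] <- 1.0
A[3, 1] <- 0.2

# Fuzzy set union (probabilistic t-conorm), element-wise
fuzzy_union <- A + t(A) - A * t(A)

# Simple averaging of the two directions
mean_sym <- (A + t(A)) / 2

fuzzy_union[1, 2]  # 0.5 + 0.4 - 0.5 * 0.4 = 0.7
mean_sym[1, 2]     # (0.5 + 0.4) / 2 = 0.45
```

The union keeps a one-directional edge at its full weight (the 2 to 3 edge stays at 1.0), whereas averaging halves it (0.5), so the choice changes how strongly non-mutual neighbors are connected.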

@jlmelville
Owner

optimize_graph_layout has been added to uwot and does this.
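A minimal sketch of the resulting workflow, modeled on the kind of example given in the uwot documentation; the 30-row subset and n_neighbors = 10 are arbitrary choices to keep it fast. Passing X is optional as far as I can tell and is only needed by initializations that use the input data.

```r
library(uwot)

# Small subset: 10 flowers from each of the three species
iris30 <- iris[c(1:10, 51:60, 101:110), ]

# Step 1: build the similarity graph (it could be modified or clustered here)
iris30_sim_graph <- similarity_graph(iris30, n_neighbors = 10)

# Step 2: optimize a 2D layout directly from the graph
iris30_opt <- optimize_graph_layout(iris30_sim_graph, X = iris30)

dim(iris30_opt)
```

Because the graph is passed in as-is, a sparse similarity matrix produced outside of uwot should work the same way, provided it has one row and column per observation.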
