feature request: UMAP connectivity and diagnostic plotting #65
Comments
This would be really great! |
I agree some kind of diagnostic plotting is necessary for any dimensionality reduction method which embeds a neighbor graph. I have written substantial amounts of R (and Python) plotting code for visualizing UMAP output, but I don't really want to add it to uwot because I think it would result in a drastic increase in the maintenance burden. I also admit to being a bit of a skeptic that connectivity plots are that useful for static output. For interactive plotting it's a different matter: I think they are very informative there. But I am not sure what would constitute a useful contribution. plotly is adequate for my needs. It seems like I could end up having to support multiple output styles (e.g. base [...]). That said, it's a bit hypocritical of me to say that diagnostic plotting is necessary and then resolutely refuse to provide any help. |
I think a static connectivity plot is helpful because it shows you which distances are actually meaningful on a standard 2D UMAP. E.g. two clusters may sit equally close to a third cluster, but only one of them is close due to connectedness, and thus meaningfully close; the other may only end up at the same distance because of the dimensionality reduction. I understand and agree that supporting different plotting frameworks can cause a large burden, but it may not be necessary. |
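To make the point about connectedness concrete, here is a minimal base R sketch that counts how many nearest-neighbor links run between each pair of clusters. The function name `cross_cluster_edges`, the toy index matrix and the cluster labels are all made up for illustration; the only thing borrowed from uwot is the shape of the neighbor index matrix (one row per point, one column per neighbor).

```r
# Count neighbor links between clusters (hypothetical helper, not part of uwot).
# idx:      integer matrix, row i holds the indices of point i's neighbors
# clusters: one cluster label per point
cross_cluster_edges <- function(idx, clusters) {
  clusters <- factor(clusters)
  # as.vector() reads idx column by column, so the source labels are
  # simply the full label vector repeated once per column
  from <- rep(clusters, times = ncol(idx))
  to <- clusters[as.vector(idx)]
  table(from = from, to = to)
}

# Toy data (made up): 9 points, 3 clusters, 2 neighbors per point.
# A and B share a few links; C only links to itself.
idx <- matrix(
  c(2, 3,  1, 4,  1, 4,   # points 1-3 (cluster A)
    5, 3,  4, 6,  4, 5,   # points 4-6 (cluster B)
    8, 9,  7, 9,  7, 8),  # points 7-9 (cluster C)
  ncol = 2, byrow = TRUE
)
clusters <- rep(c("A", "B", "C"), each = 3)

counts <- cross_cluster_edges(idx, clusters)
print(counts)
```

In this toy case A and B are linked and C is isolated, whatever their distances look like in a 2D embedding. Whether such counts are a reliable diagnostic on real data is an open question.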
Agreed about the intention. I suppose I should try and implement it and then be prepared to eat my words. |
My initial experiments with connectivity plotting have confirmed my suspicion that, without access to something that works like datashader (which the Python connectivity plotter makes use of), the naïve approach of plotting lines between the [...]

As an alternative, I considered plotting just the connection between each point and its furthest nearest neighbor. Closer neighbors are more likely to be embedded close to the point, so plotting them would probably show a higher proportion of uninteresting within-cluster lines. Here's what this looks like for [...]

That looks OK, although I should stress that I have zero evidence that displaying the furthest nearest neighbor connection gives useful information about clusters or connectivity. But [...]

And here are the 15-neighbor connectivities (the equivalent of the [...]

I still don't consider that static output to be all that useful, and I don't yet have a way to produce an equivalent interactive plot for this. The very simplified method of producing those connections may also be misleading or unhelpful. A more sophisticated method that processes all the neighbor connectivities to leave only the "useful" ones seems like a substantial research project on its own. I'm not sure when or if I will pursue this further, but if you are able to get to the data in a form that lets you use [...]

    conn_plot <-
      function(model,
               X,
               alpha_scale = 0.5,
               color = "black",
               lwd = 1,
               nn = NULL) {
        X <- uwot:::x2m(X)
        if (is.null(nn)) {
          if (!is.null(model$nn)) {
            # reuse the neighbors stored in the model (ret_nn = TRUE)
            nn <- model$nn[[1]]
          } else {
            # otherwise query the Annoy index stored in the model
            nn <-
              uwot:::annoy_search(X, k = model$n_neighbors, ann = model$nn_index)
          }
        }
        # keep only the furthest of each point's nearest neighbors
        nnf <- nn$idx[, model$n_neighbors, drop = FALSE]
        # (point index, neighbor index) pairs
        pairs <- as.matrix(reshape2::melt(nnf)[, c(1, 3)])
        coords <- model$embedding
        x0 <- coords[pairs[, 1], 1]
        y0 <- coords[pairs[, 1], 2]
        x1 <- coords[pairs[, 2], 1]
        y1 <- coords[pairs[, 2], 2]
        # draw one semi-transparent line per pair on the current plot
        segments(
          x0 = x0,
          y0 = y0,
          x1 = x1,
          y1 = y1,
          col = grDevices::adjustcolor(color, alpha.f = alpha_scale),
          lwd = lwd
        )
      }

Example of using it with [...]

    # ret_nn = TRUE is optional but strongly recommended
    model <- umap(iris, ret_model = TRUE, ret_nn = TRUE)
    plot(model$embedding, col = iris$Species)
    # or vizier::embed_plot(model$embedding, iris)
    conn_plot(model, iris, alpha_scale = 0.1)

Note:
|
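For anyone who wants to experiment with different subsets of neighbors (for example all 15 rather than just the furthest), the pair-building step can be written in base R without reshape2. This is only a sketch: `nn_edge_list` and `draw_edges` are hypothetical helper names, the toy matrices are made up, and dropping self-links assumes the index matrix may list a point as its own neighbor.

```r
# Build a (from, to) edge list from a neighbor index matrix.
# cols selects which neighbor ranks (columns) to keep.
nn_edge_list <- function(idx, cols = seq_len(ncol(idx))) {
  sub <- idx[, cols, drop = FALSE]
  edges <- cbind(
    from = rep(seq_len(nrow(sub)), times = ncol(sub)),
    to = as.vector(sub)
  )
  # drop links from a point to itself, if the matrix contains any
  edges[edges[, "from"] != edges[, "to"], , drop = FALSE]
}

# Overlay the edges on an existing scatter plot of coords.
draw_edges <- function(coords, edges, color = "black", alpha = 0.1, lwd = 1) {
  segments(
    x0 = coords[edges[, "from"], 1],
    y0 = coords[edges[, "from"], 2],
    x1 = coords[edges[, "to"], 1],
    y1 = coords[edges[, "to"], 2],
    col = grDevices::adjustcolor(color, alpha.f = alpha),
    lwd = lwd
  )
}

# Toy data (made up): 4 points, each listed as its own first neighbor
idx <- rbind(c(1, 2, 3), c(2, 1, 3), c(3, 4, 2), c(4, 3, 2))
coords <- cbind(c(0, 1, 2, 3), c(0, 1, 0, 1))

edges_all <- nn_edge_list(idx)                   # every neighbor
edges_far <- nn_edge_list(idx, cols = ncol(idx)) # furthest neighbor only

plot(coords)
draw_edges(coords, edges_far, alpha = 0.5)
```

With a model fitted as in the example above, something like `plot(model$embedding)` followed by `draw_edges(model$embedding, nn_edge_list(model$nn[[1]]$idx))` should overlay every stored neighbor connection, with the same overplotting caveats already discussed.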
Thank you! |
https://schochastics.github.io/edgebundle/ seems worth exploring |
It would be great and super useful to have the connectivity or diagnostic plotting features seen in the Python UMAP implementation.
Thanks for the great work!