feature request: UMAP connectivity and diagnostic plotting #65

maddyduran · 2020-07-30T17:01:30Z

It would be great and super useful to have the connectivity or diagnostic plotting features seen in the Python UMAP implementation.

Thanks for the great work!

vertesy · 2023-02-12T20:52:15Z

This would be really great!

jlmelville · 2023-02-15T06:54:55Z

I agree some kind of diagnostic plotting is necessary for any dimensionality reduction method that embeds a neighbor graph. I have written substantial amounts of R (and Python) plotting code for visualizing UMAP output, but I don't really want to add it to uwot because I think it would drastically increase the maintenance burden.

Also, I admit to being a bit of a skeptic that connectivity plots are all that useful as static output. For interactive plotting it's a different matter; there I think they are very informative. But I am not sure what would constitute a useful contribution: plotly is adequate for my needs, and it seems like I could end up having to support multiple output styles (e.g. base graphics, ggplot2, plotly) and still not offer something that fits into most people's workflows or graphics needs.

That said it's a bit hypocritical of me to say that diagnostic plotting is necessary and then resolutely refuse to provide any help.

vertesy · 2023-02-15T09:45:07Z

I think the reason why a static connectivity plot is helpful is because it shows you which distances are actually meaningful on a standard 2D umap.

E.g. two clusters may sit equally close to a third cluster, but only one of them is close because of connectedness, and is thus meaningful; the other may only end up at the same distance because of the dimensionality compression/reduction.

I understand and agree that implementing different plotting frameworks can cause a large burden, but it may not be necessary.

jlmelville · 2023-02-15T15:41:15Z

> E.g. two clusters may sit equally close to a third cluster, but only one of them is close because of connectedness, and is thus meaningful; the other may only end up at the same distance because of the dimensionality compression/reduction.

Agreed about the intention. I suppose I should try and implement it and then be prepared to eat my words.

jlmelville · 2023-02-19T20:40:01Z

My initial experiments with connectivity plotting have confirmed my suspicions: without access to something that works like datashader (which the Python connectivity plotter uses), the naïve approach of drawing a line from each point to each of its n_neighbors nearest neighbors in the original space quickly becomes infeasible as the dataset grows.
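To get a feel for the numbers, here is a rough base-R sketch (not part of uwot) of the naïve edge list. It assumes an n x k neighbor index matrix laid out like uwot's `nn$idx`, where column 1 is the point itself and the remaining columns are its neighbors; the helper name `all_neighbor_pairs` is hypothetical. Dropping the self column, each point contributes k - 1 segments, so MNIST with n_neighbors = 15 would need 70,000 x 14 = 980,000 line segments.

```r
# Hypothetical helper: build every (point, neighbor) edge from a neighbor index matrix.
# Assumes idx is an n x k integer matrix where idx[, 1] is each point itself,
# as in the nn$idx matrix that uwot returns.
all_neighbor_pairs <- function(idx) {
  n <- nrow(idx)
  k <- ncol(idx)
  # Drop the self column. as.vector() reads the remaining columns one after
  # another, so the "from" index repeats 1..n once per neighbor column.
  cbind(
    from = rep(seq_len(n), times = k - 1),
    to = as.vector(idx[, -1, drop = FALSE])
  )
}

# Toy example: 4 points, each listed with itself plus 2 neighbors
idx <- matrix(c(1L, 2L, 3L,
                2L, 1L, 4L,
                3L, 4L, 1L,
                4L, 3L, 2L), nrow = 4, byrow = TRUE)
p <- all_neighbor_pairs(idx)
print(p)

# The segment count grows as n * (k - 1); for MNIST with n_neighbors = 15:
70000 * (15 - 1)
```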

As an alternative, I considered plotting only the connection between each point and its furthest nearest neighbor. Closer neighbors are more likely to be embedded close to the point, so including them would probably add a higher proportion of uninteresting within-cluster lines.
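This variant keeps only the last column of the neighbor index matrix, so it draws one segment per point rather than one per neighbor. A minimal base-R sketch, assuming each row of `idx` lists neighbors from nearest to furthest (as uwot's `nn$idx` does); `furthest_neighbor_pairs` is a hypothetical name. It should produce the same (point, neighbor) pairs as the `reshape2::melt()` call in `conn_plot`, without needing reshape2.

```r
# Hypothetical helper: pair each point with its furthest nearest neighbor.
# Assumes idx is an n x k integer matrix whose rows list neighbors from nearest
# to furthest, so the last column holds the furthest nearest neighbor.
furthest_neighbor_pairs <- function(idx) {
  cbind(from = seq_len(nrow(idx)), to = idx[, ncol(idx)])
}

# Toy example: 4 points, each listed with itself plus 2 neighbors
idx <- matrix(c(1L, 2L, 3L,
                2L, 1L, 4L,
                3L, 4L, 1L,
                4L, 3L, 2L), nrow = 4, byrow = TRUE)
fp <- furthest_neighbor_pairs(idx)
print(fp)
```

Each row can then be turned into a line segment by looking up both endpoints in the embedding coordinates, which is what `conn_plot` does with `coords[pairs[, 1], ]` and `coords[pairs[, 2], ]`.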

Here's what this looks like for iris:

That looks OK, although I should stress that I have zero evidence that displaying the furthest nearest neighbor connections gives useful information about clusters or connectivity.

But iris only contains 150 points. Here is a bog-standard UMAP of the MNIST digits (N = 70,000), a more realistic case:

And here are the 15-neighbor connectivities (the equivalent of the iris plot above):

I still don't consider that static output to be all that useful, and don't actually have a way to produce an equivalent interactive plot for this yet. The very simplified method of producing those connections may also be misleading or unhelpful. A more sophisticated method processing all the neighbor connectivities to leave only the "useful" ones seems like a substantial research project on its own.

Not sure when or if I will pursue this further, but if you can get your data into a form that lets you run uwot directly on a matrix or data frame (I'm not sure how easy that is to extract from e.g. Seurat workflows), you can play about with this yourself:

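conn_plot <-
  function(model,
           X,
           alpha_scale = 0.5,
           color = "black",
           lwd = 1,
           nn = NULL) {
    # Convert X (data frame or matrix) to a numeric matrix via uwot's internal helper
    X <- uwot:::x2m(X)
    # Use cached neighbors if the model has them (requires ret_nn = TRUE),
    # otherwise search the model's nearest neighbor index
    if (is.null(nn)) {
      if (!is.null(model$nn)) {
        nn <- model$nn[[1]]
      } else {
        nn <-
          uwot:::annoy_search(X, k = model$n_neighbors, ann = model$nn_index)
      }
    }

    # Keep only the furthest (n_neighbors-th) nearest neighbor of each point
    nnf <- nn$idx[, model$n_neighbors, drop = FALSE]
    # Two-column matrix of (point index, neighbor index) pairs
    pairs <- as.matrix(reshape2::melt(nnf)[, c(1, 3)])

    coords <- model$embedding

    # Segment start points: the points themselves
    x0 <- coords[pairs[, 1], 1]
    y0 <- coords[pairs[, 1], 2]

    # Segment end points: their furthest nearest neighbors
    x1 <- coords[pairs[, 2], 1]
    y1 <- coords[pairs[, 2], 2]

    # Draw onto the current plot; alpha_scale sets the line transparency
    segments(
      x0 = x0,
      y0 = y0,
      x1 = x1,
      y1 = y1,
      col = grDevices::adjustcolor(color, alpha.f = alpha_scale),
      lwd = lwd
    )
  }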

Example of using it with iris:

library(uwot)
# ret_nn = TRUE is optional but strongly recommended
model <- umap(iris, ret_model = TRUE, ret_nn = TRUE)
plot(model$embedding, col = iris$Species)
# or vizier::embed_plot(model$embedding, iris)
conn_plot(model, iris, alpha_scale = 0.1)

Note:

- You need to have reshape2 installed.
- You need to plot the embedding yourself first, e.g. with plot(): conn_plot only adds line segments to the current plot. Something as simple as plot(model$embedding) will do, but you'll need to work out point sizes, colors and so on.
- On an MNIST-sized dataset, the function takes a while to run because it has to find the nearest neighbors, and then drawing all those lines takes ages even after the function returns. Caching the nearest neighbors helps here, which you can do by generating the original UMAP model with ret_nn = TRUE. Even then, be prepared to wait several minutes with seemingly nothing happening.

vertesy · 2023-02-26T15:09:38Z

Thank you!

jlmelville · 2024-02-01T15:53:45Z

https://schochastics.github.io/edgebundle/ seems worth exploring

jlmelville added the enhancement (New feature or request) label Jul 30, 2020

vertesy mentioned this issue Feb 12, 2023: UMAP connectivity plot satijalab/seurat#6898 (Closed)
