Commit

Add t-umap images
jlmelville committed Dec 1, 2023
1 parent a69158c commit 07b8a5d
Showing 3 changed files with 48 additions and 8 deletions.
Binary file added vignettes/articles/img/tumap/tumap.png
Binary file added vignettes/articles/img/tumap/umap.png
56 changes: 48 additions & 8 deletions vignettes/articles/tumap.Rmd
@@ -1,24 +1,64 @@
---
title: "t-UMAP"
resource_files:
- img/tumap/umap.png
- img/tumap/tumap.png
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```

If you choose the UMAP curve parameters to be `a = 1` and `b = 1`, you get
back the Cauchy distribution used in
[t-Distributed Stochastic Neighbor Embedding](https://lvdmaaten.github.io/tsne/)
and [LargeVis](https://arxiv.org/abs/1602.00370). This also happens to
significantly simplify the gradient, leading to a noticeable speed-up.

For MNIST:

```{r install and download, eval = FALSE}
library(uwot)
# install snedata package from github
# pak::pkg_install("jlmelville/snedata")
mnist <- snedata::download_mnist()
```

I saw the optimization time drop from 66 seconds with UMAP:

```{r umap, eval = FALSE}
mnist_umap <- umap(mnist, n_neighbors = 15)
```


```{r, echo = FALSE, out.width = "75%", fig.cap = "MNIST UMAP"}
knitr::include_graphics("img/tumap/umap.png")
```

to 18 seconds with t-UMAP:

```{r tumap, eval = FALSE}
mnist_tumap <- tumap(mnist, n_neighbors = 15)
```

```{r, echo = FALSE, out.width = "75%", fig.cap = "MNIST t-UMAP"}
knitr::include_graphics("img/tumap/tumap.png")
```

You will still spend most of the time in the nearest neighbor search, so you
will only see a substantial difference in total run time with larger values of
`n_epochs`. The trade-off, as the images show, is that you get larger, more
spread-out clusters than with the typical UMAP settings (they're still more
compact than you see in t-SNE, however). I think it's worth the trade-off.

Note that using `umap(a = 1, b = 1)` doesn't use the simplified gradient, so
you won't see any speed-up that way.
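
To make the `a = 1`, `b = 1` case concrete, here is a minimal sketch in base R.
It assumes the usual form of the UMAP output curve, `1 / (1 + a * d^(2b))`, and
the attractive gradient coefficient as written in the UMAP reference
implementation. The function names are hypothetical helpers for illustration,
not part of uwot, and this is not uwot's actual code.

```r
# Output similarity curve assumed for UMAP: w(d) = 1 / (1 + a * d^(2b))
umap_weight <- function(d, a, b) {
  1 / (1 + a * d^(2 * b))
}

# Cauchy kernel (t-distribution with one degree of freedom), as in t-SNE
cauchy_weight <- function(d) {
  1 / (1 + d^2)
}

# Attractive gradient coefficient in its general form, as a function of the
# squared distance d2: it needs two power evaluations per call
attr_coeff <- function(d2, a, b) {
  -2 * a * b * d2^(b - 1) / (1 + a * d2^b)
}

# The same coefficient with a = 1 and b = 1: no power evaluations left
attr_coeff_t <- function(d2) {
  -2 / (1 + d2)
}

# Compare both forms over a grid of positive distances
d <- seq(0.1, 5, by = 0.1)
same_weight <- isTRUE(all.equal(umap_weight(d, a = 1, b = 1), cauchy_weight(d)))
same_coeff <- isTRUE(all.equal(attr_coeff(d^2, a = 1, b = 1), attr_coeff_t(d^2)))
print(c(weight = same_weight, gradient = same_coeff))
```

Both forms give the same values. The difference is that the general form has to evaluate the powers on every call, which is why passing `a = 1, b = 1` to `umap` gets the same curve without the cheaper gradient.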

Some examples comparing UMAP and t-UMAP are in the
[examples](https://jlmelville.github.io/uwot/articles/umap-examples.html)
article.

