From a959b9a2c3df7bf2d504fe76c66f49b517bb22e4 Mon Sep 17 00:00:00 2001
From: James Melville
Date: Sun, 26 Nov 2023 12:01:10 -0800
Subject: [PATCH] Fix some URLs

---
 NEWS.md                          |  2 +-
 README.md                        | 21 ++++++++++-----------
 vignettes/articles/abparams.Rmd  |  2 +-
 vignettes/articles/fast-sgd.Rmd  |  2 +-
 vignettes/articles/init.Rmd      |  2 +-
 vignettes/articles/leopold.Rmd   | 12 ++++++------
 vignettes/articles/lvish.Rmd     |  2 +-
 vignettes/articles/pycompare.Rmd |  4 ++--
 8 files changed, 23 insertions(+), 24 deletions(-)

diff --git a/NEWS.md b/NEWS.md
index f5179853..90ed0679 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -140,7 +140,7 @@ coordinates. This is an approximation to the
 `dens_weight` will use a larger range of output densities to reflect the input
 data. If the data is too spread out, reduce the value of `dens_weight`. For
 more information see the
-[documentation at the uwot repo](https://jlmelville.github.io/uwot/leopold.html).
+[documentation at the uwot repo](https://jlmelville.github.io/uwot/articles/leopold.html).
 * New parameter: `binary_edge_weights`. If set to `TRUE`, instead of smoothed
 knn distances, non-zero edge weights all have a value of 1. This is how
 [PaCMAP](https://www.jmlr.org/papers/v22/20-1061.html) works and there is
diff --git a/README.md b/README.md
index 59f587b5..8696a3b8 100644
--- a/README.md
+++ b/README.md
@@ -3,7 +3,7 @@

 [![R-CMD-check](https://github.com/jlmelville/uwot/workflows/R-CMD-check/badge.svg)](https://github.com/jlmelville/uwot/actions)
 [![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/jlmelville/uwot?branch=master&svg=true)](https://ci.appveyor.com/project/jlmelville/uwot)
-[![Coverage Status](https://img.shields.io/codecov/c/github/jlmelville/uwot/master.svg)](https://codecov.io/github/jlmelville/uwot?branch=master)
+[![Coverage Status](https://img.shields.io/codecov/c/github/jlmelville/uwot/master.svg)](https://app.codecov.io/github/jlmelville/uwot?branch=master)
 [![CRAN Status Badge](http://www.r-pkg.org/badges/version/uwot)](https://cran.r-project.org/package=uwot)
 [![Dependencies](https://tinyverse.netlify.com/badge/uwot)](https://cran.r-project.org/package=uwot)
 [![CRAN Monthly Downloads](https://cranlogs.r-pkg.org/badges/uwot)](https://cran.r-project.org/package=uwot)
@@ -43,7 +43,7 @@ dimensional fuzzy simplicial set.
 submission). Among other things you can now pass your own nearest neighbors data
 in sparse matrix form. Also there is an option to reproduce relative cluster
 density by
-[approximating the densMAP method](https://jlmelville.github.io/uwot/leopold.html).
+[approximating the densMAP method](https://jlmelville.github.io/uwot/articles/leopold.html).
 See the [NEWS](https://github.com/jlmelville/uwot/blob/master/NEWS.md#uwot-0113)
 page for more.

@@ -59,8 +59,7 @@ changes.
 *December 15 2020* Version 0.1.10 has been released to CRAN. This is mainly to
 maintain compatibility with RcppAnnoy, but also a small change was made to avoid
 it grinding away pointlessly in the presence of `NA` values, based on
-an observation by
-[David McGaughey on Twitter](https://twitter.com/David_McGaughey/status/1328389091239501824).
+an observation by David McGaughey on Twitter (which I can no longer link to).

 *November 15 2020* Version 0.1.9 has been released to CRAN. The main addition
 is support for the Pearson correlation. Also, a slight license change from GPL-3
@@ -85,7 +84,7 @@ using `std::thread` rather than tinythread++.
 issues that originate from RcppAnnoy and RcppParallel. I am hopeful that the
 Annoy behavior is fixed and a suitable version of RcppAnnoy will be released
 onto CRAN eventually. The RcppParallel issues originate with the use of
-[tbb](https://github.com/intel/tbb) and seems much harder to deal with. As there
+[tbb](https://github.com/oneapi-src/oneTBB) and seems much harder to deal with. As there
 is no way to use RcppParallel without tbb yet, I am temporarily replacing the
 use of RcppParallel with just a subset of the code needed to run parallel for
 loops with the [tinythread++](https://tinythreadpp.bitsnbites.eu/) library.
@@ -226,7 +225,7 @@ iris_umap_batch <- umap(iris, batch = TRUE, opt_args = list(beta1 = 0.9, beta2 =

 ## Documentation

-.
+.

 ## A Note on Reproducibility

@@ -315,7 +314,7 @@ The right hand image is the result of using `uwot`.
 |-----------------------------------|---------------------------------|
 | ![mnist-py.png](man/figures/readme/mnist-py.png) | ![mnist-r.png](man/figures/readme/mnist-r.png) |

-The project documentation contains some more [examples](https://jlmelville.github.io/uwot/umap-examples.html).
+The project documentation contains some more [examples](https://jlmelville.github.io/uwot/articles/umap-examples.html).

 ## Performance

@@ -368,7 +367,7 @@ approximation to the `pow` function suggested by [Martin Ankerl](https://martin.
 and the squared distance (`0`-`1000`), I found the maximum relative error was
 about `0.06`. However, I haven't done much testing, beyond looking to see that
 results from the
-[examples page](https://jlmelville.github.io/uwot/umap-examples.html) are not
+[examples page](https://jlmelville.github.io/uwot/articles/umap-examples.html) are not
 obviously worsened. Results in the table above with `approx_pow = TRUE` do show
 a worthwhile improvement.

@@ -432,7 +431,7 @@ scheduler. This is the same behavior as largeVis.
 that Annoy itself says it works best with dimensions < 100, but still works
 "surprisingly well" up to 1000.
 * Experience with
-[COIL-100](http://www.cs.columbia.edu/CAVE/software/softlib/coil-100.php),
+[COIL-100](https://www.cs.columbia.edu/CAVE/software/softlib/coil-100.php),
 which has 49,152 features, suggests that Annoy will *definitely* struggle with
 datasets of this dimensionality. I strongly recommend using the `pca` option to
 reduce the dimensionality, e.g `pca = 100`.
@@ -548,7 +547,7 @@ mnist_lv <- lvish(mnist, kernel = "knn", perplexity = 15, n_epochs = 1500,
                   init = "lvrand", verbose = TRUE)
 ```

-See the [lvish examples](https://jlmelville.github.io/uwot/lvish.html) page for
+See the [lvish examples](https://jlmelville.github.io/uwot/articles/lvish.html) page for
 more results.

 ## Mixed Data Types
@@ -913,7 +912,7 @@ If you want to cite the use of uwot, then use the output of running
 [publication](https://arxiv.org/abs/1802.03426).
 * There is now a [UMAP package on CRAN](https://cran.r-project.org/package=umap)
 (see also its [github repo](https://github.com/tkonopka/umap)).
-* Another R package is [umapr](https://github.com/ropenscilabs/umapr), but it is
+* Another R package is [umapr](https://github.com/ropensci-archive/umapr), but it is
 no longer being maintained.
 * [umappp](https://github.com/LTLA/umappp) is a full C++ implementation, and
 [yaumap](https://github.com/LTLA/yaumap) provides an R wrapper. The batch
diff --git a/vignettes/articles/abparams.Rmd b/vignettes/articles/abparams.Rmd
index f6000c10..93c70972 100644
--- a/vignettes/articles/abparams.Rmd
+++ b/vignettes/articles/abparams.Rmd
@@ -11,7 +11,7 @@ output:
 This is part of the documentation for [uwot](https://github.com/jlmelville/uwot).

 If you look at the UMAP
-[examples](https://jlmelville.github.io/uwot/umap-examples.html), it's clear
+[examples](https://jlmelville.github.io/uwot/articles/umap-examples.html), it's clear
 that the default settings aren't always appropriate for some datasets: it's easy
 to get results where the clusters are very spaced out relative to their sizes,
 which makes viewing your data on a single static plot quite difficult
diff --git a/vignettes/articles/fast-sgd.Rmd b/vignettes/articles/fast-sgd.Rmd
index 6d0d2ef6..4ca0cad1 100644
--- a/vignettes/articles/fast-sgd.Rmd
+++ b/vignettes/articles/fast-sgd.Rmd
@@ -66,7 +66,7 @@ mnist_umap_fast <- umap(mnist, pca = 100, fast_sgd = TRUE, verbose = TRUE)

 Six threads were used in the stochastic gradient descent. For details on the
 datasets, see the
-[examples](https://jlmelville.github.io/uwot/umap-examples.html) page. The
+[examples](https://jlmelville.github.io/uwot/articles/umap-examples.html) page. The
 timings are given in the title, in minutes and seconds. Note that this is for
 entire run, not just the optimization phase, i.e. it includes the PCA
 dimensionality reduction and nearest neighbor search, which is usually the
diff --git a/vignettes/articles/init.Rmd b/vignettes/articles/init.Rmd
index 78d5f753..992b20eb 100644
--- a/vignettes/articles/init.Rmd
+++ b/vignettes/articles/init.Rmd
@@ -106,7 +106,7 @@ embedding <- umap(data, init = "lvrand")

 Below, we'll explore the effect of these settings. For more details on the
 datasets, see the
-[examples](https://jlmelville.github.io/uwot/umap-examples.html) page.
+[examples](https://jlmelville.github.io/uwot/articles/umap-examples.html) page.

 Apart from changing `init`, mainly default settings where used, except using
 `pca = 100`, which is necessary with high dimensional datasets for the
diff --git a/vignettes/articles/leopold.Rmd b/vignettes/articles/leopold.Rmd
index b3c5f1ed..e36f70cf 100644
--- a/vignettes/articles/leopold.Rmd
+++ b/vignettes/articles/leopold.Rmd
@@ -394,12 +394,12 @@ for leopold) and that the values for $\sigma$ and $\rho$ are calculated *before*
 the symmetrization of the UMAP input edge weights. So these values are missing
 the influence of observations outside the initial k-nearest neighborhood graph.

-For the output radii, I know from looking at the [output weight function
-parameters](https://jlmelville.github.io/uwot/abparams.html), that increasing
-the `a` parameter makes the clusters in a UMAP plot shrink, and vice versa. So
-let's use that as measure of the output density, i.e. the inverse of the local
-radius. As every point has its own radius, the value for a given weight between
-points $i$ and $j$ will be the geometric mean of the two radii:
+For the output radii, I know from looking at the
+[output weight function parameters](https://jlmelville.github.io/uwot/articles/umap-examples.html),
+that increasing the `a` parameter makes the clusters in a UMAP plot shrink, and
+vice versa. So let's use that as measure of the output density, i.e. the inverse
+of the local radius. As every point has its own radius, the value for a given
+weight between points $i$ and $j$ will be the geometric mean of the two radii:

 $$
 w_{ij} = 1 / \left(1 + \frac{d_{ij}^2}{\sqrt{r_i r_j}} \right)
diff --git a/vignettes/articles/lvish.Rmd b/vignettes/articles/lvish.Rmd
index 74989290..0375d49f 100644
--- a/vignettes/articles/lvish.Rmd
+++ b/vignettes/articles/lvish.Rmd
@@ -17,7 +17,7 @@ to the [LargeVis](https://arxiv.org/abs/1602.00370) method (see also its

 For details on the datasets, and to compare with the output of UMAP and t-SNE,
 see the
-[UMAP examples gallery](https://jlmelville.github.io/uwot/umap-examples.html).
+[UMAP examples gallery](https://jlmelville.github.io/uwot/articles/umap-examples.html).

 ## Gaussian Perplexity

diff --git a/vignettes/articles/pycompare.Rmd b/vignettes/articles/pycompare.Rmd
index 46dcb88e..0bfe473d 100644
--- a/vignettes/articles/pycompare.Rmd
+++ b/vignettes/articles/pycompare.Rmd
@@ -26,7 +26,7 @@ refer to the Python implementation with the formatting `UMAP`.

 For details on the datasets, see the
-[UMAP examples gallery](https://jlmelville.github.io/uwot/umap-examples.html).
+[UMAP examples gallery](https://jlmelville.github.io/uwot/articles/umap-examples.html).
 The `uwot` results have been re-run for the images here using different random
 seeds, but they should resemble the `uwot` output on that page.

@@ -202,7 +202,7 @@ methods considered here.
 ![cifar10 r](img/pycompare/cifar10_r.png)|![cifar10 py](img/pycompare/cifar10_py.png)

 Based on results with this dataset in the
-[UMAP examples gallery](https://jlmelville.github.io/uwot/umap-examples.html),
+[UMAP examples gallery](https://jlmelville.github.io/uwot/articles/umap-examples.html),
 I wasn't expecting these results to be a feast for the eyes. After seeing the
 difference in results with `macosko2015`, I am just relieved that these two
 results look disappointing in a similar way.