Skip to content

Commit

Permalink
jlmelville#31: ensure dimensionality of Annoy index is stored when us…
Browse files Browse the repository at this point in the history
…ing all columns.
  • Loading branch information
jlmelville committed Feb 18, 2020
1 parent c435877 commit b7977fa
Show file tree
Hide file tree
Showing 3 changed files with 31 additions and 5 deletions.
9 changes: 8 additions & 1 deletion NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,6 @@ to be deleted. Previously, both `load_uwot` and `save_uwot` were attempting to
delete the temporary working directories they used, but would alway silently
fail because Annoy is making use of files in those directories.


## Bug fixes and minor improvements

* New behavior when `n_epochs = 0`. This used to behave like (`n_epochs = NULL`)
Expand All @@ -46,6 +45,14 @@ with `unload_uwot`.
* `load_uwot` also returns the model with a `mod_dir` item for use with
`unload_uwot`.
* `save_uwot` and `load_uwot` were not correctly handling relative paths.
* A previous bug fix to `load_uwot` in uwot 0.1.4 to work with newer versions
of RcppAnnoy (<https://github.com/jlmelville/uwot/issues/31>) failed in the
typical case of a single metric for the nearest neighbor search using all
available columns, giving an error message along the lines of:
`Error: index size <size> is not a multiple of vector size <size>`. This has now
been fixed, but required changes to both `save_uwot` and `load_uwot`, so
existing saved models must be regenerated. Thank you to reporter
[OuNao](https://github.com/OuNao).


# uwot 0.1.5
Expand Down
24 changes: 21 additions & 3 deletions R/uwot.R
Original file line number Diff line number Diff line change
Expand Up @@ -1669,8 +1669,6 @@ uwot <- function(X, n_neighbors = 15, n_components = 2, metric = "euclidean",
res <- append(res, list(
scale_info = attr_to_scale_info(X),
n_neighbors = n_neighbors,
# Can't use nn descent during transform, so if used in training,
# double the Annoy search parameter to compensate
search_k = search_k,
local_connectivity = local_connectivity,
n_epochs = n_epochs,
Expand All @@ -1693,6 +1691,15 @@ uwot <- function(X, n_neighbors = 15, n_components = 2, metric = "euclidean",
}
else {
res$nn_index <- nns[[1]]$index
if (is.null(res$metric[[1]])) {
# 31: Metric usually lists column indices or names, NULL means use all
# of them, but for loading the NN index we need the number of
# columns explicitly (we don't have access to the column dimension of
# the input data at load time)
# To be sure of the dimensionality, fetch the first item from the
# index and see how many elements are in the returned vector.
res$metric[[1]] <- list(ndim = length(res$nn_index$getItemsVector(0)))
}
}
if (!is.null(pca_models)) {
res$pca_models <- pca_models
Expand Down Expand Up @@ -1913,7 +1920,18 @@ load_uwot <- function(file, verbose = FALSE) {
}
metric <- metrics[[i]]
# 31: need to specify the index dimensionality when creating the index
ann <- create_ann(metric, ndim = length(model$metric[[i]]))
if (is.list(model$metric[[i]])) {
# in case where there is only one metric, the value is a one-item list
# named 'ndim' giving the number of dimensions directly: all columns
# are used in this metric
ndim <- model$metric[[i]]$ndim
}
else {
# otherwise, metric specifies the name or index used for each metric,
# so the dimension is the number of them
ndim = length(model$metric[[i]])
}
ann <- create_ann(metric, ndim = ndim)
ann$load(nn_fname)
if (n_metrics == 1) {
model$nn_index <- ann
Expand Down
3 changes: 2 additions & 1 deletion tests/testthat/test_output.R
Original file line number Diff line number Diff line change
Expand Up @@ -397,7 +397,8 @@ expect_ok_matrix(res$embedding, nr = 4)

res_test <- umap_transform(iris10[5:10, ], res, verbose = FALSE, n_epochs = 10)
expect_ok_matrix(res_test, nr = 6)

# 31: ensure we store the ndim for single-metric models
expect_equal(res$metric$euclidean$ndim, 4)

# taus88 prng
res <- umap(iris10,
Expand Down

0 comments on commit b7977fa

Please sign in to comment.