CRAN release 0.1.8
uwot 0.1.8
Better late than never, here are the release notes for CRAN release 0.1.8. It's a bumper selection due to my failure to get 0.1.6 and 0.1.7 accepted.
New features
- New parameter, `ret_extra`, a vector which can contain any combination of: `"model"` (same as `ret_model = TRUE`), `"nn"` (same as `ret_nn = TRUE`) and `"fgraph"` (see below).
- New return value data: if the `ret_extra` vector contains `"fgraph"`, the returned list will contain an `fgraph` item representing the fuzzy simplicial input graph as a sparse N x N matrix. For `lvish`, use `"P"` instead of `"fgraph"` (#47). Note that there is a further sparsifying step, where edges with a very low membership are removed if there is no prospect of the edge being sampled during optimization. This is controlled by `n_epochs`: the smaller the value, the more sparsifying will occur. If you are only interested in the fuzzy graph and not the embedded coordinates, set `n_epochs = 0`.
- New function: `unload_uwot`, to unload the Annoy nearest neighbor indices in a model. This prevents the model from being used in `umap_transform`, but allows the temporary working directory created by both `save_uwot` and `load_uwot` to be deleted. Previously, both `load_uwot` and `save_uwot` attempted to delete the temporary working directories they used, but always silently failed because Annoy is making use of files in those directories.
- An attempt has been made to reduce the variability of results due to different compiler and C++ library versions on different machines. Visually, results are unchanged in most cases, but this is a breaking change in terms of numerical output. The best chance of obtaining floating point determinism across machines is to use `init = "spca"`, fixed values of `a` and `b` (rather than allowing them to be calculated through setting `min_dist` and `spread`) and `approx_pow = TRUE`. Using the `tumap` method with `init = "spca"` is probably the most robust approach.
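The sparsifying step described for the `fgraph` item can be sketched in Python. This is an illustration, not uwot's code: it assumes the rule used by the reference Python UMAP implementation, where an edge is dropped if its membership is below the largest membership divided by `n_epochs`, because such an edge would be sampled less than once during optimization. The dict-of-edges representation and the `sparsify_fgraph` name are hypothetical stand-ins for the sparse matrix that uwot actually returns.

```python
def sparsify_fgraph(edges, n_epochs):
    """Drop edges that are unlikely to be sampled during optimization.

    edges: dict mapping (i, j) vertex pairs to membership strengths in (0, 1].
    n_epochs: the resolved, non-negative number of optimization epochs.
    Hypothetical helper; assumes the max(w) / n_epochs cutoff of the
    reference Python UMAP implementation, which may differ from uwot's rule.
    """
    if n_epochs == 0:
        # No optimization will happen, so every edge is kept.
        return dict(edges)
    # An edge of weight w is sampled roughly n_epochs * w / max(w) times;
    # below this cutoff the expected number of samples is less than one.
    cutoff = max(edges.values()) / n_epochs
    return {pair: w for pair, w in edges.items() if w >= cutoff}


graph = {(0, 1): 1.0, (1, 2): 0.5, (0, 3): 0.01, (2, 3): 0.004}
print(len(sparsify_fgraph(graph, 200)))  # cutoff 0.005: 3 edges kept
print(len(sparsify_fgraph(graph, 50)))   # cutoff 0.02: 2 edges kept
print(len(sparsify_fgraph(graph, 0)))    # no sparsifying: 4 edges kept
```

Fewer epochs raise the cutoff, which is why a smaller `n_epochs` removes more edges and `n_epochs = 0` keeps the full graph.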
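To see why fixing `a` and `b` helps, here is a sketch of how they can be derived from `min_dist` and `spread`: a nonlinear least-squares fit of the curve `1 / (1 + a * x^(2b))` to a target that is 1 below `min_dist` and then decays exponentially with scale `spread`. The 300-point grid on `[0, 3 * spread]` and the `(a, b) = (1, 1)` starting point are assumed to follow the reference UMAP implementations; the hand-rolled Gauss-Newton solver with step halving is a stand-in for the library's own fitting routine and is not uwot's code. Because the fitted values come out of an iterative floating-point optimizer, small numerical differences between machines can shift them, and supplying `a` and `b` directly sidesteps that.

```python
import math


def target_curve(spread, min_dist, n=300):
    """Target curve: 1 below min_dist, exponential decay after (assumed grid)."""
    xs = [3.0 * spread * i / (n - 1) for i in range(n)]
    ys = [1.0 if x < min_dist else math.exp(-(x - min_dist) / spread) for x in xs]
    return xs, ys


def curve(x, a, b):
    """Output-space similarity 1 / (1 + a * x^(2b)); equals 1 at x = 0."""
    if x == 0.0:
        return 1.0
    return 1.0 / (1.0 + a * x ** (2.0 * b))


def sse(xs, ys, a, b):
    """Sum of squared errors; non-positive parameters are rejected with inf."""
    if a <= 0.0 or b <= 0.0:
        return math.inf
    return sum((curve(x, a, b) - y) ** 2 for x, y in zip(xs, ys))


def find_ab(spread=1.0, min_dist=0.1, max_iter=100):
    """Fit a and b by Gauss-Newton with step halving (sketch, not uwot's code)."""
    xs, ys = target_curve(spread, min_dist)
    a, b = 1.0, 1.0
    for _ in range(max_iter):
        # Accumulate the 2x2 normal equations (J^T J) step = J^T r.
        jaa = jab = jbb = ga = gb = 0.0
        for x, y in zip(xs, ys):
            if x == 0.0:
                continue  # curve(0) = 1 for any a, b: zero gradient
            p = x ** (2.0 * b)
            denom = (1.0 + a * p) ** 2
            d_a = -p / denom
            d_b = -2.0 * a * p * math.log(x) / denom
            r = y - curve(x, a, b)
            jaa += d_a * d_a
            jab += d_a * d_b
            jbb += d_b * d_b
            ga += d_a * r
            gb += d_b * r
        det = jaa * jbb - jab * jab
        step_a = (jbb * ga - jab * gb) / det
        step_b = (jaa * gb - jab * ga) / det
        if max(abs(step_a), abs(step_b)) < 1e-10:
            break  # converged
        # Halve the step until the error no longer increases.
        current = sse(xs, ys, a, b)
        t = 1.0
        while t > 1e-10 and sse(xs, ys, a + t * step_a, b + t * step_b) > current:
            t /= 2.0
        a, b = a + t * step_a, b + t * step_b
    return a, b


a, b = find_ab(spread=1.0, min_dist=0.1)
print(f"a = {a:.3f}, b = {b:.3f}")
```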
Bug fixes and minor improvements
- The default for `n_threads` is now `NULL` to provide a bit more protection from changing dependencies.
- uwot should no longer trigger undefined behavior in sanitizers, due to replacement of RcppParallel with the standard C++11 implementation of threading (and some code "borrowed" from RcppParallel) (#52).
- Further sanitizer improvements in the nearest neighbor search code due to the upstream efforts of erikbern and eddelbuettel (#50).
- New behavior when `n_epochs = 0`. This used to behave like `n_epochs = NULL` and gave a default number of epochs (dependent on the number of vertices in the dataset). Now it more usefully carries out all calculations except optimization, so the returned coordinates are those specified by the `init` parameter. This is an easy way to access e.g. the spectral or PCA initialization coordinates. If you want the input fuzzy graph (the `ret_extra` vector contains `"fgraph"`), this will also prevent edges with very low membership from being removed from the graph. You still get the old default epochs behavior by setting `n_epochs = NULL` or to a negative value.
- `save_uwot` and `load_uwot` have been updated with a `verbose` parameter so it's easier to see what temporary files are being created.
- `save_uwot` has a new parameter, `unload`, which if set to `TRUE` will delete the working directory for you, at the cost of unloading the model, i.e. it can't be used with `umap_transform` until you reload it with `load_uwot`.
- `save_uwot` now returns the saved model with an extra field, `mod_dir`, which points to the location of the temporary working directory, so you should now assign the result of calling `save_uwot` to the model you saved, e.g. `model <- save_uwot(model, "my_model_file")`. This field is intended for use with `unload_uwot`.
- `load_uwot` also returns the model with a `mod_dir` item for use with `unload_uwot`.
- `save_uwot` and `load_uwot` were not correctly handling relative paths.
- A previous bug fix to `load_uwot` in uwot 0.1.4 to work with newer versions of RcppAnnoy (#31) failed in the typical case of a single metric for the nearest neighbor search using all available columns, giving an error message along the lines of: `Error: index size <size> is not a multiple of vector size <size>`. This has now been fixed, but it required changes to both `save_uwot` and `load_uwot`, so existing saved models must be regenerated. Thank you to reporter OuNao.