New function: optimize_graph_layout. Use this to produce optimized output coordinates that
reflect an input similarity graph (such as that produced by the similarity_graph function). Running similarity_graph followed by optimize_graph_layout is the same as running umap, so the
purpose of these functions is to allow for more flexibility and decoupling between generating the
nearest neighbor graph and optimizing the low-dimensional approximation to it. Based on a request
by user Chengwei94 (#98).
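As a minimal sketch of the two-stage workflow described above, the following uses the built-in iris measurements as stand-in data. The choice of dataset and of n_neighbors = 15 is an assumption made for illustration only.

```r
library(uwot)

# Stand-in data: the four numeric iris columns (150 rows).
X <- as.matrix(iris[, 1:4])

# Stage 1: build the fuzzy similarity graph (a sparse matrix of edge weights).
graph <- similarity_graph(X, n_neighbors = 15)

# Stage 2: optimize 2D coordinates that reflect the graph.
# X is passed so that initializations which need the input data can use it.
coords <- optimize_graph_layout(graph, X = X, n_components = 2)

dim(coords)
```

Keeping the graph as a separate object means it can be inspected, modified or reused before the layout is optimized.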
New functions: simplicial_set_union and simplicial_set_intersect. These allow for the
combination of different fuzzy graph representations of a dataset into a single fuzzy graph using
the UMAP simplicial set operations. Based on a request in the Python UMAP issues tracker by user Dhar xion.
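One possible use, sketched here under the assumption that the sepal and petal columns of iris can be treated as two separate "views" of the same observations, is to build one graph per view and then combine them:

```r
library(uwot)

# Two views of the same 150 observations (an illustrative split, not a recommendation).
sepal <- as.matrix(iris[, 1:2])
petal <- as.matrix(iris[, 3:4])

g_sepal <- similarity_graph(sepal, n_neighbors = 15)
g_petal <- similarity_graph(petal, n_neighbors = 15)

# Combine the two fuzzy graphs into a single fuzzy graph.
g_union <- simplicial_set_union(g_sepal, g_petal)
g_intersect <- simplicial_set_intersect(g_sepal, g_petal)

# Either combined graph can then be laid out in low dimensions.
coords <- optimize_graph_layout(g_union, X = as.matrix(iris[, 1:4]))
```

Both input graphs must describe the same observations in the same order, because the set operations combine edges between matching row and column indices.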
New parameter for umap_transform: ret_extra. This works like the equivalent parameter for umap, and should be a character vector specifying the extra information you would like returned
in addition to the embedding. When it is set, a list is returned with an embedding member
containing the optimized coordinates. Supported values are "fgraph", "nn", "sigma" and "localr". Based on a request by user PedroMilanezAlmeida (#104).
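A short sketch of how this might look in practice, assuming an arbitrary split of iris into training and test rows:

```r
library(uwot)

X <- as.matrix(iris[, 1:4])
train_idx <- seq(1, 150, by = 2)  # illustrative split: odd rows train, even rows test

# A model must be returned from umap before new data can be transformed.
model <- umap(X[train_idx, ], ret_model = TRUE)

# Ask for the fuzzy graph and nearest neighbor data alongside the embedding.
res <- umap_transform(X[-train_idx, ], model, ret_extra = c("fgraph", "nn"))

names(res)          # a list rather than a bare matrix
dim(res$embedding)  # coordinates of the transformed test rows
```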
New parameter for umap, tumap and umap_transform: seed. This will do the equivalent of
calling set.seed internally, and hence will help with reproducibility. The chosen seed is
exported if ret_model = TRUE and umap_transform will use that seed if present, so you only
need to specify it in umap_transform if you want to change the seed. The default behavior remains
to not modify the random number state. Based on a request by SuhasSrinivasan (#110).
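A minimal sketch of the intended usage follows. The seed value 42 and the iris data are arbitrary choices, and whether repeated runs match exactly can also depend on threading settings, so treat the comparison as a check to run rather than a guarantee.

```r
library(uwot)

X <- as.matrix(iris[, 1:4])

# Two runs with the same seed, without calling set.seed manually.
emb1 <- umap(X, seed = 42)
emb2 <- umap(X, seed = 42)
all.equal(emb1, emb2)

# The seed is stored with the model, so umap_transform does not need it repeated.
model <- umap(X, seed = 42, ret_model = TRUE)
new_emb <- umap_transform(X[1:10, ], model)
```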
Bug fixes and minor improvements
A new setting for init_sdev: set init_sdev = "range" and initial coordinates will be
range-scaled so each column takes values between 0 and 10. This pre-processing was added to the Python
UMAP package at some point after uwot began development and so should probably always be used
with the default init = "spectral" setting. However, it is not set by default to maintain
backwards compatibility with older versions of uwot.
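To illustrate what range-scaling means here, the following base R sketch rescales each column of a coordinate matrix to span 0 to 10. The range_scale helper is hypothetical and written only for this example; it is not uwot's internal code.

```r
# Hypothetical helper (not part of uwot): rescale each column to [0, max_coord].
range_scale <- function(Y, max_coord = 10) {
  apply(Y, 2, function(col) {
    rng <- range(col)
    max_coord * (col - rng[1]) / (rng[2] - rng[1])
  })
}

set.seed(1)
# Made-up initial coordinates with a large, arbitrary scale.
Y <- matrix(rnorm(20, sd = 100), ncol = 2)

Y_scaled <- range_scale(Y)
apply(Y_scaled, 2, range)  # per-column minimum and maximum after scaling
```

In uwot itself the setting is requested directly, for example umap(X, init = "spectral", init_sdev = "range").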
ret_extra = c("sigma") is now supported by lvish. The Gaussian bandwidths are returned in a sigma vector. In addition, a vector of intrinsic dimensionalities estimated for each point using
an analytical expression of the finite difference method given by Lee and co-workers is returned in the dint vector.
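As a rough sketch, the extra vectors could be requested like this. The perplexity of 15 is an assumption chosen so that the neighbor count stays well below the 150 rows of iris.

```r
library(uwot)

X <- as.matrix(iris[, 1:4])

# Request the Gaussian bandwidths along with the embedding.
res <- lvish(X, perplexity = 15, ret_extra = c("sigma"))

str(res$embedding)  # the optimized coordinates
str(res$sigma)      # one bandwidth per point
str(res$dint)       # one intrinsic dimensionality estimate per point
```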
The min_dist and spread parameters are now returned in the model when umap is run with ret_model = TRUE. This is just for documentation purposes: these values are not used directly by
the model in umap_transform. If the parameters a and b are set directly when invoking umap,
then both min_dist and spread will be set to NULL in the returned model. This feature was
added in response to a question from kjiang18 (#95).
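The behavior described above could be checked with a sketch like the following. It assumes the values are stored under the names min_dist and spread in the returned list, and the a and b values are arbitrary.

```r
library(uwot)

X <- as.matrix(iris[, 1:4])

# min_dist and spread are recorded when they are used to derive the curve parameters.
model <- umap(X, min_dist = 0.1, spread = 1.5, ret_model = TRUE)
model$min_dist
model$spread

# Setting a and b directly bypasses min_dist and spread, so both are stored as NULL.
model_ab <- umap(X, a = 1.9, b = 0.8, ret_model = TRUE)
is.null(model_ab$min_dist)
is.null(model_ab$spread)
```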
Some new checks for NA values in input data have been added. Also a warning will be emitted if n_components seems to have been set too high.
If n_components was greater than n_neighbors then umap_transform would crash the R session.
Thank you to ChVav for reporting this (#102).
Using umap_transform with a model where dens_scale was set could cause a segmentation fault,
destroying the session. Even if it didn't, it could give an entirely artifactual "ring" structure.
Thank you FemkeSmit for reporting this and providing
assistance in diagnosing the underlying cause (#103).
If you set binary_edge_weights = TRUE, this setting was not exported when ret_model = TRUE,
and was therefore not respected by umap_transform. This has now been fixed, but you will need to
regenerate any models that used binary edge weights.
The rdoc for the init param said that if there were multiple disconnected components, a
spectral initialization would attempt to merge multiple sub-graphs. Not true: actually, spectral
initialization is abandoned in favor of PCA. The documentation has been updated to reflect the true
state of affairs. No idea what I was thinking of there.
load_uwot and save_uwot didn't work on Windows 7 due to how the version of tar there
handles drive letters. Thank you mytarmail for the report (#109).
Warn if the initial coordinates have a very large scale (a standard deviation > 10.0), because
this can lead to small gradients and poor optimization. Thank you SuhasSrinivasan for the
report (#110).
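If you supply your own initial coordinates, a quick check along these lines can catch the problem before optimization. This is a base R sketch with made-up data; using the per-column standard deviation is an assumption about how the scale might be measured, not a description of uwot's internal check.

```r
set.seed(42)

# Hypothetical initial coordinates with far too large a spread.
init <- matrix(rnorm(200, sd = 50), ncol = 2)

max_sd <- max(apply(init, 2, sd))
if (max_sd > 10) {
  # Shrink all coordinates by one common factor so the largest column sd becomes 1.
  init <- init / max_sd
}

max(apply(init, 2, sd))
```

Passing a numeric init_sdev to umap is the built-in way to rescale the initial coordinates to a chosen standard deviation.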
A test was failing on Arm architectures. Problem has been "solved" by removing the test, but it
was testing a floating point value resulting from a failure due to numerical issues, so it's a bit
of a corner case. Thank you Lucas Kanashiro for reporting (#100).