Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

keep the rownames #81

Closed
jwijffels opened this issue Aug 18, 2021 · 7 comments
Closed

keep the rownames #81

jwijffels opened this issue Aug 18, 2021 · 7 comments
Labels
bug Something isn't working

Comments

@jwijffels
Copy link

I'm using the package here https://github.com/bnosac/ETM and here https://github.com/bnosac/word2vec for NLP purposes.
It would be great if the rownames could be kept. Is there a reason why they are dropped?

> library(uwot)
> embeddings <- matrix(runif(1000), nrow = 100, ncol = 10)
> rownames(embeddings) <- paste("word", seq_len(nrow(embeddings)))
> emb.2d <- umap(embeddings, n_components = 2, metric = "cosine", n_neighbors = 15)
> head(emb.2d)
           [,1]        [,2]
[1,]  0.4076209 -0.66765205
[2,] -0.8779236  1.09168144
[3,] -1.0821520 -1.73271100
[4,]  1.6955776 -0.17339508
[5,]  1.4383082  1.53947248
[6,] -0.5214175  0.06803199
> 
> rownames(emb.2d) <- rownames(embeddings)
> head(emb.2d)
             [,1]        [,2]
word 1  0.4076209 -0.66765205
word 2 -0.8779236  1.09168144
word 3 -1.0821520 -1.73271100
word 4  1.6955776 -0.17339508
word 5  1.4383082  1.53947248
word 6 -0.5214175  0.06803199
@jlmelville jlmelville added the bug Something isn't working label Aug 19, 2021
@jlmelville
Copy link
Owner

@jwijffels the only reason the row names were dropped was because I honestly never thought about it until you opened the issue. Thank you for reporting this.

The current master branch has the fix (it certainly works for your example). I don't know when I will be submitting a new release to CRAN, so you may want to install directly from github via the devtools or remotes packages.

@jwijffels
Copy link
Author

Thanks for the quick fixes already. What about adding some extra check to make sure the rownames are the same, namely at

uwot/R/uwot.R

Line 2166 in 0ad9eda

nblocks <- 1

if(!all.equal(rownames(X), rownames(nn_method$idx)){
  warning("rownames of the embedding matrix not the same as the rownames from nn_method")
}

@jlmelville
Copy link
Owner

@jwijffels I can add that (or something like it), although it could be a bit of an obscure corner case: uwot::umap will either take the row names from X, or if that is not present, from the pre-computed nearest neighbor data in nn_method. Usually, you would pass either X or pre-computed nearest neighbor data, but not both. The only exception I can immediately think of is if you wanted to use pre-computed neighbors, but also initialize the embedding using PCA.

Have you had a situation where you wanted to use X and pre-computed neighbors and both had row names and the names were inconsistent? If so, are you able to provide any details about how it occurs (e.g. there is a package that is using uwot in this way and handling the row names differently inside a function so it can't be managed by the user).

@jwijffels
Copy link
Author

I just wanted to avoid the warning message in the news file. My use cases will be mainly using uwot::umap passing on an embedding matrix with rownames and returning the model, and subsequently applying umap_transform with the model on a new set of embedding matrices containing different rownames. I think that flow is enough covered by the changes you made. Thanks again!

@jwijffels
Copy link
Author

Let me close this. Thanks for the fixes.

@jlmelville
Copy link
Owner

Thanks @jwijffels. There are a lot of possible combinations of types of data that can be passed into uwot::umap now so I can't be 100% sure I have covered all of them in terms of getting row names out, but I am pretty confident. The NEWS caveat was just a way to cover myself by warning that if the data was inconsistent the output is implementation dependent (which at this point could change if I found a hole that needed plugging).

@jwijffels
Copy link
Author

jwijffels commented Aug 25, 2021

There are a lot of possible combinations of types of data that can be passed into uwot::umap

I saw that when inspecting the code :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants