Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

if batch = TRUE in training, fgraph of umap_transform is transposed if n_epoch is set to 0 #118

Open
PedroMilanezAlmeida opened this issue Mar 11, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@PedroMilanezAlmeida
Copy link

on my system, this is a reproducible example:

iris_train <- iris[1:100,1:4]
iris_test <- iris[101:150,1:4]

iris_train_umap <- uwot::umap(iris_train, 
                              ret_model = TRUE,
                              batch = TRUE)

dim(uwot::umap_transform(iris_test, 
                         iris_train_umap,
                         n_epochs = NULL,
                         ret_extra = "fgraph")$fgraph)
[1] 100  50

dim(uwot::umap_transform(iris_test, 
                         iris_train_umap,
                         n_epochs = 0,
                         ret_extra = "fgraph")$fgraph)
[1]  50 100

R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: FlexiBLAS OPENBLAS;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] microbenchmark_1.4.10 compiler_4.3.2        RcppAnnoy_0.0.21      Matrix_1.6-4          tools_4.3.2           rstudioapi_0.15.0    
 [7] Rcpp_1.0.11           codetools_0.2-19      grid_4.3.2            snedata_0.0.0.9000    lattice_0.22-5        uwot_0.1.16
@PedroMilanezAlmeida
Copy link
Author

PedroMilanezAlmeida commented Mar 11, 2024

fgraph of umap_transform also transposed if batch = TRUE and n_epochs = NULL

iris_train_umap <- uwot::umap(iris_train, 
                              ret_model = TRUE,
                              batch = FALSE)

iris_test_umap <- uwot::umap_transform(iris_test, 
                                       iris_train_umap,
                                       n_epochs = NULL,
                                       ret_extra = "fgraph")
dim(iris_test_umap$fgraph)
[1]  50 100

iris_train_umap <- uwot::umap(iris_train, 
                              ret_model = TRUE,
                              batch = TRUE)

iris_test_umap <- uwot::umap_transform(iris_test, 
                                       iris_train_umap,
                                       n_epochs = NULL,
                                       ret_extra = "fgraph")
dim(iris_test_umap$fgraph)
[1] 100  50

@jlmelville jlmelville added the bug Something isn't working label Mar 12, 2024
@jlmelville
Copy link
Owner

Thank you for the report! This was due to trying to avoid doing unnecessary processing when the UMAP optimization won't occur (but some of it actually is necessary either way). Should be fixed in the dev version now.

@jlmelville
Copy link
Owner

Will be fixed in the next CRAN release

@PedroMilanezAlmeida
Copy link
Author

amazing, thank you!

@PedroMilanezAlmeida
Copy link
Author

fgraph of umap_transform also transposed if batch = TRUE and n_epochs = NULL

iris_train_umap <- uwot::umap(iris_train, 
                              ret_model = TRUE,
                              batch = FALSE)

iris_test_umap <- uwot::umap_transform(iris_test, 
                                       iris_train_umap,
                                       n_epochs = NULL,
                                       ret_extra = "fgraph")
dim(iris_test_umap$fgraph)
[1]  50 100

iris_train_umap <- uwot::umap(iris_train, 
                              ret_model = TRUE,
                              batch = TRUE)

iris_test_umap <- uwot::umap_transform(iris_test, 
                                       iris_train_umap,
                                       n_epochs = NULL,
                                       ret_extra = "fgraph")
dim(iris_test_umap$fgraph)
[1] 100  50

The code above gives the transposed fgraph in my system again (the only difference in that batch is FALSE in the first uwot::umap call and TRUE in the second).

R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: FlexiBLAS OPENBLAS;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] uwot_0.2.2   Matrix_1.7-0

loaded via a namespace (and not attached):
[1] compiler_4.4.1    RcppAnnoy_0.0.22  tools_4.4.1       rstudioapi_0.16.0 Rcpp_1.0.12       codetools_0.2-20 
[7] grid_4.4.1        lattice_0.22-6

@jlmelville
Copy link
Owner

Yeah I messed this one up and results were inconsistent between batch = TRUE and batch = FALSE.

@PedroMilanezAlmeida I think I should have been clearer about my understanding of what the correct dimensions for fgraph is. Where there were 100 training points and 50 test points like in your code I think the most sensible way to arrange the dimensions for fgraph should be (50, 100) and not (100, 50). Do you agree? If so, I got this completely the wrong way round with the original "fix".

@jlmelville
Copy link
Owner

@PedroMilanezAlmeida I've pushed a change (and added more tests) that should ensure that the output dim for fgraph would always be (50, 100). Let me know if that's contrary to your expectations, otherwise I will eventually close this. Not sure when the next release of uwot will be.

@PedroMilanezAlmeida
Copy link
Author

PedroMilanezAlmeida commented Sep 7, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants