[Feature Request] for GMM implement gmm_full covariance type? #48

Open
FMKerckhof opened this issue Mar 30, 2023 · 3 comments
@FMKerckhof

Hi,

Currently, GMM() uses the 'gmm_diag' class of the Armadillo library. However, 'gmm_full' is a sensible choice for many data types. Could this functionality be added? I am aware that it will increase the computational cost, but it will most likely still be faster than mclust because of the Armadillo backend.

Is this something that is difficult to implement? Was it a design choice not to include it? It would be really nice to have the option.

Thanks!

@mlampros
Owner

@FMKerckhof let me look into this. At first glance it seems to require a C++ template that returns either a "gmm_full" or a "gmm_diag" model. I'll have more time on Sunday afternoon to see whether this is feasible and to make the adjustments. According to the Armadillo documentation, a few parameters work with either "gmm_full" or "gmm_diag":

[Image: diagonal_gmm]

@mlampros
Owner

mlampros commented Apr 2, 2023

I modified the "GMM()" function so that it takes an additional parameter, "full_covariance_matrices". It defaults to FALSE, so diagonal covariance matrices are still returned by default; if it is set to TRUE, full covariance matrices are returned instead. Note that the dimensions of the "covariance_matrices" output object differ between the two cases: for diagonal covariance matrices the output is a matrix, whereas for full covariance matrices it is a 3-dimensional array:

require(ClusterR)
data(dietary_survey_IBS)
dat = as.matrix(dietary_survey_IBS[, -ncol(dietary_survey_IBS)])
dat = center_scale(dat)

# diagonal covariance matrices
gmm = GMM(data = dat, 
          gaussian_comps = 3, 
          full_covariance_matrices = FALSE,
          verbose = TRUE)
str(gmm)
# List of 5
# $ call               : language GMM(data = dat, gaussian_comps = 3, verbose = TRUE, full_covariance_matrices = FALSE)
# $ centroids          : num [1:3, 1:42] 0.182 -0.472 0.585 0.527 -0.603 ...
# $ covariance_matrices: num [1:3, 1:42] 0.7439 0.2761 1.4364 1.5807 0.0649 ...
# $ weights            : num [1:3] 0.141 0.5 0.359
# $ Log_likelihood     : num [1:400, 1:3] -61.2 -61.6 -71.6 -72.4 -58.9 ...
# - attr(*, "class")= chr [1:2] "GMMCluster" "Gaussian Mixture Models"

# full covariance matrices
gmm_f = GMM(data = dat,
            gaussian_comps = 3,
            full_covariance_matrices = TRUE,
            verbose = TRUE)
str(gmm_f)
# List of 5
# $ call               : language GMM(data = dat, gaussian_comps = 3, verbose = TRUE, full_covariance_matrices = TRUE)
# $ centroids          : num [1:3, 1:42] 0.15 -0.472 0.626 0.535 -0.603 ...
# $ covariance_matrices: num [1:42, 1:42, 1:3] 0.7333 -0.0758 0.0868 0.0951 0.0306 ...
# $ weights            : num [1:3] 0.162 0.5 0.338
# $ Log_likelihood     : num [1:400, 1:3] -109.6 -52.2 -175.7 -116.5 -49.5 ...
# - attr(*, "class")= chr [1:2] "GMMCluster" "Gaussian Mixture Models"
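
To make the shape difference concrete: with K components and D features, the diagonal fit above stores one row of D variances per component (3 x 42), while the full fit stores one D x D matrix per component (42 x 42 x 3). The following is a minimal sketch in base R, assuming a hypothetical helper `as_full_cov()` that is not part of ClusterR. It expands the diagonal layout into the 3-dimensional layout so that downstream code could handle both outputs the same way:

```r
# Hypothetical helper (not part of ClusterR): expand a K x D matrix of
# per-component variances into a D x D x K array of diagonal matrices.
as_full_cov <- function(cov_diag) {
  K <- nrow(cov_diag)
  D <- ncol(cov_diag)
  out <- array(0, dim = c(D, D, K))
  for (k in seq_len(K)) {
    # nrow = D keeps diag() from treating a single value as a matrix size
    out[, , k] <- diag(cov_diag[k, ], nrow = D)
  }
  out
}

# Toy input: 2 components, 3 features
cov_diag <- rbind(c(1, 2, 3),
                  c(4, 5, 6))
cov_full <- as_full_cov(cov_diag)
dim(cov_full)
```

Applied to the diagonal fit, `as_full_cov(gmm$covariance_matrices)` should give an array with the same layout as `gmm_f$covariance_matrices`, assuming the rows of the diagonal output correspond to components as the `str()` output suggests.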

This means that the "predict_GMM()" function (especially the underlying Rcpp function) needs to be adjusted so that it returns the log-likelihoods, probabilities and clusters for full covariance matrices as well. I wrote this package back in 2017 and haven't reviewed the GMM literature since then, so I would also accept a PR for the predict function adjustment.
The current changes can be installed using:

remotes::install_github('mlampros/ClusterR', upgrade = 'always', dependencies = TRUE, repos = 'https://cloud.r-project.org/')
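
For anyone picking up the predict adjustment mentioned above, here is a rough sketch in plain R of what scoring against full covariance matrices involves. It is not the package's Rcpp code: the function name `predict_full_gmm()` and the names of the returned list elements are assumptions made for illustration, and it assumes the (D x D x K) covariance layout shown in the `str(gmm_f)` output. It computes per-component Gaussian log densities via a Cholesky factorization, then posterior probabilities and hard cluster labels:

```r
# Hypothetical sketch (plain R, not the package's Rcpp implementation).
# data:      N x D numeric matrix
# centroids: K x D matrix of component means
# cov_full:  D x D x K array of full covariance matrices
# weights:   length-K vector of mixing weights
predict_full_gmm <- function(data, centroids, cov_full, weights) {
  N <- nrow(data)
  D <- ncol(data)
  K <- nrow(centroids)
  log_lik <- matrix(0, nrow = N, ncol = K)
  for (k in seq_len(K)) {
    # Cholesky factor: covariance = t(R) %*% R, with R upper triangular
    R <- chol(cov_full[, , k])
    centered <- sweep(data, 2, centroids[k, ])
    # Solve t(R) %*% z = t(centered); the column sums of z^2 are the
    # squared Mahalanobis distances of the rows of data
    z <- backsolve(R, t(centered), transpose = TRUE)
    log_det <- 2 * sum(log(diag(R)))
    log_lik[, k] <- -0.5 * (D * log(2 * pi) + log_det + colSums(z^2))
  }
  # Add log mixing weights, then normalize with a log-sum-exp per row
  weighted <- sweep(log_lik, 2, log(weights), "+")
  row_max <- apply(weighted, 1, max)
  log_norm <- row_max + log(rowSums(exp(weighted - row_max)))
  probs <- exp(weighted - log_norm)
  list(log_likelihood = log_lik,
       cluster_proba = probs,
       cluster_labels = max.col(probs, ties.method = "first"))
}

# Toy usage: two well-separated 2-D components
set.seed(1)
toy <- rbind(matrix(rnorm(40, mean = -3), ncol = 2),
             matrix(rnorm(40, mean = 3), ncol = 2))
centroids <- rbind(c(-3, -3), c(3, 3))
cov_full <- array(c(1, 0.2, 0.2, 1,
                    1, -0.3, -0.3, 1), dim = c(2, 2, 2))
res <- predict_full_gmm(toy, centroids, cov_full, weights = c(0.5, 0.5))
table(res$cluster_labels)
```

Whether the package's "Log_likelihood" output includes the mixing weights is not stated in this thread, so the sketch returns the unweighted per-component log densities and applies the weights only when computing the probabilities.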

@FMKerckhof
Author

Thanks @mlampros, much appreciated! Regarding the PR for the predict_GMM function: while I am a fairly competent R programmer, my Rcpp knowledge is next to none. I will fork the repository and see how far I can get.
