[Feature Request] for GMM implement gmm_full covariance type? #48
Comments
@FMKerckhof let me have a look into this. At first glance it seems to require a C++ template that returns either a "gmm_full" or a "gmm_diag" model (I'll have more time on Sunday afternoon to see whether this is feasible and to make the adjustments). Based on the Armadillo documentation, a few parameters work either with "gmm_full" or with "gmm_diag".
I modified the "GMM()" function so that it takes an additional parameter, "full_covariance_matrices", which defaults to FALSE so that diagonal covariance matrices are returned by default. If this parameter is TRUE, full covariance matrices are returned. However, the dimensions of the "covariance_matrices" output object differ between the two cases: with diagonal covariance matrices the output is a matrix (one row per component), whereas with full covariance matrices it is a 3-dimensional array (one slice per component):
require(ClusterR)
data(dietary_survey_IBS)
dat = as.matrix(dietary_survey_IBS[, -ncol(dietary_survey_IBS)])
dat = center_scale(dat)
# diagonal covariance matrices
gmm = GMM(data = dat,
          gaussian_comps = 3,
          full_covariance_matrices = FALSE,
          verbose = TRUE)
str(gmm)
# List of 5
# $ call : language GMM(data = dat, gaussian_comps = 3, verbose = TRUE, full_covariance_matrices = FALSE)
# $ centroids : num [1:3, 1:42] 0.182 -0.472 0.585 0.527 -0.603 ...
# $ covariance_matrices: num [1:3, 1:42] 0.7439 0.2761 1.4364 1.5807 0.0649 ...
# $ weights : num [1:3] 0.141 0.5 0.359
# $ Log_likelihood : num [1:400, 1:3] -61.2 -61.6 -71.6 -72.4 -58.9 ...
# - attr(*, "class")= chr [1:2] "GMMCluster" "Gaussian Mixture Models"
# full covariance matrices
gmm_f = GMM(data = dat,
            gaussian_comps = 3,
            full_covariance_matrices = TRUE,
            verbose = TRUE)
str(gmm_f)
# List of 5
# $ call : language GMM(data = dat, gaussian_comps = 3, verbose = TRUE, full_covariance_matrices = TRUE)
# $ centroids : num [1:3, 1:42] 0.15 -0.472 0.626 0.535 -0.603 ...
# $ covariance_matrices: num [1:42, 1:42, 1:3] 0.7333 -0.0758 0.0868 0.0951 0.0306 ...
# $ weights : num [1:3] 0.162 0.5 0.338
# $ Log_likelihood : num [1:400, 1:3] -109.6 -52.2 -175.7 -116.5 -49.5 ...
# - attr(*, "class")= chr [1:2] "GMMCluster" "Gaussian Mixture Models"
That means the "predict_GMM()" function needs adjustments (especially the Rcpp function) so that it returns the log-likelihoods, probabilities and clusters for both covariance types. I wrote this package back in 2017 and haven't reviewed the GMM literature since then. I would also accept a PR for the predict function adjustment. The updated version can be installed from GitHub with:
remotes::install_github('mlampros/ClusterR', upgrade = 'always', dependencies = TRUE, repos = 'https://cloud.r-project.org/')
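As a rough illustration of what a prediction step has to handle, here is a minimal base-R sketch that computes per-component Gaussian log-densities from either output shape (a k x d matrix of variances, or a d x d x k array of full covariance matrices). The helper names `log_dmvnorm()` and `component_log_density()` are hypothetical and not part of ClusterR, and this is not the package's Rcpp implementation. It also assumes more than one observation and leaves the mixture weights out.

```r
# Hypothetical helper (not part of ClusterR): log-density of each row of `x`
# under one Gaussian with mean `mu`. `sigma` is either a vector of variances
# (diagonal case) or a d x d covariance matrix (full case).
log_dmvnorm <- function(x, mu, sigma) {
  d <- ncol(x)
  centered <- sweep(x, 2, mu)                      # subtract the mean from every row
  if (is.matrix(sigma)) {
    R <- chol(sigma)                               # sigma = t(R) %*% R, R upper triangular
    z <- backsolve(R, t(centered), transpose = TRUE)  # solves t(R) %*% z = t(centered)
    maha <- colSums(z^2)                           # squared Mahalanobis distances
    log_det <- 2 * sum(log(diag(R)))
  } else {
    maha <- rowSums(sweep(centered^2, 2, sigma, "/"))
    log_det <- sum(log(sigma))
  }
  -0.5 * (d * log(2 * pi) + log_det + maha)
}

# Hypothetical wrapper: n x k matrix of log-densities, one column per component.
# `covs` is k x d (diagonal output) or d x d x k (full output).
component_log_density <- function(x, centroids, covs) {
  full <- length(dim(covs)) == 3
  sapply(seq_len(nrow(centroids)), function(j) {
    sigma_j <- if (full) covs[, , j] else covs[j, ]
    log_dmvnorm(x, centroids[j, ], sigma_j)
  })
}

# Toy check: a full covariance array that happens to be diagonal should give
# the same values as the diagonal representation.
set.seed(1)
x <- matrix(rnorm(20), ncol = 2)
centroids <- rbind(c(0, 0), c(1, -1))
diag_covs <- rbind(c(1, 2), c(0.5, 1.5))           # k x d
full_covs <- array(0, dim = c(2, 2, 2))            # d x d x k
for (j in 1:2) full_covs[, , j] <- diag(diag_covs[j, ])
ll_diag <- component_log_density(x, centroids, diag_covs)
ll_full <- component_log_density(x, centroids, full_covs)
all.equal(ll_diag, ll_full)
```

Cluster probabilities would then follow by adding the log of each component weight to its column and normalising across columns. Whether that matches the exact definition of the package's "Log_likelihood" output is not confirmed here.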
Thanks @mlampros, much appreciated! W.r.t. the PR for the predict_GMM function: while I am a fairly competent R programmer, my Rcpp knowledge is next to none. I will fork and see how far I can get.
Hi,
Currently GMM implements the 'gmm_diag' class of the Armadillo library. However, 'gmm_full' is sensible for many data types. Could this functionality be added? I am aware this will increase computational complexity, but it will most likely still be faster than MClust because it links against the Armadillo library.
Is this something that is difficult to implement? Was it a design choice not to include it? It would be really nice to have the option.
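To put the extra cost in perspective, this small sketch counts the free covariance parameters for each model type. The function name `n_cov_params()` is hypothetical, and the sizes (3 components, 42 dimensions) are only illustrative.

```r
# Hypothetical helper: number of free covariance parameters for k components
# in d dimensions. A diagonal model stores d variances per component; a full
# model stores a symmetric d x d matrix, i.e. d * (d + 1) / 2 values.
n_cov_params <- function(k, d, full = FALSE) {
  if (full) k * d * (d + 1) / 2 else k * d
}

n_diag <- n_cov_params(3, 42)               # 3 * 42 = 126
n_full <- n_cov_params(3, 42, full = TRUE)  # 3 * 42 * 43 / 2 = 2709
```

The full model grows quadratically in the number of dimensions, which is where the additional computation and the need for more data come from.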
Thanks!