-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathREADME.Rmd
181 lines (144 loc) · 5.31 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
---
title: "BayesMultiMode"
output: github_document
bibliography: inst/REFERENCES.bib
---
```{r, include = FALSE}
knitr::opts_chunk$set(
message = FALSE,
warning = FALSE,
dpi = 600,
fig.path = "man/figures/README-"
)
```
<!-- badges: start -->
[![R-CMD-check](https://github.com/paullabonne/BayesMultiMode/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/paullabonne/BayesMultiMode/actions/workflows/R-CMD-check.yaml)
[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/BayesMultiMode)](https://cran.r-project.org/package=BayesMultiMode)
<!-- badges: end -->
`BayesMultiMode` is an R package for detecting and exploring multimodality using Bayesian techniques. The approach works in two stages. First, a mixture distribution is fitted on the data using a sparse finite mixture Markov chain Monte Carlo (SFM MCMC) algorithm. The number of mixture components does not have to be known; the size of the mixture is estimated endogenously through the SFM approach. Second, the modes of the estimated mixture in each MCMC draw are retrieved using algorithms specifically tailored for mode detection. These estimates are then used to construct posterior probabilities for the number of modes, their locations and uncertainties, providing a powerful tool for mode inference. See @basturk_2023 and @Cross2024 for more details.
### Installing BayesMultiMode from CRAN
```{r, eval = FALSE}
install.packages("BayesMultiMode")
```
### Or installing the development version from GitHub
```{r, eval = FALSE}
# install.packages("devtools") # if devtools is not installed
devtools::install_github("paullabonne/BayesMultiMode")
```
### Loading BayesMultiMode
```{r}
library(BayesMultiMode)
```
### BayesMultiMode for MCMC estimation and mode inference
`BayesMultiMode` provides a very flexible and efficient MCMC estimation approach : it handles mixtures with unknown number of components through the sparse finite mixture approach of @malsiner-walli_model-based_2016 and supports a comprehensive range of mixture distributions, both continuous and discrete.
#### Estimation
```{r, out.width = '70%', fig.align = "center", warning = FALSE, message = FALSE}
set.seed(123)
# retrieve galaxy data
y <- galaxy
# estimation
bayesmix <- bayes_fit(
data = y,
K = 10,
dist = "normal",
nb_iter = 2000,
burnin = 1000,
print = F
)
plot(bayesmix, draws = 200)
```
#### Mode inference
```{r, out.width = '70%', fig.align = "center"}
# mode estimation
bayesmode <- bayes_mode(bayesmix)
plot(bayesmode)
summary(bayesmode)
```
### BayesMultiMode for mode inference with external MCMC output
`BayesMultiMode` also works on MCMC output generated using external software. The function `bayes_mixture()` creates an object of class `bayes_mixture` which can then be used as input in the mode inference function `bayes_mode()`. Here is an example using cyclone intensity data [@knapp_international_2018] and the `BNPmix` package for estimation. More examples can be found [here](https://github.com/paullabonne/BayesMultiMode/blob/main/external_comp.md).
```{r, out.width = '70%', fig.align = "center", warning = FALSE, message = FALSE}
library(BNPmix)
library(dplyr)
y <- cyclone %>%
filter(
BASIN == "SI",
SEASON > "1981"
) %>%
dplyr::select(max_wind) %>%
unlist()
## estimation
PY_result <- PYdensity(y,
mcmc = list(
niter = 2000,
nburn = 1000,
print_message = FALSE
),
output = list(out_param = TRUE)
)
```
#### Transforming the output into a mcmc matrix with one column per variable
```{r, message = FALSE}
mcmc_py <- list()
for (i in 1:length(PY_result$p)) {
k <- length(PY_result$p[[i]][, 1])
draw <- c(
PY_result$p[[i]][, 1],
PY_result$mean[[i]][, 1],
sqrt(PY_result$sigma2[[i]][, 1]),
i
)
names(draw)[1:k] <- paste0("eta", 1:k)
names(draw)[(k + 1):(2 * k)] <- paste0("mu", 1:k)
names(draw)[(2 * k + 1):(3 * k)] <- paste0("sigma", 1:k)
names(draw)[3 * k + 1] <- "draw"
mcmc_py[[i]] <- draw
}
mcmc_py <- as.matrix(bind_rows(mcmc_py))
```
#### Creating an object of class `bayes_mixture`
```{r}
py_BayesMix <- bayes_mixture(
mcmc = mcmc_py,
data = y,
burnin = 0, # the burnin has already been discarded
dist = "normal",
vars_to_keep = c("eta", "mu", "sigma")
)
```
#### Plotting the mixture
```{r, out.width = '70%', fig.align = "center"}
plot(py_BayesMix)
```
#### Mode inference
```{r, out.width = '70%', fig.align = "center"}
# mode estimation
bayesmode <- bayes_mode(py_BayesMix)
# plot
plot(bayesmode)
# Summary
summary(bayesmode)
```
### BayesMultiMode for mode estimation in mixtures estimated with ML
It is possible to use `BayesMultiMode` to find modes in mixtures estimated using maximum likelihood and the EM algorithm. Below is an example using the popular package `mclust`. More examples can be found [here](https://github.com/paullabonne/BayesMultiMode/blob/main/external_comp.md).
```{r, out.width = '50%', fig.align = "center"}
set.seed(123)
library(mclust)
y <- cyclone %>%
filter(
BASIN == "SI",
SEASON > "1981"
) %>%
dplyr::select(max_wind) %>%
unlist()
fit <- Mclust(y)
pars <- c(
eta = fit$parameters$pro,
mu = fit$parameters$mean,
sigma = sqrt(fit$parameters$variance$sigmasq)
)
mix <- mixture(pars, dist = "normal", range = c(min(y), max(y))) # create new object of class Mixture
modes <- mix_mode(mix) # estimate modes
plot(modes)
summary(modes)
```
### References