-
-
Notifications
You must be signed in to change notification settings - Fork 24
/
README.Rmd
333 lines (251 loc) · 11.7 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include=FALSE}
stopifnot(require(knitr))
options(width = 90)
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
dev = "png",
dpi = 150,
fig.asp = 0.8,
fig.width = 5,
out.width = "60%",
fig.align = "center"
)
```
# posterior <img src="man/figures/stanlogo.png" align="right" width="120" />
<!-- badges: start -->
[![CRAN
status](https://www.r-pkg.org/badges/version/posterior)](https://CRAN.R-project.org/package=posterior)
[![R-CMD-check](https://github.com/stan-dev/posterior/workflows/R-CMD-check/badge.svg)](https://github.com/stan-dev/posterior/actions?workflow=R-CMD-check)
[![Coverage
Status](https://codecov.io/gh/stan-dev/posterior/branch/master/graph/badge.svg)](https://app.codecov.io/gh/stan-dev/posterior)
<!-- badges: end -->
The **posterior** R package is intended to provide useful tools for both users
and developers of packages for fitting Bayesian models or working with output
from Bayesian models. The primary goals of the package are to:
* Efficiently convert between many different useful formats of
draws (samples) from posterior or prior distributions.
* Provide consistent methods for operations commonly performed on draws,
for example, subsetting, binding, or mutating draws.
* Provide various summaries of draws in convenient formats.
* Provide lightweight implementations of state of the art posterior inference
diagnostics.
If you are new to **posterior** we recommend starting with these vignettes:
* [*The posterior R package*](https://mc-stan.org/posterior/articles/posterior.html):
an introduction to the package and its main functionality
* [*rvar: The Random Variable Datatype*](https://mc-stan.org/posterior/articles/rvar.html):
an overview of the new random variable datatype
### Installation
You can install the latest official release version via
```{r install_cran, eval=FALSE}
install.packages("posterior")
```
or build the developmental version directly from GitHub via
```{r install_github, eval=FALSE}
# install.packages("remotes")
remotes::install_github("stan-dev/posterior")
```
### Examples
Here we offer a few examples of using the package. For a more detailed overview
see the vignette [*The posterior R package*](https://mc-stan.org/posterior/articles/posterior.html).
```{r load}
library("posterior")
```
To demonstrate how to work with the **posterior** package, we will use example
posterior draws obtained from the eight schools hierarchical meta-analysis model
described in Gelman et al. (2013). Essentially, we have an estimate per school
(`theta[1]` through `theta[8]`) as well as an overall mean (`mu`) and standard
deviation across schools (`tau`).
#### Draws formats
```{r draws_array}
eight_schools_array <- example_draws("eight_schools")
print(eight_schools_array, max_variables = 3)
```
The draws for this example come as a `draws_array` object, that is, an array
with dimensions iterations x chains x variables. We can easily transform it to
another format, for instance, a data frame with additional meta information.
```{r draws_df}
eight_schools_df <- as_draws_df(eight_schools_array)
print(eight_schools_df)
```
Different formats are preferable in different situations and hence posterior
supports multiple formats and easy conversion between them. For more details on
the available formats see `help("draws")`. All of the formats are essentially
base R object classes and can be used as such. For example, a `draws_matrix`
object is just a `matrix` with a little more consistency and additional methods.
#### Summarizing draws
Computing summaries of posterior or prior draws and convergence diagnostics for
posterior draws is one of the most common tasks when working with Bayesian
models fit using Markov Chain Monte Carlo (MCMC) methods. The **posterior**
package provides a flexible interface for this purpose via `summarise_draws()`:
```{r summary}
# summarise_draws or summarize_draws
summarise_draws(eight_schools_df)
```
Basically, we get a data frame with one row per variable and one column per
summary statistic or convergence diagnostic. The summaries `rhat`, `ess_bulk`,
and `ess_tail` are described in Vehtari et al. (2020). We can choose which
summaries to compute by passing additional arguments, either functions or names
of functions. For instance, if we only wanted the mean and its corresponding
Monte Carlo Standard Error (MCSE) we would use:
```{r summary-with-measures}
summarise_draws(eight_schools_df, "mean", "mcse_mean")
```
For a function to work with `summarise_draws`, it needs to take a vector or
matrix of numeric values and returns a single numeric value or a named vector of
numeric values.
#### Subsetting draws
Another common task when working with posterior (or prior) draws, is subsetting
according to various aspects of the draws (iterations, chains, or variables).
**posterior** provides a convenient interface for this purpose via the
`subset_draws()` method. For example, here is the code to extract the first five
iterations of the first two chains of the variable `mu`:
```{r subset}
subset_draws(eight_schools_df, variable = "mu", chain = 1:2, iteration = 1:5)
```
The same call to `subset_draws()` can be used regardless of whether the object
is a `draws_df`, `draws_array`, `draws_list`, etc.
#### Mutating and renaming draws
The magic of having obtained draws from the joint posterior (or prior)
distribution of a set of variables is that these draws can also be used
to obtain draws from any other variable that is a function of the original variables.
That is, if are interested in the posterior distribution of, say,
`phi = (mu + tau)^2` all we have to do is to perform the transformation for each
of the individual draws to obtain draws from the posterior distribution of the
transformed variable. This procedure is automated in the `mutate_variables` method:
```{r}
x <- mutate_variables(eight_schools_df, phi = (mu + tau)^2)
x <- subset_draws(x, c("mu", "tau", "phi"))
print(x)
```
When we do the math ourselves, we see that indeed for each draw,
`phi` is equal to `(mu + tau)^2` (up to rounding two 2 digits
for the purpose of printing).
We may also easily rename variables, or even entire vectors of variables via
`rename_variables`, for example:
```{r}
x <- rename_variables(eight_schools_df, mean = mu, alpha = theta)
variables(x)
```
As with all **posterior** methods, `mutate_variables` and `rename_variables`
can be used with all draws formats.
#### Binding draws together
Suppose we have multiple draws objects that we want to bind together:
```{r}
x1 <- draws_matrix(alpha = rnorm(5), beta = 1)
x2 <- draws_matrix(alpha = rnorm(5), beta = 2)
x3 <- draws_matrix(theta = rexp(5))
```
Then, we can use the `bind_draws` method to bind them along different dimensions.
For example, we can bind `x1` and `x3` together along the `'variable'` dimension:
```{r}
x4 <- bind_draws(x1, x3, along = "variable")
print(x4)
```
Or, we can bind `x1` and `x2` together along the `'draw'` dimension:
```{r}
x5 <- bind_draws(x1, x2, along = "draw")
print(x5)
```
As with all **posterior** methods, `bind_draws` can be used with all draws
formats.
#### Converting from regular R objects to draws formats
The `eight_schools` example already comes in a format natively supported by
posterior but we could of course also import the draws from other sources,
for example, from common base R objects:
```{r draws_matrix}
x <- matrix(rnorm(50), nrow = 10, ncol = 5)
colnames(x) <- paste0("V", 1:5)
x <- as_draws_matrix(x)
print(x)
summarise_draws(x, "mean", "sd", "median", "mad")
```
Instead of `as_draws_matrix()` we also could have just used `as_draws()`, which
attempts to find the closest available format to the input object. In this case
this would result in a `draws_matrix` object either way.
The above matrix example contained only one chain. Multi-chain draws could be
stored in base R 3-D array object, which can also be converted to a draws object:
```{r}
x <- array(data=rnorm(200), dim=c(10, 2, 5))
x <- as_draws_matrix(x)
variables(x) <- paste0("V", 1:5)
print(x)
```
#### Converting from mcmc objects to draws formats
The **coda** and **rjags** packages use `mcmc` and `mcmc.list` objects which
can also be converted to draws objects:
```{r}
data(line, package = "coda")
line <- as_draws_df(line)
print(line)
```
### Contributing to posterior
We welcome contributions! The **posterior** package is under active development.
If you find bugs or have ideas for new features (for us or yourself to
implement) please open an issue on GitHub
(https://github.com/stan-dev/posterior/issues).
### Citing posterior
Developing and maintaining open source software is an important yet often
underappreciated contribution to scientific progress. Thus, whenever you are
using open source software (or software in general), please make sure to cite it
appropriately so that developers get credit for their work.
When using **posterior**, please cite it as follows:
* Bürkner P. C., Gabry J., Kay M., & Vehtari A. (2020). “posterior: Tools for
Working with Posterior Distributions.” R package version XXX, <URL:
https://mc-stan.org/posterior/>.
When using the MCMC convergence diagnostics `rhat`, `ess_bulk`, `ess_tail`,
`ess_median`, `ess_quantile`, `mcse_median`, or `mcse_quantile`
please also cite
* Vehtari A., Gelman A., Simpson D., Carpenter B., & Bürkner P. C. (2021).
Rank-normalization, folding, and localization: An improved Rhat for assessing
convergence of MCMC (with discussion). *Bayesian Analysis*. 16(2), 667–718.
doi.org/10.1214/20-BA1221
When using the MCMC convergence diagnostic `rhat_nested`
please also cite
* Margossian, C. C., Hoffman, M. D., Sountsov, P., Riou-Durand, L.,
Vehtari, A., and Gelman, A. (2024).
Nested $\widehat{R}$: Assessing the convergence of Markov chain
Monte Carlo when running many short chains. *Bayesian Analysis*,
doi:10.1214/24-BA1453.
When using the MCMC convergence diagnostic `rstar`
please also cite
* Lambert, B. and Vehtari, A. (2022). $R^*$: A robust MCMC convergence
diagnostic with uncertainty using decision tree classifiers.
*Bayesian Analysis*, 17(2):353-379.
doi:10.1214/20-BA1252
When using the Pareto-k diagnostics `pareto_khat`, `pareto_min_ss`,
`pareto_convergence_rate`, `khat_threshold` or `pareto_diags`, or
Pareto smoothing `pareto_smooth` please also cite
* Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024).
Pareto smoothed importance sampling.
*Journal of Machine Learning Research*, 25(72):1-58.
The same information can be obtained by running `citation("posterior")`.
### References
Gelman A., Carlin J. B., Stern H. S., David B. Dunson D. B., Aki Vehtari A.,
& Rubin D. B. (2013). *Bayesian Data Analysis, Third Edition*. Chapman and
Hall/CRC.
Lambert, B. and Vehtari, A. (2022). $R^*$: A robust MCMC convergence
diagnostic with uncertainty using decision tree classifiers.
*Bayesian Analysis*, 17(2):353-379.
doi:10.1214/20-BA1252
Margossian, C. C., Hoffman, M. D., Sountsov, P., Riou-Durand, L.,
Vehtari, A., and Gelman, A. (2024).
Nested $\widehat{R}$: Assessing the convergence of Markov chain
Monte Carlo when running many short chains. *Bayesian Analysis*,
doi:10.1214/24-BA1453.
Vehtari A., Gelman A., Simpson D., Carpenter B., & Bürkner P. C. (2021).
Rank-normalization, folding, and localization: An improved Rhat for assessing
convergence of MCMC (with discussion). *Bayesian Analysis*. 16(2), 667–718.
doi.org/10.1214/20-BA1221
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024).
Pareto smoothed importance sampling.
*Journal of Machine Learning Research*, 25(72):1-58.
### Licensing
The **posterior** package is licensed under the following licenses:
- Code: BSD 3-clause (https://opensource.org/license/bsd-3-clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)