- mstan, a compiler that implements a "swappable module" system for Stan. See this blog post or the paper Multi-Model Probabilistic Programming for an introduction, and check out the website for interactive visualizations of modular programs.
- model_search.py, a simple proof-of-concept model search for the network of models.
- mstan-server, the backend server for the Modular Stan website.
Please keep in mind that this is a research prototype. You are likely to encounter bugs and unimplemented features.
- Install stack.
- Make sure Graphviz is installed.
- Run stack install. This should build mstan and mstan-server and copy them to your PATH.
- To use model_search.py with ELPD evaluation, you'll need Rscript, python, cmdstan, and the following R dependencies: tidyverse, loo, abind, distributional, tensorA, jsonlite, data_table, cmdstanr, posterior.

  You can use nix to manage these dependencies with the nix-shell command. nix-shell will default to the appropriate environment specified by the default.nix file. You may still need to install cmdstan separately from nix.
You can explore mstan usage by running mstan --help:
Usage: mstan (-f|--modular-stan-file FILE) [-v|--debug-parse]
[-o|--output-file FILE] COMMAND
Execute model network command
Available options:
-f,--modular-stan-file FILE
File path of the input modular Stan file
-v,--debug-parse Show parsed modular program data structure
-o,--output-file FILE Output file path
-h,--help Show this help text
Available commands:
concrete-model Return the concrete Stan model given a model ID
model-graph Produce Graphviz image and text files of the model
graph of the modular Stan program.
module-graph Produce Graphviz image and text files of the module
graph of the modular Stan program.
model-neighbors Return the model IDs of the neighbors of the given
model
any-model Return an arbitrary model ID
list-all-models Return all model IDs
You can also use --help for individual commands, like mstan concrete-model --help.
Model IDs are strings that are used to uniquely reference individual models in a network. For example, the following string is the ID of a model from the "simple" example network:
Mean:standard,Stddev:lognormal,StddevInformative:yes
Model IDs are comma-separated lists of the module selections that make up that model; in this case Mean is implemented by standard, and so on.
You can also find the model ID of a selected model on the website, in the text box above the module tree.
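If you want to work with model IDs in a script, the comma-and-colon format described above is easy to take apart. The following is a minimal sketch, not part of mstan or graph_search.py; the function names `parse_model_id` and `format_model_id` are hypothetical helpers introduced here for illustration.

```python
def parse_model_id(model_id):
    """Split 'Hole:impl,Hole:impl' into a dict {'Hole': 'impl', ...}."""
    selections = {}
    for part in model_id.split(","):
        hole, sep, impl = part.partition(":")
        if not sep or not hole or not impl:
            raise ValueError(f"malformed selection: {part!r}")
        selections[hole] = impl
    return selections


def format_model_id(selections):
    """Inverse of parse_model_id; keeps the dict's insertion order."""
    return ",".join(f"{hole}:{impl}" for hole, impl in selections.items())


example = "Mean:standard,Stddev:lognormal,StddevInformative:yes"
parsed = parse_model_id(example)
print(parsed["Stddev"])  # the implementation selected for the Stddev module
```

Because Python dicts preserve insertion order, formatting a parsed ID gives back the original string.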
These command-line examples will use the "simple" example modular program.
Get an arbitrary model ID:
> mstan -f examples/simple.m.stan any-model
Mean:standard,Stddev:standard
Get the concrete Stan program for the model ID Mean:standard,Stddev:standard:
> mstan -f examples/simple.m.stan concrete-model -s Mean:standard,Stddev:standard
data {
int N;
vector[N] x;
}
model {
x ~ normal(0, 1);
}
Get the IDs of the models that neighbor Mean:standard,Stddev:standard:
> mstan -f examples/simple.m.stan model-neighbors -s Mean:standard,Stddev:standard
Mean:normal,Stddev:standard
Mean:standard,Stddev:lognormal,StddevInformative:no
Mean:standard,Stddev:lognormal,StddevInformative:yes
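The output above suggests that two models are neighbors when they differ in exactly one module selection. The sketch below encodes that reading as a rule and checks it against the six model IDs of the "simple" network. It is an assumption inferred from this one example, not mstan's actual definition, and the function names are hypothetical.

```python
# The six model IDs of the "simple" example network.
ALL_IDS = [
    "Mean:normal,Stddev:lognormal,StddevInformative:no",
    "Mean:normal,Stddev:lognormal,StddevInformative:yes",
    "Mean:normal,Stddev:standard",
    "Mean:standard,Stddev:lognormal,StddevInformative:no",
    "Mean:standard,Stddev:lognormal,StddevInformative:yes",
    "Mean:standard,Stddev:standard",
]


def parse(model_id):
    """Turn 'Hole:impl,...' into {'Hole': 'impl', ...}."""
    return dict(part.split(":", 1) for part in model_id.split(","))


def are_neighbors(a, b):
    """Assumed rule: among the modules both models select, exactly one differs."""
    sa, sb = parse(a), parse(b)
    shared = sa.keys() & sb.keys()
    return sum(1 for hole in shared if sa[hole] != sb[hole]) == 1


def neighbors(model_id, all_ids):
    return [m for m in all_ids if m != model_id and are_neighbors(model_id, m)]


for m in neighbors("Mean:standard,Stddev:standard", ALL_IDS):
    print(m)
```

For Mean:standard,Stddev:standard this rule yields the same three IDs that model-neighbors printed above.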
Get all model IDs and write them to "models.txt":
> mstan -f examples/simple.m.stan list-all-models -o models.txt
> cat models.txt
Mean:normal,Stddev:lognormal,StddevInformative:no
Mean:normal,Stddev:lognormal,StddevInformative:yes
Mean:normal,Stddev:standard
Mean:standard,Stddev:lognormal,StddevInformative:no
Mean:standard,Stddev:lognormal,StddevInformative:yes
Mean:standard,Stddev:standard
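The same commands can be driven from a script. Here is a minimal sketch of a Python wrapper, assuming mstan is on your PATH; it uses only the -f and -s options and the command names shown in this README. The helper names `mstan_argv`, `list_all_models`, and `model_neighbors` are hypothetical and are not taken from graph_search.py.

```python
import subprocess


def mstan_argv(modular_file, command, model_id=None):
    """Build the argument list for one mstan call, e.g.
    mstan -f FILE model-neighbors -s MODEL_ID."""
    argv = ["mstan", "-f", modular_file, command]
    if model_id is not None:
        argv += ["-s", model_id]
    return argv


def run_mstan(modular_file, command, model_id=None):
    """Run mstan and return its standard output; raises if mstan fails."""
    result = subprocess.run(
        mstan_argv(modular_file, command, model_id),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def list_all_models(modular_file):
    """One model ID per non-empty output line."""
    out = run_mstan(modular_file, "list-all-models")
    return [line.strip() for line in out.splitlines() if line.strip()]


def model_neighbors(modular_file, model_id):
    out = run_mstan(modular_file, "model-neighbors", model_id)
    return [line.strip() for line in out.splitlines() if line.strip()]
```

For example, `list_all_models("examples/simple.m.stan")` would be expected to return the six IDs listed above, assuming the command prints one ID per line as in the examples.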
Produce an image of the model graph:
> mstan -f examples/simple.m.stan model-graph
model_graph.svg
Produce an image of the module tree:
> mstan -f examples/simple.m.stan module-tree
module_tree.svg
Print out diagnostics and produce an image of the module tree:
> mstan -f examples/simple.m.stan module-tree -v
============== Parsed program: ===============
signatures: ...
...
============== Modular tree: ===============
(root)
[Mean]
(normal)
(standard)
[Stddev]
(lognormal)
[StddevInformative]
(no)
(yes)
(standard)
============== Results: ===============
module_tree.svg
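The module tree printed above also determines the set of models: each model picks one implementation for every module it reaches. As a rough illustration, the sketch below hand-copies that tree into a nested dict and enumerates every complete selection. The data structure and function name are assumptions made for this example and do not reflect how mstan represents programs internally.

```python
from itertools import product

# Module tree of the "simple" example, copied by hand from the -v output:
# {module: {implementation: {nested modules...}}}
TREE = {
    "Mean": {"normal": {}, "standard": {}},
    "Stddev": {
        "lognormal": {"StddevInformative": {"no": {}, "yes": {}}},
        "standard": {},
    },
}


def enumerate_selections(modules):
    """Yield every complete selection as a list of (module, implementation) pairs."""
    per_module = []
    for module, impls in modules.items():
        options = []
        for impl, nested in impls.items():
            # An implementation brings along every choice for its nested modules.
            for sub in enumerate_selections(nested):
                options.append([(module, impl)] + sub)
        per_module.append(options)
    # Sibling modules are chosen independently, so take the cross product.
    for combo in product(*per_module):
        yield [pair for part in combo for pair in part]


model_ids = [
    ",".join(f"{module}:{impl}" for module, impl in selection)
    for selection in enumerate_selections(TREE)
]
for model_id in model_ids:
    print(model_id)
```

For this tree the enumeration produces the same six IDs that list-all-models wrote to models.txt.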
This small modular Stan program can be found at examples/bernoulli.m.stan:
data {
int<lower=0> N;
array[N] int<lower=0,upper=1> y; // or int<lower=0,upper=1> y[N];
}
parameters {
real<lower=0,upper=1> theta;
}
model {
theta ~ ThetaPrior(); // uniform prior on interval 0,1
y ~ bernoulli(theta);
}
generated quantities {
vector[N] log_lik;
for (i in 1:N) {
log_lik[i] = bernoulli_lpmf(y[i] | theta);
}
}
module "informative" ThetaPrior(theta) {
// Bias theta towards zero
theta ~ beta(1, 4);
}
module "uninformative" ThetaPrior(theta) {
theta ~ beta(1, 1);
}
To run a greedy ELPD-maximizing graph search, from the project root directory run python graph_search.py examples/bernoulli.m.stan examples/bernoulli_data.json.
The first argument to graph_search.py is the modular Stan file and the second is the input data.
My translation of the birthday case study into a modular Stan program can be found at examples/birthday/birthday.m.stan.
The data for the birthday problem has been pre-processed. You can replicate the pre-processing by running Rscript prepare_birthday_data.R inside examples/birthday. This converts the original .csv file into a .json file and adds all of the extra information expected by the Stan models (e.g. holidays).
There is also a simpler translation of the birthday problem at examples/birthday/birthday-trivial-translation.m.stan.
To run the graph search on this trivial example, execute python graph_search.py examples/birthday/birthday-trivial-translation.m.stan examples/birthday/births_usa_1969.json.
Here are example results:
Visiting:
Model ID: Model:model1
Score: 6718.1204643854
Push neighbor: Model:model2 5298.35671164131
Push neighbor: Model:model3 13127.6477332358
Push neighbor: Model:model4 13586.8301927714
Push neighbor: Model:model5 -167840.2708031
Push neighbor: Model:model6 13973.7607351042
Push neighbor: Model:model7 10402.3775114932
Push neighbor: Model:model8 15306.1033233817
Push neighbor: Model:model8rhs 15391.7673968433
Visiting:
Model ID: Model:model8rhs
Score: 15391.7673968433
Visiting:
Model ID: Model:model8
Score: 15306.1033233817
Winner:
Model ID: Model:model8rhs
Score: 15391.7673968433
9 scores
2 expands
The search selected the model with ID Model:model8rhs. To get the concrete Stan program with this ID, run mstan -f examples/birthday/birthday-trivial-translation.m.stan concrete-model -s Model:model8rhs.
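To make the idea of a greedy, score-maximizing search concrete, here is a small hill-climbing sketch that replays the scores from the trace above. It is not the algorithm in graph_search.py, whose trace also shows it visiting Model:model8 and so likely works from a queue of scored candidates. The complete-graph neighbor function is an assumption, based on Model:model1 pushing all eight other models.

```python
# Scores copied from the example trace above.
SCORES = {
    "Model:model1": 6718.1204643854,
    "Model:model2": 5298.35671164131,
    "Model:model3": 13127.6477332358,
    "Model:model4": 13586.8301927714,
    "Model:model5": -167840.2708031,
    "Model:model6": 13973.7607351042,
    "Model:model7": 10402.3775114932,
    "Model:model8": 15306.1033233817,
    "Model:model8rhs": 15391.7673968433,
}


def all_others(model_id):
    """Assumed neighbor relation: every model neighbors every other model."""
    return [m for m in SCORES if m != model_id]


def greedy_search(start, score, neighbors):
    """Move to the best-scoring neighbor until no neighbor improves the score."""
    current, current_score = start, score(start)
    while True:
        candidates = [(score(n), n) for n in neighbors(current)]
        if not candidates:
            break
        best_score, best = max(candidates)
        if best_score <= current_score:
            break  # local maximum: no neighbor scores higher
        current, current_score = best, best_score
    return current, current_score


winner, winner_score = greedy_search("Model:model1", SCORES.__getitem__, all_others)
print("Winner:", winner, winner_score)
```

Starting from Model:model1, this sketch moves once to Model:model8rhs and stops there, which agrees with the winner reported in the trace. In a real run the score function would be the expensive step, so caching scores per model ID would be worthwhile.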
- To see the command line calls that are being made to elpd.R and mstan, set DEBUG_IO in graph_search.py to True.