Switch stanc.exe to use Cmdliner for cli #1478

Merged: 7 commits, Jan 2, 2025
16 changes: 9 additions & 7 deletions Jenkinsfile
@@ -100,6 +100,8 @@ pipeline {
description: "Pass STANCFLAGS to make/local, default none")
booleanParam(name:"run_slow_perf_tests", defaultValue: false, description:"Run additional 'slow' performance tests")
string(defaultValue: '', name: 'build_multiarch_docker_tag', description: "Docker tag for the multiarch image")
booleanParam(name:"build_multiarch", defaultValue: false, description:"Build multiarch images even when not on 'master'")

}
options {
parallelsAlwaysFailFast()
@@ -113,7 +115,7 @@ pipeline {
GIT_AUTHOR_EMAIL = '[email protected]'
GIT_COMMITTER_NAME = 'Stan Jenkins'
GIT_COMMITTER_EMAIL = '[email protected]'
MULTIARCH_DOCKER_TAG = 'multiarch-ocaml-4.14-v2'
MULTIARCH_DOCKER_TAG = 'multiarch-ocaml-4.14-v2-and-cmdliner'
Comment on lines -116 to +118 (Collaborator):

I don't know anything about Jenkins, but what's the reason for the tag change?

Member Author:

I needed to re-build the multiarch images to add cmdliner. Because they don't install the developer-only dependencies like ocamlformat, I guess they didn't have it already.

This was a headache, and it's definitely non-obvious how the multiarch build works, see #1480

}
stages {
stage('Verify changes') {
@@ -749,7 +751,7 @@ pipeline {
beforeAgent true
allOf {
expression { !skipRebuildingBinaries }
anyOf { buildingTag(); branch 'master' }
anyOf { buildingTag(); branch 'master'; expression { params.build_multiarch } }
}
}
agent {
@@ -783,7 +785,7 @@ pipeline {
beforeAgent true
allOf {
expression { !skipRebuildingBinaries }
anyOf { buildingTag(); branch 'master' }
anyOf { buildingTag(); branch 'master'; expression { params.build_multiarch } }
}
}
agent {
@@ -814,7 +816,7 @@ pipeline {
beforeAgent true
allOf {
expression { !skipRebuildingBinaries }
anyOf { buildingTag(); branch 'master' }
anyOf { buildingTag(); branch 'master'; expression { params.build_multiarch } }
}
}
agent {
@@ -845,7 +847,7 @@ pipeline {
beforeAgent true
allOf {
expression { !skipRebuildingBinaries }
anyOf { buildingTag(); branch 'master' }
anyOf { buildingTag(); branch 'master'; expression { params.build_multiarch } }
}
}
agent {
@@ -876,7 +878,7 @@ pipeline {
beforeAgent true
allOf {
expression { !skipRebuildingBinaries }
anyOf { buildingTag(); branch 'master' }
anyOf { buildingTag(); branch 'master'; expression { params.build_multiarch } }
}
}
agent {
@@ -907,7 +909,7 @@ pipeline {
beforeAgent true
allOf {
expression { !skipRebuildingBinaries }
anyOf { buildingTag(); branch 'master' }
anyOf { buildingTag(); branch 'master'; expression { params.build_multiarch } }
}
}
agent {
Expand Down
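The six `when` blocks above all receive the same change: the multiarch stages now also run when the new `build_multiarch` parameter is set. As a rough illustration of that condition (plain shell, not Jenkins syntax; the function name and its string arguments are hypothetical stand-ins for the pipeline values), the logic reads:

```shell
# Hypothetical sketch of the Jenkins `when` condition; each argument is a
# stand-in for a pipeline value and is passed as the string "true"/"false"
# (or a branch name).
should_build_multiarch() {
  skip_rebuilding_binaries=$1   # skipRebuildingBinaries
  building_tag=$2               # buildingTag()
  branch=$3                     # current branch name
  build_multiarch=$4            # params.build_multiarch

  # allOf, part 1: never run when binaries are being skipped.
  [ "$skip_rebuilding_binaries" = "false" ] || return 1

  # allOf, part 2 (anyOf): a tag build, the master branch, or the new parameter.
  [ "$building_tag" = "true" ] \
    || [ "$branch" = "master" ] \
    || [ "$build_multiarch" = "true" ]
}

# A feature branch with the parameter set now qualifies.
if should_build_multiarch false false my-feature true; then
  echo "multiarch stages would run"
fi
```

Before this change, the same feature-branch build would have skipped these stages regardless of any parameter.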
48 changes: 25 additions & 23 deletions README.md
@@ -1,22 +1,23 @@
# A New Stan-to-C++ Compiler, stanc3

This repo contains a new compiler for Stan, stanc3, written in OCaml.
Since version 2.26, this has been the default compiler for Stan. See [this wiki](https://github.com/stan-dev/stanc3/wiki/changes-from-stanc2) for a list of minor differences between this compiler and the previous Stan compiler.
The latest release (with binaries for many platforms and a JavaScript version)
can be found on the [releases page](https://github.com/stan-dev/stanc3/releases).

To read more about why we built this, see this [introductory blog post](https://statmodeling.stat.columbia.edu/2019/03/13/stanc3-rewriting-the-stan-compiler/). For some discussion as to how we chose OCaml, see [this accidental flamewar](https://discourse.mc-stan.org/t/choosing-the-new-stan-compilers-implementation-language/6203).
We're testing [these models](https://jenkins.flatironinstitute.org/job/Stan/job/Stanc3/job/master/) (listed under Test Results) on every pull request.
Since version 2.26, this has been the default compiler for Stan.

[![Build Status](https://jenkins.flatironinstitute.org/job/Stan/job/Stanc3/job/master/badge/icon?style=flat-square)](https://jenkins.flatironinstitute.org/job/Stan/job/Stanc3/job/master/) [![codecov](https://codecov.io/gh/stan-dev/stanc3/branch/master/graph/badge.svg?token=tt76nVXoht)](https://codecov.io/gh/stan-dev/stanc3)

## Documentation

Documentation for users of stanc3 is in the Stan Users' Guide [here](https://mc-stan.org/docs/stan-users-guide/using-the-stan-compiler.html)

The Stanc3 Developer documentation is available here: https://mc-stan.org/stanc3/stanc

Developer documentation is available [here](https://mc-stan.org/stanc3/stanc).
Want to contribute? See [Getting Started](https://mc-stan.org/stanc3/stanc/getting_started.html)
for setup instructions and some useful commands.

## High-level concepts, invariants, and 30,000-ft view

Stanc3 has 4 main src packages: `frontend`, `middle`, `analysis_and_optimization` and `stan_math_backend`.
These are pieced together by the `driver` module.

@@ -35,11 +36,13 @@ flowchart

The goal is to keep as many details about the way Stan is implemented by the core C++ implementation in the Stan Math backend library as possible.
The Middle library contains the MIR and currently any types or functions used by the two ends.
The entrypoint for the compiler is in `src/stanc/stanc.ml` which sequences the various components together.
The entrypoint for the compiler is in `src/stanc/stanc.ml`. This parses command-line arguments and
calls into `src/driver/Entry.ml`, which sequences the various components together.

### Distinct stanc Phases

The phases of stanc are summarized in the following information flowchart and list.

```mermaid
flowchart TB

@@ -90,36 +93,35 @@ flowchart TB
```

1. [Lex](src/frontend/lexer.mll) the Stan language into tokens.
1. [Parse](src/frontend/parser.mly) Stan language into AST that represents the syntax quite closely and aids in development of pretty-printers and linters. `stanc --debug-ast` to print this out.
1. Typecheck & add type information [Typechecker.ml](src/frontend/Typechecker.ml). `stanc --debug-decorated-ast`
1. [Lower](src/frontend/Ast_to_Mir.ml) into [Middle Intermediate Representation](src/middle/Program.ml) (AST -> MIR) `stanc --debug-mir` (or `--debug-mir-pretty`)
1. Backend-specific MIR transform (MIR -> MIR) [Transform_Mir.ml](src/stan_math_backend/Transform_Mir.ml) `stanc --debug-transformed-mir`
1. Analyze & optimize (MIR -> MIR)
1. Code generation (MIR -> [C++](src/stan_math_backend/Stan_math_code_gen.ml)) (or other outputs, like [Tensorflow](https://github.com/stan-dev/stan2tfp/)).
2. [Parse](src/frontend/parser.mly) Stan language into AST that represents the syntax quite closely and aids in development of pretty-printers and linters. `stanc --debug-ast` to print this out.
3. Typecheck & add type information [Typechecker.ml](src/frontend/Typechecker.ml). `stanc --debug-decorated-ast`
4. [Lower](src/frontend/Ast_to_Mir.ml) into [Middle Intermediate Representation](src/middle/Program.ml) (AST -> MIR) `stanc --debug-mir` (or `--debug-mir-pretty`)
5. Backend-specific MIR transform (MIR -> MIR) [Transform_Mir.ml](src/stan_math_backend/Transform_Mir.ml) `stanc --debug-transformed-mir`
6. Analyze & optimize (MIR -> MIR)
7. Code generation (MIR -> [C++](src/stan_math_backend/Lower_program.ml)) (or other outputs, like [Tensorflow](https://github.com/stan-dev/stan2tfp/)).
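
The debug flags named in the phase list can be used together to inspect each intermediate representation in turn. The following is only a sketch: the flag names are taken from the list above, while `phase_commands` and `model.stan` are hypothetical, and the helper prints the commands rather than running them so that it works without a `stanc` binary on `PATH`.

```shell
# Print one stanc invocation per debug flag listed in the README.
# phase_commands and the model path are illustrative names only.
phase_commands() {
  model=$1
  for flag in \
    --debug-ast \
    --debug-decorated-ast \
    --debug-mir \
    --debug-mir-pretty \
    --debug-transformed-mir
  do
    printf 'stanc %s %s\n' "$flag" "$model"
  done
}

phase_commands model.stan
```

Piping the output to `sh` would execute the commands, assuming `stanc` is installed and accepts the flags in this position.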

### The central data structures

1. `src/frontend/Ast.ml` defines the AST. The AST is intended to have a direct 1-1 mapping with the syntax, so there are things like parentheses being kept around.
The pretty-printer in the frontend uses the AST and attempts to keep user syntax the same while just adjusting whitespace.
1. `src/frontend/Ast.ml` defines the AST. The AST is intended to have a direct 1-1 mapping with the syntax, so there are things like parentheses being kept around. The pretty-printer in the frontend uses the AST and attempts to keep user syntax the same while just adjusting whitespace.

The AST uses a particular functional programming trick to add metadata to the AST (and its other tree types), sometimes called [the "two-level types" pattern](http://lambda-the-ultimate.org/node/4170#comment-63836). Essentially, many of the tree variant types are parameterized by something that ends up being a placeholder not for just metadata but for the recursive type including metadata, sometimes called the fixed point. So instead of recursively referencing `expression` you would instead reference type parameter `'e`, which will later be filled in with something like `type expr_with_meta = metadata expression`.

The AST intends to keep very close to Stan-level semantics and syntax in every way.

2. `src/middle/Program.ml` contains the MIR (Middle Intermediate Representation). `src/frontend/Ast_to_Mir.ml` performs the lowering and attempts to strip out as much Stan-specific semantics and syntax as possible, though this is still something of a work-in-progress.
2. `src/middle/Program.ml` contains the MIR (Middle Intermediate Representation). `src/frontend/Ast_to_Mir.ml` performs the lowering and attempts to strip out as much Stan-specific semantics and syntax as possible, though this is still something of a work-in-progress.

The MIR uses the same two-level types idea to add metadata, notably expression types and autodiff levels as well as locations on many things. The MIR is used as the output data type from the frontend and the input for dataflow analysis, optimization (which also outputs MIR), and code generation.


3. `src/stan_math_backend/Cpp.ml` defines a minimal representation of C++ used in code generation.
3. `src/stan_math_backend/Cpp.ml` defines a minimal representation of C++ used in code generation.

This is intentionally simpler than both the above structures and than a true C++ AST and is tailored pretty specifically
to the C++ generated in our model class.

## Design goals
* **Multiple phases** - each with human-readable intermediate representations for easy debugging and optimization design.
* **Optimizing** - takes advantage of info known at the Stan language level. Minimize information we must teach users for them to write performant code.
* **Holistic** - bring as much of the code as possible into the MIR for whole-program optimization.
* **Research platform** - enable a new class of optimizations based on probability theory.
* **Modular** - architect & build in a way that makes it easy to outsource things like symbolic differentiation to external libraries and to use parts of the compiler as the basis for other tools built around the Stan language.
* **Simplicity first** - When making a choice between correct simplicity and a perceived performance benefit, we want to make the choice for simplicity unless we can show significant (> 5%) benchmark improvements to compile times or run times. Premature optimization is the root of all evil.

- **Multiple phases** - each with human-readable intermediate representations for easy debugging and optimization design.
- **Optimizing** - takes advantage of info known at the Stan language level. Minimize information we must teach users for them to write performant code.
- **Holistic** - bring as much of the code as possible into the MIR for whole-program optimization.
- **Research platform** - enable a new class of optimizations based on probability theory.
- **Modular** - architect & build in a way that makes it easy to outsource things like symbolic differentiation to external libraries and to use parts of the compiler as the basis for other tools built around the Stan language.
- **Simplicity first** - When making a choice between correct simplicity and a perceived performance benefit, we want to make the choice for simplicity unless we can show significant (> 5%) benchmark improvements to compile times or run times. Premature optimization is the root of all evil.
1 change: 1 addition & 0 deletions docs/dependencies.mld
@@ -16,6 +16,7 @@ These are automatically installed through the [scripts/install_ocaml.sh] and
- {{:http://gallium.inria.fr/~fpottier/menhir/manual.html}Menhir} 20230608 (a parser generator and parsing library)
- {{:https://github.com/ocaml-ppx/ppx_deriving}ppx_deriving} 5.2.1 (a tool for generating boilerplate code)
- {{:https://erratique.ch/software/fmt}fmt} 0.9.0 (a library for pretty-printing of formatted text)
- {{:https://erratique.ch/software/cmdliner}cmdliner} 1.3.0 (a library for command line parsing)
- yojson 2.1.0 (a library for producing JSON files)


31 changes: 0 additions & 31 deletions docs/getting_started.mld
@@ -38,37 +38,6 @@ for Windows development.
Once you have installed and configured WSL, you can proceed through the steps above
through the WSL shell.


{2:nix Alternative: Using Nix}

{{:https://nixos.org/nix/}Nix} is a declarative package manager with a focus on reproducible builds.
We provide the ability to use Nix to build, test and run Stanc3. We recommend trying the [opam]
instructions first if you are not an existing Nix user, with these as a backup.

If you have nix installed, you can build Stanc3 by running the following command in the [stanc3] directory:

[nix-build]

The binary will be in [result/bin/stanc]. It may take a minute the first time you run it.
Alternatively, the following is sometimes a faster way to build:

[nix-shell --command "dune build"]

To run the test suite, run:

[nix-shell --command "dune build --profile release @runtest"]

To install Stanc3 to your system, run:

[nix-env -i -f default.nix]

To drop into a sandboxed development shell with all of the build dependencies
of Stanc3 plus packages for an OCaml development environment
([dune], [ocp-indent], [ocamlformat], [merlin] and [utop]), run:

[nix-shell]


{1 Development}
{2 Useful commands}

2 changes: 2 additions & 0 deletions dune-project
@@ -24,6 +24,8 @@
(= 0.9.0))
(yojson
(= 2.1.0))
(cmdliner
(= 1.3.0))
(ocamlformat
(and
:with-test
11 changes: 9 additions & 2 deletions scripts/build_multiarch_stanc3.sh
@@ -1,21 +1,27 @@

# Architecture naming isn't consistent between QEMU and Docker, so lookup correct naming
if [ $1 = "mips64el" ]; then
export DOCK_PLATFORM="linux/mips64le"
export DOCK_ARCH="mips64le"
export DOCK_VARIANT=""
elif [ $1 = "arm64" ]; then
export DOCK_PLATFORM="linux/arm64"
export DOCK_ARCH="arm64"
export DOCK_VARIANT=""
elif [ $1 = "ppc64el" ]; then
export DOCK_PLATFORM="linux/ppc64le"
export DOCK_ARCH="ppc64le"
export DOCK_VARIANT=""
elif [ $1 = "armhf" ]; then
export DOCK_PLATFORM="linux/arm/v7"
export DOCK_ARCH="arm"
export DOCK_VARIANT="v7"
elif [ $1 = "armel" ]; then
export DOCK_PLATFORM="linux/arm/v6"
export DOCK_ARCH="arm"
export DOCK_VARIANT="v6"
elif [ $1 = "s390x" ]; then
export DOCK_PLATFORM="linux/s390x"
export DOCK_ARCH="s390x"
export DOCK_VARIANT=""
fi
@@ -26,6 +32,7 @@ DOCKER_IMAGE_TAG="$2"
SHA=$(skopeo inspect --raw docker://stanorg/stanc3:${DOCKER_IMAGE_TAG} | jq '.manifests | .[] | select(.platform.architecture==env.DOCK_ARCH and .platform.variant==(if env.DOCK_VARIANT == "" then null else env.DOCK_VARIANT end)).digest' | tr -d '"')

# Register QEMU translation binaries
docker run --rm --privileged multiarch/qemu-user-static --reset
docker run --privileged --rm tonistiigi/binfmt --install all

docker run --group-add=987 --group-add=980 --group-add=988 -v $(pwd):$(pwd):rw,z stanorg/stanc3:${DOCKER_IMAGE_TAG}@$SHA /bin/bash -c "cd $(pwd) && eval \$(opam env) && dune subst && dune build @install --profile static --root=."
docker run --platform=${DOCK_PLATFORM} --group-add=987 --group-add=980 --group-add=988 -v $(pwd):$(pwd):rw,z \
stanorg/stanc3:${DOCKER_IMAGE_TAG}@$SHA /bin/bash -c "cd $(pwd) && eval \$(opam env) && dune subst && dune clean && dune build @install --profile static --root=."
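
The `if`/`elif` chain at the top of this script leaves `DOCK_PLATFORM` unset when the architecture name is not recognised, so the later `docker run --platform=` call would receive an empty value. One possible alternative, shown here only as a sketch (the function name `dock_platform_for` is hypothetical and not part of the PR), is a `case` lookup that fails loudly on unknown names:

```shell
# Hypothetical helper: map a Debian/QEMU architecture name to the Docker
# platform string, using the same pairs as the if/elif chain in the script.
dock_platform_for() {
  case "$1" in
    mips64el) echo "linux/mips64le" ;;
    arm64)    echo "linux/arm64" ;;
    ppc64el)  echo "linux/ppc64le" ;;
    armhf)    echo "linux/arm/v7" ;;
    armel)    echo "linux/arm/v6" ;;
    s390x)    echo "linux/s390x" ;;
    *)
      # Unknown name: report on stderr and signal failure to the caller.
      echo "unsupported architecture: $1" >&2
      return 1
      ;;
  esac
}

dock_platform_for arm64   # prints linux/arm64
```

A caller could then write `DOCK_PLATFORM=$(dock_platform_for "$1") || exit 1` before invoking Docker.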
6 changes: 3 additions & 3 deletions scripts/docker/builder/Dockerfile
@@ -1,5 +1,5 @@
# Pull the ubuntu:bionic base image
FROM ubuntu:bionic
# Pull the ubuntu:jammy base image
FROM ubuntu:jammy

USER root

@@ -21,4 +21,4 @@ RUN echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers
RUN chown -R jenkins:sudo /usr/local

USER jenkins
WORKDIR /home/jenkins
WORKDIR /home/jenkins
12 changes: 2 additions & 10 deletions scripts/docker/debian-windows/Dockerfile
@@ -1,5 +1,5 @@
#Pull the ubuntu:bionic image
FROM ubuntu:bionic
#Pull the ubuntu:jammy image
FROM ubuntu:jammy

USER root

@@ -35,13 +35,5 @@ RUN printf "\n" | bash -x install_ocaml.sh "stanc"
COPY ./scripts/install_build_deps_windows.sh ./
RUN bash -x install_build_deps_windows.sh

#Copy our script and install dev dependencies
COPY ./scripts/install_dev_deps.sh ./
RUN bash -x install_dev_deps.sh

# Install Javascript dev environment
COPY ./scripts/install_js_deps.sh ./
RUN opam update; bash -x install_js_deps.sh

#Specify our entrypoint
ENTRYPOINT [ "opam", "config", "exec", "--" ]
11 changes: 6 additions & 5 deletions scripts/docker/debian/Dockerfile
@@ -1,5 +1,5 @@
# Pull the ubuntu:bionic base image
FROM ubuntu:bionic
# Pull the ubuntu:jammy base image
FROM ubuntu:jammy

USER root

@@ -27,22 +27,23 @@ WORKDIR /home/jenkins
#Copy our script and install ocaml + init
COPY ./scripts/install_opam.sh ./
RUN printf "\n" | bash -x install_opam.sh
RUN opam update

# Install and initialize ocaml
COPY ./scripts/install_ocaml.sh ./
RUN printf "\n" | bash -x install_ocaml.sh "stanc"

# Install build dependencies
COPY ./scripts/install_build_deps.sh ./
RUN opam update; bash -x install_build_deps.sh
RUN bash -x install_build_deps.sh

# Install dev dependencies
COPY ./scripts/install_dev_deps.sh ./
RUN opam update; bash -x install_dev_deps.sh
RUN bash -x install_dev_deps.sh

# Install Javascript dev environment (js_of_ocaml 5.4.0)
COPY ./scripts/install_js_deps.sh ./
RUN opam update; bash -x install_js_deps.sh
RUN bash -x install_js_deps.sh

# Specify our entrypoint
ENTRYPOINT [ "opam", "config", "exec", "--" ]
11 changes: 5 additions & 6 deletions scripts/docker/multiarch/Dockerfile
@@ -1,5 +1,5 @@
# Base image
FROM debian:buster-20220622-slim
FROM debian:bullseye-20241202

USER root

@@ -51,18 +51,17 @@ RUN eval $(opam env) && opam update

# Native-code compilation not available on MIPS, fall back to bytecode
RUN if [ "$(cat /qemu-setup/arch)" = "mips64el" ]; then \
opam switch create 4.14.1 --packages=ocaml-variants.4.14.1+options,ocaml-option-bytecode-only && opam switch 4.14.1 && opam pin num https://github.com/ocaml/num.git -y; \
opam switch create 4.14.1 --packages=ocaml-variants.4.14.1+options,ocaml-option-bytecode-only && opam switch 4.14.1; \
else \
opam switch create 4.14.1 && opam switch 4.14.1; \
fi

RUN eval $(opam env) && opam repo add internet https://opam.ocaml.org

RUN eval $(opam env) && opam install -y dune
RUN eval $(opam env) && opam repository add dune-universe git+https://github.com/dune-universe/opam-overlays.git
RUN eval $(opam env) && opam update && opam upgrade
RUN eval $(opam env) && opam install -y dune
RUN eval $(opam env) && opam install -y core.v0.16.0
RUN eval $(opam env) && opam install -y menhir.20230608
RUN eval $(opam env) && opam install -y ppx_deriving.5.2.1
RUN eval $(opam env) && opam install -y fmt.0.9.0
RUN eval $(opam env) && opam install -y yojson.2.1.0
RUN eval $(opam env)
RUN eval $(opam env) && opam install -y cmdliner.1.3.0+dune
6 changes: 3 additions & 3 deletions scripts/docker/publish/Dockerfile
@@ -1,5 +1,5 @@
# Pull the ubuntu:bionic base image
FROM ubuntu:bionic
# Pull the ubuntu:jammy base image
FROM ubuntu:jammy

USER root

@@ -25,4 +25,4 @@ RUN echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers
RUN chown -R jenkins:sudo /usr/local

USER jenkins
WORKDIR /home/jenkins
WORKDIR /home/jenkins