Skip to content

Commit

Permalink
Revise README and add subdir READMEs (#1023)
Browse files Browse the repository at this point in the history
  • Loading branch information
geoffromer authored Jan 28, 2022
1 parent dfef55f commit 6c3e1f6
Show file tree
Hide file tree
Showing 4 changed files with 264 additions and 176 deletions.
281 changes: 105 additions & 176 deletions executable_semantics/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,134 +6,118 @@ Exceptions. See /LICENSE for license information.
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-->

This directory contains a work-in-progress executable semantics. It started as
an executable semantics for Featherweight C and it is migrating into an
executable semantics for the Carbon language. It includes a parser, type
checker, and abstract machine.

This language currently includes several kinds of values: integer, booleans,
functions, and structs. A kind of safe union, called a `choice`, is in progress.
Regarding control-flow, it includes if statements, while loops, break, continue,
function calls, and a variant of `switch` called `match` is in progress.

The grammar of the language matches the one in Proposal
[#162](https://github.com/carbon-language/carbon-lang/pull/162). The type
checker and abstract machine do not yet have a corresponding proposal.
Nevertheless they are present here to help test the parser but should not be
considered definitive.

The parser is implemented using the flex and bison parser generator tools.

- [`lexer.lpp`](syntax/lexer.lpp) the lexer specification
- [`parser.ypp`](syntax/parser.ypp) the grammar

The parser translates program text into an abstract syntax tree (AST), defined
in the [ast](ast/) subdirectory. The `UnimplementedExpression` node type can be
used to define new expression syntaxes without defining their semantics, and the
same techniques can be applied to other kinds of AST nodes as needed. See the
handling of the `UNIMPL_EXAMPLE` token for an example of how this is done, and
see [`unimplemented_example_test.cpp`](syntax/unimplemented_example_test.cpp)
for an example of how to test it.

The [type checker](interpreter/type_checker.h) defines what it means for an AST
to be a valid program. The type checker prints an error and exits if the AST is
invalid.

The parser and type checker together specify the static (compile-time)
semantics.

The dynamic (run-time) semantics is specified by an abstract machine. Abstract
machines have several positive characteristics that make them good for
specification:

- abstract machines operate on the AST of the program (and not some
lower-level representation such as bytecode) so they directly connect the
program to its behavior

- abstract machines can easily handle language features with complex
control-flow, such as goto, exceptions, coroutines, and even first-class
continuations.

The one down-side of abstract machines is that they are not as simple as a
definitional interpreter (a recursive function that interprets the program), but
it is more difficult to handle complex control flow in a definitional
interpreter.

[InterpProgram()](interpreter/interpreter.h) runs an abstract machine using the
[interpreter](interpreter/), as described below.

## Abstract Machine

The abstract machine implements a state-transition system. The state is defined
by the `State` structure, which includes three components: the procedure call
stack, the heap, and the function definitions. The `Step` function updates the
state by executing a little bit of the program. The `Step` function is called
repeatedly to execute the entire program.

An implementation of the language (such as a compiler) must be observationally
equivalent to this abstract machine. The notion of observation is different for
each language, and can include things like input and output. This language is
currently so simple that the only thing that is observable is the final result,
an integer. So an implementation must produce the same final result as the one
produces by the abstract machine. In particular, an implementation does **not**
have to mimic each step of the abstract machine and does not have to use the
same kinds of data structures to store the state of the program.

A procedure call frame, defined by the `Frame` structure, includes a pointer to
the function being called, the environment that maps variables to their
addresses, and a to-do list of actions. Each action corresponds to an expression
or statement in the program. The `Action` structure represents an action. An
action often spawns other actions that needs to be completed first and
afterwards uses their results to complete its action. To keep track of this
process, each action includes a position field `pos` that stores an integer that
starts at `-1` and increments as the action makes progress. For example, suppose
the action associated with an addition expression `e1 + e2` is at the top of the
to-do list:

(e1 + e2) [-1] :: ...

When this action kicks off (in the `StepExp` function), it increments `pos` to
`0` and pushes `e1` onto the to-do list, so the top of the todo list now looks
like:

e1 [-1] :: (e1 + e2) [0] :: ...

Skipping over the processing of `e1`, it eventually turns into an integer value
`n1`:

n1 :: (e1 + e2) [0]

Because there is a value at the top of the to-do list, the `Step` function
invokes `HandleValue` which then dispatches on the next action on the to-do
list, in this case the addition. The addition action spawns an action for
subexpression `e2`, increments `pos` to `1`, and remembers `n1`.

e2 [-1] :: (e1 + e2) [1](n1) :: ...
`executable_semantics` is an implementation of Carbon whose primary purpose is
to act as a clear specification of the language. As an extension of that goal,
it can also be used as a platform for prototyping and validating changes to the
language. Consequently, it prioritizes straightforward, readable code over
performance, diagnostic quality, and other conventional implementation
priorities. In other words, its intended audience is people working on the
design of Carbon, and it is not intended for real-world Carbon programming on
any scale. See the [`toolchain`](/toolchain/) directory for a separate
implementation that's focused on the needs of Carbon users.

## Overview

`executable_semantics` represents Carbon code using an abstract syntax tree
(AST), which is defined in the [`ast`](ast/) directory. The [`syntax`](syntax/)
directory contains lexer and parser, which define how the AST is generated from
Carbon code. The [`interpreter`](interpreter/) directory contains the remainder
of the implementation.

`executable_semantics` is an interpreter rather than a compiler, although it
attempts to separate compile time from run time, since that separation is an
important constraint on Carbon's design.

## Programming conventions

The class hierarchies in `executable_semantics` are built to support
[LLVM-style RTTI](https://llvm.org/docs/HowToSetUpLLVMStyleRTTI.html), and
define a `kind` accessor that returns an enum identifying the concrete type.
`executable_semantics` typically relies less on virtual dispatch, and more on
using `kind` as the key of a `switch` and then down-casting in the individual
cases. As a result, adding a new derived class to a hierarchy requires updating
existing code to handle it. It is generally better to avoid defining `default`
cases for RTTI switches, so that the compiler can help ensure the code is
updated when a new type is added.

`executable_semantics` never uses plain pointer types directly. Instead, we use
the [`Nonnull<T*>`](common/nonnull.h) alias for pointers that are not nullable,
or `std::optional<Nonnull<T*>>` for pointers that are nullable.

Many of the most commonly-used objects in `executable_semantics` have lifetimes
that are tied to the lifespan of the entire Carbon program. We manage the
lifetimes of those objects by allocating them through an
[`Arena`](common/arena.h) object, which can allocate objects of arbitrary types,
and retains ownership of them. As of this writing, all of `executable_semantics`
uses a single `Arena` object, we may introduce multiple `Arena`s for different
lifetime groups in the future.

For simplicity, `executable_semantics` generally treats all errors as fatal.
Errors caused by bugs in the user-provided Carbon code should be reported with
the macros in [`error.h`](common/error.h). Errors caused by bugs in
`executable_semantics` itself should be reported with
[`CHECK` or `FATAL`](../common/check.h).

Skipping over the processing of `e2`, it eventually turns into an integer value
`n2`:
## Example Programs (Regression Tests)

The [`testdata/`](testdata/) subdirectory includes some example programs with
expected output.

These tests make use of LLVM's
[lit](https://llvm.org/docs/CommandGuide/lit.html) and
[FileCheck](https://llvm.org/docs/CommandGuide/FileCheck.html). Tests have
boilerplate at the top:

```carbon
// Part of the Carbon Language project, under the Apache License v2.0 with LLVM
// Exceptions. See /LICENSE for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
// RUN: executable_semantics %s 2>&1 | \
// RUN: FileCheck --match-full-lines --allow-unused-prefixes=false %s
// RUN: executable_semantics --trace %s 2>&1 | \
// RUN: FileCheck --match-full-lines --allow-unused-prefixes %s
// AUTOUPDATE: executable_semantics %s
// CHECK: result: 0
n2 :: (e1 + e2) [1](n1) :: ...
package ExecutableSemanticsTest api;
```

Again the `Step` function invokes `HandleValue` and dispatches to the addition
action which performs the arithmetic and pushes the result on the to-do list.
Let `n3` be the sum of `n1` and `n2`.
To explain this boilerplate:

n3 :: ...
- The standard copyright is expected.
- The `RUN` lines indicate two commands for `lit` to execute using the file:
one without `--trace` output, one with.
- Output is piped to `FileCheck` for verification.
- Setting `-allow-unused-prefixes` to false when processing the ordinary
output, and true when handling the `--trace` output, allows us to omit
the tracing output from the `CHECK` lines, while ensuring they cover all
non-tracing output.
- Setting `-match-full-lines` in both cases indicates that each `CHECK`
line must match a complete output line, with no extra characters before
or after the `CHECK` pattern.
- `RUN:` will be followed by the `not` command when failure is expected.
In particular, `RUN: not executable_semantics ...`.
- `%s` is a
[`lit` substitution](https://llvm.org/docs/CommandGuide/lit.html#substitutions)
for the path to the given test file.
- The `AUTOUPDATE` line indicates that `CHECK` lines will be automatically
inserted immediately below by the `./update_checks.py` script.
- The `CHECK` lines indicate expected output, verified by `FileCheck`.
- Where a `CHECK` line contains text like `{{.*}}`, the double curly
braces indicate a contained regular expression.
- The `package` is required in all test files, per normal Carbon syntax rules.

The heap is an array of values. It is used to store anything that is mutable,
including function parameters and local variables. An address is simply an index
into the array. The assignment operation stores the value of the right-hand side
into the heap at the index specified by the address of the left-hand side
lvalue.
### Useful commands

Function calls push a new frame on the stack and the `return` statement pops a
frame off the stack. The parameter passing semantics is call-by-value, so the
machine applies `CopyVal` to the incoming arguments and the outgoing return
value. Also, the machine kills the values stored in the parameters and local
variables when the function call is complete.
- `./update_checks.py` -- Updates expected output.
- `bazel test :executable_semantics_lit_test --test_output=errors` -- Runs
tests and prints any errors.
- `bazel test :executable_semantics_lit_test --test_output=errors --test_arg=--filter=basic_syntax/.*`
-- Only runs tests in the `basic_syntax` directory; `--filter` is a regular
expression.

## Experimental: Delimited Continuations
## Experimental feature: Delimited Continuations

Delimited continuations provide a kind of resumable exception with first-class
continuations. The point of experimenting with this feature is not to say that
Expand Down Expand Up @@ -214,58 +198,3 @@ We adapted the feature to operate in a more imperative manner. The
`shift` to pause and capture the continuation object. The `__run` feature is
equivalent to calling the continuation. The `__await` feature is equivalent to a
`shift` except that it updates the continuation in place.

## Example Programs (Regression Tests)

The [`testdata/`](testdata/) subdirectory includes some example programs with
expected output.

These tests make use of LLVM's
[lit](https://llvm.org/docs/CommandGuide/lit.html) and
[FileCheck](https://llvm.org/docs/CommandGuide/FileCheck.html). Tests have
boilerplate at the top:

```carbon
// Part of the Carbon Language project, under the Apache License v2.0 with LLVM
// Exceptions. See /LICENSE for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
// RUN: executable_semantics %s 2>&1 | \
// RUN: FileCheck --match-full-lines --allow-unused-prefixes=false %s
// RUN: executable_semantics --trace %s 2>&1 | \
// RUN: FileCheck --match-full-lines --allow-unused-prefixes %s
// AUTOUPDATE: executable_semantics %s
// CHECK: result: 0
package ExecutableSemanticsTest api;
```

To explain this boilerplate:

- The standard copyright is expected.
- The `RUN` lines indicate two commands for `lit` to execute using the file:
one without `--trace` output, one with.
- Output is piped to `FileCheck` for verification.
- `-allow-unused-prefixes` controls that output of the command without
`--trace` should _precisely_ match `CHECK` lines, whereas the command
with `--trace` will be a superset.
- `RUN:` will be followed by the `not` command when failure is expected.
In particular, `RUN: not executable_semantics ...`.
- `%s` is a
[`lit` substitution](https://llvm.org/docs/CommandGuide/lit.html#substitutions)
for the path to the given test file.
- The `AUTOUPDATE` line indicates that `CHECK` lines will be automatically
inserted immediately below by the `./update_checks.py` script.
- The `CHECK` lines indicate expected output, verified by `FileCheck`.
- Where a `CHECK` line contains text like `{{.*}}`, the double curly
braces indicate a contained regular expression.
- The `package` is required in all test files, per normal Carbon syntax rules.

Useful commands are:

- `./update_checks.py` -- Updates expected output.
- `bazel test :executable_semantics_lit_test --test_output=errors` -- Runs
tests and prints any errors.
- `bazel test :executable_semantics_lit_test --test_output=errors --test_arg=--filter=basic_syntax/.*`
-- Only runs tests in the `basic_syntax` directory; `--filter` is a regular
expression.
34 changes: 34 additions & 0 deletions executable_semantics/ast/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
<!--
Part of the Carbon Language project, under the Apache License v2.0 with LLVM
Exceptions. See /LICENSE for license information.
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-->

The code in this directory defines the AST that represents Carbon code in the
rest of `executable-semantics`.

All node types in the AST are derived from [`AstNode`](ast_node.h), and use
[LLVM-style RTTI](https://llvm.org/docs/HowToSetUpLLVMStyleRTTI.html) to support
safe down-casting and similar operations. Each abstract class `Foo` in the
hierarchy has a `kind` method which returns a enum `FooKind` that identifies the
concrete type of the object, and a `FooKind` value can be safely `static_cast`ed
to `BarKind` if that value represents a type that's derived from both `Foo` and
`Bar`.

We rely on code generation to help enforce those invariants, so every node type
must be described in [`ast_rtti.txt`](ast_rtti.txt). See the documentation in
(`gen_rtti.py`)[../gen_rtti.py], the code generation script, for details about
the file format and generated code.

The AST class hierarchy is structured in a fairly unsurprising way, with
abstract classes such as `Statement` and `Expression`, and concrete classes
representing individual syntactic constructs, such as `If` for if-statements.

Sometimes it is useful to work with a subset of node types that "cuts across"
the primary class hierarchy. Rather than deal with the pitfalls of multiple
inheritance, we handle these cases using a form of type erasure: we specify a
notional interface that those types conform to, and then define a "view" class
that behaves like a pointer to an instance of that interface. Types declare that
they model an interface `Foo` by defining a public static member named
`ImplementsCarbonFoo`. See [NamedEntityView](static_scope.h) for an example of
this pattern.
Loading

0 comments on commit 6c3e1f6

Please sign in to comment.