Lua binding for Sealark - Starlark parser
Development of Moonlark has been put on hold until Sunlark development is complete.
[The code is currently in disarray; sealark (formerly libstarlark) has changed substantially and moonlark has not kept up. That said, the code should still work once it undergoes a little reorganization. This is not a priority at the moment, but if you really need the Lua binding, please file an issue.]
STATUS: Disarray, as noted above. Previously, the parser worked and the Lua code worked; i.e. it could parse a BUILD.bazel file and expose the AST as a Lua table. The moonlark and sunlark bindings work too. They include code to serialize the AST, but not much else (much more is planned). There is no Windows support; sorry about that, but I don't have a Windows machine. It has been tested on macOS Big Sur and Linux (Debian Stretch).
NOTE The main branch does the stuff described below, but development occurs on the dev branch, which is what you should use if you want to monitor progress or contribute.
ROADMAP: [outdated; development of Sunlark has priority over Moonlark.] The immediate task is to finish the script libraries (Lua and Scheme), which will support programmatic editing of the AST. First up: given a source file name and a list of dependencies, update the BUILD.bazel file, specifically for OCaml projects using OBazl, the use case that motivated development of moonlark. Also on the to-do list: more detailed documentation. A long-term goal is to support editing capabilities matching or exceeding those of Buildozer.
[This won't work for a while...]
WARNING the first time you run one of the tools it may take a while to build everything (e.g. re2c takes a while to compile).
Sysdeps: libsealark builds re2c, which depends on make, autogen, autoconf, and libtool.
Add the following to WORKSPACE.bazel:
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")
git_repository(
name = "sealark",
remote = "https://github.com/obazl/sealark",
branch = "dev",
)
http_archive( ## needed to build re2c, which libsealark needs
name = "rules_foreign_cc",
sha256 = "e14a159c452a68a97a7c59fa458033cc91edb8224516295b047a95555140af5f",
strip_prefix = "rules_foreign_cc-0.4.0",
url = "https://github.com/bazelbuild/rules_foreign_cc/archive/0.4.0.tar.gz",
)
load("@rules_foreign_cc//foreign_cc:repositories.bzl", "rules_foreign_cc_dependencies")
rules_foreign_cc_dependencies(register_built_tools=False)
NOTE: The sample code below uses moonlark; the UI for sunlark is the same, just substitute sunlark for moonlark in the code. It also assumes that you are using Sealark as an external repo. If you have cloned the Sealark repo and are running from its root directory, just drop @sealark, e.g. instead of $ bazel run @sealark//moonlark:edit, run $ bazel run moonlark:edit.
The system Lua code for moonlark is in moonlark/lua; user code goes in .moonlark.d. For sunlark, the system Scheme code is in sunlark/scm, and user code goes in ./sunlark.d.
$ mkdir tmp # for now, this is where serialized output is written
$ bazel run @sealark//moonlark:edit -- -f lib/BUILD.bazel
INFO: Analyzed target //moonlark:edit (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //moonlark:edit up-to-date:
bazel-bin/moonlark/edit
INFO: Elapsed time: 0.123s, Critical Path: 0.00s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Build completed successfully, 1 total action
08:59:10 WARN bindings/lua/lbazel.c:28: WARNING: no user luadir specified; using default: .moonlark.d
08:59:10 INFO bindings/lua/libmoonlark.c:405: Lua table 'moonlark' not found; creating
@sealark//moonlark/lua/edit.lua: moonlark_handler emitting starlark to tmp/test.BUILD
$ diff tmp/test.BUILD `pwd`/test/data/strings/BUILD.test
$
You can run your own Lua code instead of the predefined library. moonlark:edit parses the file, converts the resulting C AST to a Lua table, loads the file edit.lua, and calls the function moonlark_handler, passing it the Lua AST table.
If .moonlark.d/edit.lua exists, it will override the default @sealark//moonlark/lua/edit.lua.
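A user-supplied .moonlark.d/edit.lua can be very small. The sketch below is not the default edit.lua; it assumes only that the handler is a global function named moonlark_handler that receives the AST table, and it reads only fields visible in the repl dump further down (build_file, subnodes). The helper summarize is hypothetical.

```lua
-- .moonlark.d/edit.lua (sketch): a minimal user-supplied callback.
-- Assumption: moonlark:edit calls the global moonlark_handler with the AST.
-- Only fields shown in the repl dump are used: build_file and subnodes.

-- Hypothetical helper: one-line summary of the parsed file.
local function summarize(ast)
  local count = ast.subnodes and #ast.subnodes or 0
  return string.format("%s: %d top-level node(s)", ast.build_file or "?", count)
end

-- Entry point invoked by moonlark:edit with the Lua AST table.
function moonlark_handler(ast)
  print(summarize(ast))
end
```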
You can also specify a Lua file on the command line with -l:
$ bazel run @sealark//moonlark:edit -- -f lib/BUILD.bazel -l foo.lua
The moonlark Lua library (at @sealark//moonlark/lua) will still be on the load path. It contains various Lua files you can use, e.g. serialize.lua and pprint.lua. Load them with e.g. require "serialize".
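As a sketch of a file passed with -l, the foo.lua below pretty-prints only the first top-level node. It assumes, as with edit.lua, that the file must define a global moonlark_handler, and that require "pprint" returns a callable printer, as in the repl session (pp(ast)). The choice to dump only the first node is purely illustrative.

```lua
-- foo.lua (sketch): a handler passed to moonlark:edit with -l.
-- Assumption: the file must define a global moonlark_handler, like edit.lua.
function moonlark_handler(ast)
  -- pprint lives in the moonlark Lua library, which stays on the load path.
  local pp = require "pprint"
  local first = ast.subnodes and ast.subnodes[1]
  if first then
    pp(first)  -- dump only the first top-level node
  else
    print("no nodes in " .. tostring(ast.build_file))
  end
end
```

Requiring pprint inside the handler means the module is only loaded when moonlark:edit actually calls the callback.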
WARNING the moonlark Lua library is still under development and will change.
$ bazel run @sealark//moonlark:repl
... build output ...
Lua 5.4.3 Copyright (C) 1994-2021 Lua.org, PUC-Rio
> for k,v in pairs(moonlark) do
>> print(k,v)
>> end
version 0.1.0
TOK table: 0x7ff50740dcb0
iTOK table: 0x7ff50740f070
pTOK table: 0x7ff50740f0d0
parse_file function: 0x1063d3ba0
config_bazel function: 0x1063d3b80
> moonlark.config_bazel()
> ast = moonlark.parse_file("test/data/strings/BUILD.test") -- path to some build file in your repo
> pp = require "pprint"
> pp(ast)
{
build_file = 'test/data/strings/BUILD.test',
col = 0,
line = 0,
subnodes = { {
col = 0,
line = 0,
subnodes = { {
col = 0,
line = 0,
subnodes = { {
col = 0,
line = 0,
q = '"',
qq = 1,
s = 'hello',
t = 'TK_STRING',
type = 79
} },
t = 'TK_Expr_List',
type = 106
... etc ...
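The dump shows that every node is a table with line, col, an optional subnodes array, and a token name in t; string literals carry their text in s and (it appears) the quote character in q. Assuming only that shape, a recursive walk can collect every string literal together with its position. collect_strings is a hypothetical helper, not part of the moonlark library.

```lua
-- Sketch: recursively collect string literals from a moonlark AST table.
-- Relies only on fields visible in the dump: t, s, q, line, col, subnodes.
function collect_strings(node, out)
  out = out or {}
  if node.t == "TK_STRING" then
    out[#out + 1] = { s = node.s, q = node.q, line = node.line, col = node.col }
  end
  -- Depth-first, left to right, so results follow source order.
  for _, child in ipairs(node.subnodes or {}) do
    collect_strings(child, out)
  end
  return out
end
```

At the repl, after parsing as above, iterate over collect_strings(ast) with ipairs and print each entry's line, col, and s.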
IMPORTANT: Bazel awareness is integrated into moonlark:edit but not into moonlark:repl. The latter launches a Lua interpreter with the moonlark package preloaded, but it does not assume that it has been run by Bazel in a Bazel repo. To make it behave like moonlark:edit, run moonlark.config_bazel(). This puts the moonlark Lua library on the load path and changes the working directory to the original launch directory.
$ bazel test @sealark//test/unit ## runs all test suites
$ bazel test @sealark//test/unit:statements ## runs a single test suite
The motivating use case was the need to automatically update BUILD.bazel files for OCaml/Coq projects. Dependency management is somewhat complicated in OCaml, and manually updating dependencies in BUILD files is tedious and error prone. But all the information needed to update the BUILD files is available in the source code; all that is needed is tooling to determine the dependencies and then update the files. A dependency analysis tool is already available (Codept), but no suitable tool for programmatically editing BUILD files was available. I considered Gazelle, but decided that learning how to extend Gazelle to support the OCaml use case would probably take about as much effort as just writing a Starlark parser. In addition, depending on Gazelle means depending on Go, and I prefer not to force potential users to deal with that, even if Bazel does make it mostly invisible. The path of least resistance is C, since just about every system already has a C toolchain, and just about every language can integrate a C library with reasonable effort.
[TODO: more detailed comparison with Gazelle. Gazelle is a powerful tool, so why do we need another one? Short answer: The Unix Way - small, well-defined, single-purpose tools. Gazelle does a whole bunch of stuff. libsealark does one thing, and the design intention is that it should be easy to combine it with other small, well-defined, single-purpose tools (e.g. codept, which analyzes OCaml dependencies) to build composite tools. Actually, moonlark is an example: it combines libsealark with a Lua tool for analyzing the AST, and another tool for serializing the AST to a build file. All can be swapped out for alternative implementations.]
re2c: autogen, autoconf, libtool, make - the usual suspects.
Lua: we build liblua with readline. On Linux, you must install libreadline-dev, e.g. sudo apt-get install -y libreadline-dev.
Unfortunately, readline is GPLed. But it is only used with liblua, so...? Replacing it with a permissively licensed alternative is on the to-do list.
Docs are a WiP; see also devguide.
Target: //src:starlark
libsealark is a C11 library that contains routines to parse files and strings of Starlark code, producing a simple AST. It also contains some serialization routines to write the AST to a string. The result can be compared to the original input. (The goal is a 100% match, including whitespace and comments.)
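Because the goal is an exact match, the round-trip check is a byte-for-byte comparison, much like the diff in the moonlark:edit session above. A minimal sketch in Lua, using only the standard io library; same_contents is a hypothetical helper, not part of libsealark or moonlark.

```lua
-- Sketch: report whether two files are byte-identical, e.g. an input
-- BUILD file and the file written by the serializer.
function same_contents(path_a, path_b)
  -- Read a whole file in binary mode so whitespace is compared exactly.
  local function slurp(path)
    local f = assert(io.open(path, "rb"))
    local data = f:read("a")  -- "a" reads the whole file (Lua 5.3+)
    f:close()
    return data
  end
  return slurp(path_a) == slurp(path_b)
end
```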
libsealark uses re2c for lexing, lemon for parsing, and uthash for C data structures. Experienced C programmers will notice there are no header (.h) files in the source tree. That's because it uses Makeheaders, which automatically generates one header for each source file, containing everything it needs (and nothing more). Each BUILD.bazel file contains a :mkhdrs target that runs makeheaders. In addition, sealark/BUILD.bazel has a :mkhdrs-export target that generates the sealark.h public API.
Currently libsealark does not contain a public API for manipulating the AST. A developer could easily implement such routines, however, since the AST is pretty simple and uses utarray and utstring from the UTHash library.
Instead, the parsing routines of libsealark are exposed in moonlark, a Lua module, which also exposes the parsed AST as a Lua table. AST manipulation and serialization can then be implemented in Lua code. The idea is that this will make customization much easier (since Lua is much simpler than C), thus enabling tool makers to build a variety of tools on top of moonlark/libsealark. Default implementations are provided, but the user can easily supply alternatives.
moonlark packages libsealark as a Lua module, allowing the parser to be run from a Lua program; that is, it extends Lua.
Target //moonlark:repl is the Lua interpreter augmented by moonlark. Running $ bazel run moonlark:repl will launch a Lua REPL with moonlark preloaded.
Target //moonlark:edit is a C application that runs the libsealark routine starlark_parse_file, converts the resulting AST to a Lua table, and invokes a user-supplied Lua function named moonlark_handler.
The Lua module is called 'moonlark'.
Start a Lua REPL with moonlark preloaded:
$ bazel run moonlark:repl
To run Lua code at startup, write it to a file in .moonlark.d and pass it with -i (note the double hyphen --):
$ bazel run moonlark:repl -- -i `pwd`/.moonlark.d/mytest.lua
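A startup file like mytest.lua is a natural place to fold in the moonlark.config_bazel() step that moonlark:repl needs. The sketch below assumes the preloaded moonlark global with the config_bazel and parse_file functions listed in the repl session; load_build is a hypothetical helper name, and calling config_bazel only once is a design choice of this sketch, not a documented requirement.

```lua
-- .moonlark.d/mytest.lua (sketch): loaded at repl startup via -i.
-- Assumes the moonlark global is preloaded, as moonlark:repl does.
local configured = false

-- Hypothetical helper: parse a BUILD file, running config_bazel() once first
-- so the moonlark Lua library is on the load path and relative paths
-- resolve from the launch directory.
function load_build(path)
  if not configured then
    moonlark.config_bazel()
    configured = true
  end
  return moonlark.parse_file(path)
end
```

At the repl, ast = load_build("test/data/strings/BUILD.test") then replaces the two-step config_bazel/parse_file sequence.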
At the REPL you can parse a file and serialize the result. moonlark.parse_file returns the AST. You need to run moonlark.config_bazel() if you want to use the moonlark Lua library.
> moonlark.config_bazel()
> serpent = require "serpent"
> ast = moonlark.parse_file("path/to/BUILD.bazel")
> print(serpent.block(ast))
To have libsealark parse a file using moonlark:edit and invoke your own Lua code (a callback) on the AST:
- put your Lua code in <projroot>/.moonlark.d/edit.lua (use @sealark//moonlark/lua/edit.lua as an example)
- the callback routine must be a function named moonlark_handler taking one argument (an AST table)
- run:
$ bazel run moonlark:edit -- -f path/to/BUILD.bazel
Moonlark will parse the file (using the C libsealark library) and convert the AST to a Lua table.
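As another example of a callback, the sketch below tallies AST nodes by their t field (e.g. TK_STRING, TK_Expr_List, as seen in the repl dump) and prints the counts sorted by name. It assumes only the node shape shown in that dump; tally is a hypothetical helper, and the output format is arbitrary.

```lua
-- .moonlark.d/edit.lua (sketch): count AST nodes by token name.
-- Assumes nodes are tables with an optional t field and optional subnodes.
local function tally(node, counts)
  if node.t then
    counts[node.t] = (counts[node.t] or 0) + 1
  end
  for _, child in ipairs(node.subnodes or {}) do
    tally(child, counts)
  end
  return counts
end

function moonlark_handler(ast)
  local counts = tally(ast, {})
  -- Sort names so the report is stable across runs (pairs order is not).
  local names = {}
  for name in pairs(counts) do names[#names + 1] = name end
  table.sort(names)
  for _, name in ipairs(names) do
    print(string.format("%-20s %d", name, counts[name]))
  end
end
```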
The following Lua modules are available for working with the AST:
- edit.lua: default callback for moonlark:edit
- serialize.lua: for writing the AST to a file as Starlark code
- pprint.lua: pretty printing (i.e. serializing to Lua code)
To use them put something like the following in your Lua code:
s = require "serialize"
pp = require "pprint"
See moonlark/lua for examples.
For some of the unit tests, debug flags must be set. For example, test/unit/sunlark:vectors will fail unless the flag --//sealark:yydebug=vectors is passed. The flags are defined in //:BUILD.bzl, //:BUILD.bazel, and //sealark/BUILD.bazel; see bzl/user.bazelrc for a usage example.
$ bazel test test/unit # run all tests
# test suites are targets within test/unit
$ bazel test test/unit:expressions
$ bazel test test/unit:strings
etc.
Lex a file, dumping result to stdout:
$ bazel run test:lex_file -- -f `pwd`/data/test.lex.BUILD
Parse a file:
$ bazel run test:parse_file -- -f `pwd`/data/bazel/examples/cpp/BUILD
This dumps trace/debug messages to stdout and serializes the parsed AST to a temporary file. It then compares the output to the input.
Tests in test/sys are still under construction.
test:lua_file does not work currently.
starlark grammar: https://github.com/bazelbuild/starlark/blob/master/spec.md#grammar-reference
bazelbuild buildfile parser stuff: https://github.com/bazelbuild/buildtools/tree/master/build
lexer (Golang impl): https://github.com/bazelbuild/buildtools/blob/master/build/lex.go