A tool to verify the correctness of rnix-parser.
This project aims to...
- Verify that rnix-parser parses grammatically-valid Nix expressions the same as the reference implementation of Nix.
- Provide more confidence that changes to rnix-parser are correct.
- Find edge cases that the parser must handle.
It is not a goal of this project to...
- Verify that rnix-parser correctly rejects grammatically-invalid Nix expressions.
For now, the workflow requires first cloning this repository. The main way to use the tool is through the CLI.
```
$ cargo run
cli
Utility program to test/use various aspects of rnix-parser-tester

USAGE:
    cli <SUBCOMMAND>

OPTIONS:
    -h, --help    Print help information

SUBCOMMANDS:
    compare    Report differences in serialization between the reference Nix parser and
               rnix-parser
    dump       Dump information about the given Nix expression
    help       Print this message or the help of the given subcommand(s)
    summary    Perform analysis of summaries generated by the compare subcommand
```

(See the `cli` crate readme for full documentation.)
For example, consider the following Nix expression in a file `test.nix`:

```nix
let x = 3; in x + 2
```
We can use the `compare` subcommand to check that rnix-parser and the reference implementation parse this expression to the same AST:
```
$ cargo run -- compare test.nix
test.nix ... equal
```
In this case, the parses were equal.
We can use the `dump` subcommand to show the ASTs which are being compared as JSON:
```
$ cargo run -- dump test.nix -p rnix -p reference
test.nix ...
==== Reference impl json ====
{"Let":{"attrs":{"Attrs":{"attrs":[{"expr":{"Int":3},"inherited":false,"name":"x"}],"dynamic_attrs":[],"rec":false}},"body":{"OpConcatStrings":{"es":[{"Var":"x"},{"Int":2}],"force_string":false}}}}
==== rnix-parser json ====
{"Let":{"attrs":{"Attrs":{"rec":false,"attrs":[{"name":"x","inherited":false,"expr":{"Int":3}}],"dynamic_attrs":[]}},"body":{"OpConcatStrings":{"force_string":false,"es":[{"Var":"x"},{"Int":2}]}}}}
```
This is especially useful when `compare` reports that some expression isn't parsed the same by both parsers. In that case, we can use `dump` and our favorite JSON differ (e.g., http://www.jsondiff.com/) to debug further.
`compare` also...
- Accepts directories
- Can recurse into those directories
- Accepts `NIX_PATH` paths between angle brackets (e.g., `<nixpkgs>`)
- Can save a summary of the comparisons
Altogether, this gives us the following command, which will check that rnix-parser parses every Nix file in nixpkgs the same as the reference parser, saving a summary for later use:
```
$ cargo run -- compare '<nixpkgs>' -r --save-summary summary/done.json
/nix/var/nix/profiles/per-user/root/channels/nixos/nixpkgs/flake.nix ... equal
/nix/var/nix/profiles/per-user/root/channels/nixos/nixpkgs/nixos/release-small.nix ... equal
... and so on ...
```
The `compare` and `summary` subcommands can be used to determine how a change to rnix-parser affects its ability to parse:
1. `cargo run -- compare '<nixpkgs>' -r --save-summary summary/before.json`
2. Make some change to rnix-parser. For local development, this probably also involves changing the `rnix` dependency in `Cargo.toml`. For example, we could change it to `rnix = { path = "../rnix-parser" }` to use a local version.
3. `cargo run -- compare '<nixpkgs>' -r --save-summary summary/after.json`
We can then compare the summaries:
```
$ cargo run -- summary summary/fix-nested-apply.json summary/change-path-handling.json

== Progressions (not equal before, equal after) ==
/nix/var/nix/profiles/per-user/root/channels/nixos/nixpkgs/pkgs/tools/audio/botamusique/node-packages.nix
/nix/var/nix/profiles/per-user/root/channels/nixos/nixpkgs/pkgs/servers/mjolnir/node-deps.nix
... and so on ...

== Regressions (equal before, not equal after) ==

== Summary ==
# equal before: 24050
# not equal before: 11
# reference impl errors before: 1
# rnix-parser errors before: 0
# equal after: 24059
# not equal after: 2
# reference impl errors after: 1
# rnix-parser errors after: 0
# progressions: 9
# regressions: 0
```
The stats at the bottom tell us that whatever change we just made resulted in 9 more files being parsed correctly and, importantly, that no files which were parsed correctly before the change are parsed incorrectly after it.
Under the hood, we are "just" parsing the same expression using both rnix-parser and the reference parser, serializing the resulting ASTs to JSON, and then comparing those JSON objects.
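To make the comparison step concrete, here is a minimal sketch using `serde_json` (purely for illustration; it is not necessarily the comparison library the tool uses). Structural equality of parsed JSON values does not depend on object key order, which is why the two `dump` outputs above compare equal even though their keys are listed in different orders:

```rust
// Minimal sketch: deserialize both JSON serializations and compare them
// structurally. `serde_json::Value` equality ignores object key order.
fn json_equal(a: &str, b: &str) -> serde_json::Result<bool> {
    let a: serde_json::Value = serde_json::from_str(a)?;
    let b: serde_json::Value = serde_json::from_str(b)?;
    Ok(a == b)
}

fn main() {
    // Fragments of the two dumps above: same structure, different key order.
    let reference = r#"{"expr":{"Int":3},"inherited":false,"name":"x"}"#;
    let rnix = r#"{"name":"x","inherited":false,"expr":{"Int":3}}"#;
    assert!(json_equal(reference, rnix).unwrap());
}
```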
There are several challenges to overcome in doing this:
- The reference implementation is written in C++. Rust has limited interop with C++, so a basic C-compatible wrapper was written to do the bulk of the reference impl-related work on the C++ side (a sketch of what such a binding might look like follows this list).
- The reference implementation closely couples parsing and evaluation. For example, the reference implementation builds a symbol table and checks for undefined variables during parsing. This limits the kinds of Nix expressions we can use and complicates the reference impl-related work.
- rnix-parser's typed AST leaves a lot to be desired from a usability standpoint. We created our own which is easier to work with for our use case.
- The reference implementation produces an AST that is sometimes structurally very different from rnix-parser. This necessitates some conversion between them, which we call "normalization".
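To make the first challenge concrete, the Rust side can talk to such a wrapper over a plain C ABI, along these lines. This is only a sketch: the function names, signatures, and ownership convention here are hypothetical, not the wrapper's actual interface.

```rust
use std::ffi::{c_char, CStr, CString};

extern "C" {
    // Hypothetical entry points exposed by the C-compatible wrapper;
    // the real wrapper's names and signatures may differ.
    fn ref_impl_parse_to_json(expr: *const c_char) -> *mut c_char;
    fn ref_impl_free_string(s: *mut c_char);
}

/// Parse `expr` with the wrapped reference impl, returning its AST as JSON.
fn reference_parse_json(expr: &str) -> String {
    let c_expr = CString::new(expr).expect("expression contains a NUL byte");
    unsafe {
        let raw = ref_impl_parse_to_json(c_expr.as_ptr());
        let json = CStr::from_ptr(raw).to_string_lossy().into_owned();
        ref_impl_free_string(raw); // the wrapper owns the allocation
        json
    }
}
```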
Normalization is the process of taking the AST produced by rnix-parser and converting it to the structure used by the reference impl. For example, consider the following expression:
```nix
f 0 1 2
```
rnix-parser parses this expression as `(((f 0) 1) 2)`, i.e., three nested `Apply` nodes, each with one argument. However, the reference impl collects all of the function arguments into a single `Apply` node with three arguments. Both parses are "correct", but we need to transform the former into the latter if we want to directly compare the trees.
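As a sketch of this particular rule, over a deliberately simplified AST (the real types used by the project differ):

```rust
// Simplified, hypothetical AST for illustration only.
#[derive(Debug, PartialEq)]
enum Expr {
    Var(String),
    Int(i64),
    Apply { fun: Box<Expr>, args: Vec<Expr> },
}

/// Collapse nested single-argument `Apply` nodes, e.g. `(((f 0) 1) 2)`,
/// into a single `Apply` with all of the arguments, e.g. `f [0, 1, 2]`.
/// (Recursion into `args` and other node kinds is elided for brevity.)
fn flatten_apply(expr: Expr) -> Expr {
    match expr {
        Expr::Apply { fun, mut args } => match flatten_apply(*fun) {
            // The function is itself an Apply: absorb our arguments into it.
            Expr::Apply { fun: inner, args: mut inner_args } => {
                inner_args.append(&mut args);
                Expr::Apply { fun: inner, args: inner_args }
            }
            // Base case: the function is not an Apply.
            other => Expr::Apply { fun: Box::new(other), args },
        },
        other => other,
    }
}
```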
The `rnix-normalize` crate explains in detail all of the normalization rules.
Note that normalization does not go the other way (i.e., reference impl AST -> rnix-parser AST, even though that would be easier in some cases), because...
- Working with the reference impl is generally harder, both because of C++/FFI concerns and because our use case is not really supported.
- We generally try to keep the reference impl AST 100% untouched, because we define "correct" to mean "whatever the reference impl says".
The main challenge with this project is also its biggest caveat: the normalization phase. For some expressions, we have to do a lot of work to get the rnix-parser AST to look like the reference impl AST. In practice, if the tool reports that some expression is not parsed equally, that means either that rnix-parser didn't parse it correctly or that we did not normalize the AST correctly. In other words, normalization opens us up to false negatives.
However, we can have decent confidence that normalization does not easily result in false positives (i.e., rnix-parser parses something incorrectly, but then we accidentally normalize it to be equal to what the reference impl gives us), because normalization does not inject new information into the parse. It only restructures what is already there.
The primary data set this tool is targeting is nixpkgs. While this set is very large (probably the largest set of valid Nix expressions), that doesn't necessarily mean that it contains every possible edge case in the Nix grammar. In fact, it definitely doesn't. For example, at the time of writing, nixpkgs contains zero instances of a float with no decimal part (e.g., `1.`). This is a perfectly valid Nix expression, but by testing only on nixpkgs, we would not be able to tell that rnix-parser does not currently support it.
This isn't an inherent limitation of the tool, since it isn't directly tied to nixpkgs; it can be used on any set of Nix expressions. But the main takeaway of this caveat is that just because we can parse nixpkgs correctly does not mean that we can parse all of Nix correctly (although perhaps it can be said that we can parse all Nix expressions that really matter. Who is writing `1.` anyway?).
> We generally try to keep the reference impl AST 100% untouched, because we define "correct" to mean "whatever the reference impl says"
While this is true, we unfortunately have to break from this rule in two places:
- The biggest break is an actual patch to Nix. Because of an implementation detail of the reference lexer (in particular, the fact that all of these rules produce `IND_STR` tokens), a multiline string with no interpolations might still be split into multiple `StringToken`s. For example, consider the following expression:

  ```nix
  ''foo \$''
  ```

  The contents of the string would be tokenized as two separate tokens, `foo` and `\$`, eventually resulting in a string concat expression with those two arguments. This is highly problematic, because rnix-parser, using a different tokenizing algorithm, will simply produce a single token for the entire string contents. After this point, it isn't clear how to normalize the resulting tree into the reference impl's without re-parsing the string contents.

  Consequently, we patch the reference impl to instead combine those consecutive tokens into a single one. The patch doesn't actually fix any issue in Nix, and doesn't improve performance, so it seems unlikely that it would be accepted just for our benefit.
- By default, the reference impl sorts attribute sets in a way that would be difficult to reproduce in normalization. In particular, the reference impl maintains a global symbol table, creating symbols as they are encountered during parsing. Entries in attribute sets are then sorted by their key's position in this table. Additionally, many symbols are inserted by default without being encountered in the parsed expression.

  While it would technically be possible to replicate this behavior in normalization, it would be for no benefit, because attrset order has no semantic bearing in Nix. We opt to instead sort entries lexicographically by key (see the sketch after this list).

  (We could alternatively solve this issue by treating the attrset entry list as a set, i.e., ignoring order, but the JSON comparison library we're using doesn't easily allow for this.)
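As a sketch of that lexicographic sort, over a hypothetical entry type (the real AST types differ):

```rust
// Hypothetical attrset entry; fields other than the key are elided.
struct AttrEntry {
    name: String,
    // inherited: bool, expr: Expr, ... (elided)
}

/// Sort entries lexicographically by key. Since attrset order has no
/// semantic bearing in Nix, this only fixes a canonical order so that
/// the two ASTs can be compared directly.
fn sort_attrs(attrs: &mut [AttrEntry]) {
    attrs.sort_by(|a, b| a.name.cmp(&b.name));
}
```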
Several patches to rnix-parser were developed with the aid of this tool: #96, #99, #101, #102, and #103. In aggregate, these patches (as well as #106) greatly improved rnix-parser's ability to correctly parse nixpkgs:
```
$ cargo run -- summary summary/f759fce.json summary/with-patches.json
...
== Summary ==
# equal before: 15924
# not equal before: 8135
# reference impl errors before: 1
# rnix-parser errors before: 1
# equal after: 24060
# not equal after: 0
# reference impl errors after: 1
# rnix-parser errors after: 0
# progressions: 8135
# regressions: 0
```
There are several areas where this tool could be improved.
- Speed. At the time of writing, on my system, it takes about 6 minutes to compare all of nixpkgs. Having done no profiling, I would hypothesize that the strategy of serializing ASTs to JSON leaves a lot of performance on the table. It was primarily chosen as a way to avoid worrying too much about FFI, but it could be revisited now.
- Parallelizing nixpkgs comparison. Related to the previous point, comparison of all of nixpkgs could be parallelized for a huge performance win. However, the reference impl does not seem to be thread safe. I gave up at the first sign of trouble here, when the Boehm GC started yelling about something once it was used from multiple threads. My first thought was to just disable the GC, but this isn't currently possible. Even once it is, I don't know whether there will be further issues with using the reference impl from multiple threads.
- Verifying that grammatically-invalid expressions are rejected. This would be pretty ambitious, but I think it could be possible. The two primary roadblocks would be the need for a decently-sized corpus of invalid expressions to test against (we use nixpkgs as a corpus of valid expressions), and further working around the reference impl's tight coupling of parsing and evaluation.