Home
The key idea is to split parsing into two stages. They're analogous to the lexer + parser pair in a compiler. Dividing the parsing into two pieces allows each to be simpler.
The first stage (this repo) crawls the original sources and converts them to JSON. The JSON schema mirrors the original content as closely as possible, so each type of source produces very different-looking JSON. But because it is all JSON (instead of PDF, HTML, etc.), every source is easy for the next stage to read. The second stage can then focus on converting the source schema to a particular app's needs.
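As a rough illustration of the first stage, here is a minimal sketch in Python. It is not this repo's actual code: the glossary-style source, the `GlossaryEntry` record, and the `parse_glossary` and `to_json` helpers are all hypothetical, chosen only to show JSON that mirrors the source's own structure.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class GlossaryEntry:
    """Hypothetical record; its fields mirror the source, not any app."""
    phrase: str
    definition: str


def parse_glossary(raw_text: str) -> list[GlossaryEntry]:
    """Turn 'phrase: definition' lines into records shaped like the source."""
    entries = []
    for line in raw_text.splitlines():
        if ":" not in line:
            continue  # skip lines that are not entries
        phrase, definition = line.split(":", 1)
        entries.append(GlossaryEntry(phrase.strip(), definition.strip()))
    return entries


def to_json(entries: list[GlossaryEntry]) -> str:
    """Serialize the records as a JSON array for the next stage to read."""
    return json.dumps([asdict(e) for e in entries], ensure_ascii=False)


# Made-up sample input standing in for a crawled page.
SAMPLE = "Appeal: A request for review.\nBail: Security for release."
stage_one_output = to_json(parse_glossary(SAMPLE))
```

A different kind of source (a statute, a treaty) would get its own record type and its own JSON shape under this approach.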
The second stage transforms the JSON into an app's internal representation. That code is outside the scope of this repo because many apps, written by different developers, can use the same source data. For example, Public.Law imports the JSON data into a Postgres database and into Netlify static pages. That particular code isn't open source yet.
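To make the second stage concrete, here is a minimal sketch of what a consuming app might do. It is an assumption-heavy illustration, not Public.Law's importer: the JSON shape, the `glossary_terms` table, and the `load_into_db` helper are hypothetical, and SQLite from the Python standard library stands in for a real database such as Postgres.

```python
import json
import sqlite3

# Hypothetical first-stage output; a real schema depends on the source.
STAGE_ONE_JSON = '[{"phrase": "Appeal", "definition": "A request for review."}]'


def load_into_db(conn: sqlite3.Connection, json_text: str) -> int:
    """Map source-shaped JSON records onto an app-specific table."""
    records = json.loads(json_text)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS glossary_terms "
        "(slug TEXT PRIMARY KEY, title TEXT, body TEXT)"
    )
    # The app decides its own columns; here a URL slug is derived from the phrase.
    rows = [
        (r["phrase"].lower().replace(" ", "-"), r["phrase"], r["definition"])
        for r in records
    ]
    conn.executemany("INSERT OR REPLACE INTO glossary_terms VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
inserted = load_into_db(conn, STAGE_ONE_JSON)
```

The point of the split is visible here: this code knows nothing about crawling or about PDF or HTML, only about the JSON it receives and the shape the app wants.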
Current project: International Law in support of Ukraine