Better (our own?) YAML parser #229

Open
GreyCat opened this issue Aug 25, 2017 · 5 comments

@GreyCat

GreyCat commented Aug 25, 2017

This question has been discussed a zillion times, but I'd like to create a separate issue for it.

Current state of things

Right now we're using external YAML parsers to parse .ksy files, that is:

  • for JVM build of ksc — SnakeYAML, written in Java
  • for JS build of ksc — yamljs / jsyaml / ..., written in JavaScript; effectively they just convert YAML into JS objects/arrays structure

What we're generally ok to drop

Some YAML compatibility, i.e. we're generally ok to implement a smaller YAML subset. For example, we don't need:

  • multiple documents in a file
  • directives
  • tags
  • node anchors and references
  • block chomping controls
  • explicit typing with !!
  • timestamps
  • actually, anything beyond the failsafe schema

However, we shouldn't add to or modify YAML semantics, i.e. all our .ksy files should still be valid YAML documents that other YAML parsers can parse.
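To make the "smaller subset" idea concrete, here is a rough sketch of a checker that flags some of the constructs listed above. It is Python purely for illustration (not the Scala implementation discussed in this issue), the function name `find_unsupported` and the feature labels are made up, and the line-based patterns are deliberately naive: they know nothing about quoted strings or block scalar bodies, so false positives are possible.

```python
import re

# Naive patterns (assumptions for illustration); a real parser would tokenize
# the input instead of pattern-matching whole lines.
NODE_PREFIX = re.compile(r"(?:^\s*|:\s+|-\s+)(!(?!=)|&|\*)\S")
CHOMPING = re.compile(r"(?::|-)\s+[|>][+-]\s*$")
PREFIX_NAMES = {"!": "tag", "&": "anchor", "*": "alias"}


def find_unsupported(text):
    """Return (line_number, feature) pairs for constructs outside the subset."""
    findings = []
    seen_content = False
    for line_no, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are always fine
        if line.startswith("%"):
            findings.append((line_no, "directive"))
            continue
        if line.rstrip() == "---" or line.startswith("--- "):
            # A document marker after any content starts a second document.
            if seen_content:
                findings.append((line_no, "multiple documents"))
            seen_content = True
            continue
        seen_content = True
        if CHOMPING.search(line):
            findings.append((line_no, "block chomping"))
        match = NODE_PREFIX.search(line)
        if match:
            findings.append((line_no, PREFIX_NAMES[match.group(1)]))
    return findings
```

A checker like this would not replace the parser; it only shows that the dropped features are cheap to detect and report by line number, which fits the requirement that .ksy files stay valid YAML while the parser itself rejects what it does not implement.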

What we'd want to have

A parser that:

  • is written in Scala with no major extra dependencies — this way it would compile smoothly for all 3 major Scala targets: JVM, JS and native binaries
  • tolerates incomplete lines that appear while someone is still typing (useful for IDEs); it should report an error there but try to resume parsing, probably by discarding the erroneous line and continuing from the next one
  • reports exact positions of the problems — i.e. line/column, not only YAML path
  • reports many (ideally, all) problems from the run, not only the very first one
  • allows distinguishing (with proper error reporting) between cases such as:
    • having null specified vs having no value at all
  • allows checking our style guide rules and issuing warnings about violations (and maybe offering autocorrection?)
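A minimal sketch of how several of these wishes could fit together, again in Python for illustration only. It handles just a flat `key: value` mapping; the names `parse_flat_mapping`, `Diagnostic` and `MISSING` are hypothetical, and treating `null` / `~` as an explicit null is an assumption rather than anything decided in this issue. The point is the shape of the result: a value plus a list of diagnostics with line/column, with parsing resuming after a bad line.

```python
import re
from dataclasses import dataclass

MISSING = object()  # sentinel: the key is present but no value was written


@dataclass(frozen=True)
class Diagnostic:
    line: int    # 1-based line number
    column: int  # 1-based column number
    message: str


ENTRY = re.compile(r"^\s*(?P<key>[A-Za-z_][\w-]*):(?:\s+(?P<value>.*?))?\s*$")


def parse_flat_mapping(text):
    """Parse 'key: value' lines, collecting every problem instead of stopping."""
    result = {}
    diagnostics = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        m = ENTRY.match(line)
        if m is None:
            column = len(line) - len(line.lstrip()) + 1
            diagnostics.append(Diagnostic(line_no, column, "expected 'key: value'"))
            continue  # drop the erroneous line and resume from the next one
        key, value = m.group("key"), m.group("value")
        if key in result:
            column = m.start("key") + 1
            diagnostics.append(Diagnostic(line_no, column, f"duplicate key '{key}'"))
            continue
        if value is None or value == "":
            result[key] = MISSING  # "size:" with nothing after the colon
        elif value in ("null", "~"):
            result[key] = None     # explicitly written null
        else:
            result[key] = value    # everything else stays a plain string
    return result, diagnostics
```

With input such as `size:` followed by a half-typed `enc`, this returns `MISSING` for `size` and one diagnostic pointing at the `enc` line, while later lines are still parsed. A real implementation would need nesting, sequences and quoting on top of this.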

Anything else?

Current efforts

@koczkatamas

> block chomping controls

If this is about string parsing, then the formats repo currently uses the |- operator. I don't know if it's a mistake or a required feature.

> Anything else?

I forgot to mention but I would need a mode where the parser also stores every node's original position (file byte offset, row and column number in an optimal situation), so that if I want to select the /types/something/seq/2/encoding node, I can set the cursor position to this node in the text editor.

I can also invert this position-node map: if, e.g., the user is currently at byte offset 153, I'll know they are at the /types/something/seq/2/encoding node, and I can show autocompletion at that position, jump to the parent node if needed, etc.

Or, if we ever create source maps, they will record that "this JS code was generated by the /types/something/seq/2 field", and I can, again, select that node if needed.

Currently my AST parser stores every node's exact start and end position (separately even for a map's key, etc). Maybe it's enough to store only the start position, but I wanted to be future-proof.
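For illustration, the two lookups described above (path to position, and offset back to path) could be sketched like this in Python. The `Span` record, the function names and the byte offsets are all hypothetical; the only assumption is that the parser emits a start/end offset for every node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    path: str   # e.g. "/types/something/seq/2/encoding"
    start: int  # byte offset of the node's first byte
    end: int    # byte offset one past the node's last byte


def build_index(spans):
    """Forward map: node path -> span, for placing the editor cursor."""
    return {s.path: s for s in spans}


def path_at(spans, offset):
    """Reverse lookup: path of the innermost node whose span contains offset."""
    best = None
    for s in spans:
        if s.start <= offset < s.end:
            if best is None or (s.end - s.start) < (best.end - best.start):
                best = s
    return best.path if best else None


def parent_path(path):
    """Path of the enclosing node, e.g. for 'jump to parent'."""
    head, _, _ = path.rpartition("/")
    return head or "/"


# Made-up offsets, only to show the shape of the data.
spans = [
    Span("/", 0, 400),
    Span("/types", 100, 300),
    Span("/types/something/seq/2", 140, 200),
    Span("/types/something/seq/2/encoding", 150, 170),
]
```

Here `path_at(spans, 153)` yields `/types/something/seq/2/encoding`. The linear scan is fine for a sketch; an editor doing this on every keystroke would probably want the spans sorted or held in an interval tree. Storing both start and end, as the AST parser already does, is what makes the "innermost node" test possible.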

@GreyCat

GreyCat commented Aug 25, 2017

> If this is about string parsing, then the formats repo currently uses the |- operator. I don't know if it's a mistake or a required feature.

If that's for doc lines, then probably it's not really needed.

> I forgot to mention but I would need a mode where the parser also stores every node's original position (file byte offset, row and column number in an optimal situation)

Makes sense, thanks!

@GreyCat

GreyCat commented Aug 5, 2018

Found the YAML test suite and its results matrix.

@GreyCat

GreyCat commented Jan 6, 2022

Updates from early 2022. A few interesting projects sprang up recently, namely:

  • yamlesque, a pure Scala YAML parser, compatible with Scala 2.12 and Scala 3, on JVM, ScalaJS and Native.
  • VirtusLab's scala-yaml, another pure Scala YAML parser, aiming to be compatible with ScalaJS and Native.

@GreyCat

GreyCat commented Jan 16, 2022

I've done a proof-of-concept port to yamlesque, available in the yamlesque branch.

  • Pros: it generally seems to work, although there are lots of broken tests
  • Cons: yamlesque does not seem to support flow-style YAML, so many contents specifications will appear to be broken
