Better (our own?) YAML parser #229

GreyCat · 2017-08-25T15:04:16Z

This question has been discussed a zillion times, but I'd like to create a separate issue for it.

Current state of things

Right now we're using external YAML parsers to parse .ksy files, that is:

for JVM build of ksc — SnakeYAML, written in Java
for JS build of ksc — yamljs / jsyaml / ..., written in JavaScript; effectively they just convert YAML into JS objects/arrays structure

What we're generally ok to drop

Some YAML compatibility, i.e. we're generally ok to implement a smaller YAML subset. For example, we don't need:

multiple documents in a file
directives
tags
node anchors and references
block chomping controls
explicit typing with !!
timestamps
actually anything beyond failsafe schema

However, we shouldn't add to or modify YAML semantics, i.e. all our .ksy files should remain valid YAML documents that other YAML parsers can parse.
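To get a feel for how much of the "ok to drop" list is actually used in existing .ksy files, a rough line-based scan could flag the constructs in question. The sketch below is only an illustration under loud assumptions: `findUnsupportedFeatures` and `Finding` are hypothetical names, and the scan looks only at the start of each value, so it does not understand quoting, flow collections, or block scalar bodies and can produce false positives there.

```typescript
// Hypothetical helper, not part of ksc or any YAML library.
interface Finding { feature: string; line: number }

function findUnsupportedFeatures(source: string): Finding[] {
  const findings: Finding[] = [];
  let seenContent = false;
  source.split("\n").forEach((raw, index) => {
    const line = index + 1; // 1-based line number
    const text = raw.trim();
    if (text === "" || text.charAt(0) === "#") return; // blank line or comment
    if (raw.charAt(0) === "%") {
      findings.push({ feature: "directive", line });
      return;
    }
    if (raw.charAt(0) === "-" && (text === "---" || text.indexOf("--- ") === 0)) {
      // A document start marker after real content means a second document.
      if (seenContent) findings.push({ feature: "multiple documents", line });
      return;
    }
    seenContent = true;
    // Strip leading "- " sequence markers and one "key: " prefix; keep the rest.
    const match = /^(?:-\s+)*(?:[^\s:]+:\s+)?(.*)$/.exec(text);
    const value = match === null ? "" : (match[1] ?? "");
    if (value.indexOf("!!") === 0) findings.push({ feature: "explicit type", line });
    else if (value.charAt(0) === "!") findings.push({ feature: "tag", line });
    else if (value.charAt(0) === "&") findings.push({ feature: "anchor", line });
    else if (value.charAt(0) === "*") findings.push({ feature: "alias", line });
    // Block scalar header with a chomping indicator, e.g. `|-`, `>+`, `|2-`.
    else if (/^[|>][1-9]?[+-]/.test(value)) findings.push({ feature: "block chomping", line });
  });
  return findings;
}
```

Running something like this over a set of specs would show, per line, which of the listed features would stop working with a reduced subset.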

What we'd want to have

A parser that:

is written in Scala with no major extra dependencies — this way it would compile smoothly for all 3 major Scala targets: JVM, JS and native binaries
allows semi-complete lines that appear while one is in the middle of typing a line (useful for IDE); it should report an error here, but try to resume parsing, probably by discarding the erroneous line and resuming from the next line
reports exact positions of the problems — i.e. line/column, not only YAML path
reports many (ideally, all) problems from the run, not only the very first one
allows distinguishing (with proper error reporting) between cases like:
- having null specified vs having no value at all
allows checking our style guide rules and issuing warnings about violations (and maybe offering autocorrection?)
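Several of the wishes above (resuming after a half-typed line, line/column positions, reporting more than one problem, telling an explicit null from a missing value) can be illustrated on a deliberately tiny subset. The following is a minimal sketch, assuming a flat list of `key: value` lines only; `parseFlatMapping`, `Entry`, `Problem` and `Value` are hypothetical names, and nesting, sequences, quoting and trailing comments are all ignored.

```typescript
// Hypothetical sketch: none of these names come from ksc or an existing YAML library.
type Value =
  | { kind: "scalar"; text: string }
  | { kind: "explicitNull" } // `key: null` or `key: ~`
  | { kind: "empty" };       // `key:` with nothing after the colon

interface Entry { key: string; value: Value; line: number; column: number }
interface Problem { message: string; line: number; column: number }

function parseFlatMapping(source: string): { entries: Entry[]; problems: Problem[] } {
  const entries: Entry[] = [];
  const problems: Problem[] = [];
  source.split("\n").forEach((raw, index) => {
    const trimmed = raw.trim();
    if (trimmed === "" || trimmed.charAt(0) === "#") return; // blank line or comment
    const line = index + 1;              // 1-based line number
    const column = raw.search(/\S/) + 1; // 1-based column of the first non-blank character
    const match = /^([A-Za-z_][A-Za-z0-9_-]*):(?:\s+(.*))?$/.exec(trimmed);
    if (match === null) {
      // Recovery: record the problem, drop this line, resume from the next one.
      problems.push({ message: "expected `key: value`", line, column });
      return;
    }
    const key = match[1] ?? "";
    const text = (match[2] ?? "").trim();
    let value: Value;
    if (text === "") value = { kind: "empty" };
    else if (text === "null" || text === "~") value = { kind: "explicitNull" };
    else value = { kind: "scalar", text };
    entries.push({ key, value, line, column });
  });
  return { entries, problems };
}

// A half-typed line (`enco`) does not prevent the later lines from being parsed.
const sketchResult = parseFlatMapping("id: magic\nsize:\nenco\ntype: ~\n");
console.log(sketchResult.entries.map(e => `${e.key}: ${e.value.kind}`));
console.log(sketchResult.problems);
```

Keeping `empty` and `explicitNull` as separate cases in the value type is one way to let later stages produce different diagnostics for the two situations.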

Anything else?

Current efforts

@GreyCat tried to port SnakeYAML to Scala, but that project kind of stalled.
@koczkatamas wrote a proof-of-concept AST parser for a YAML subset in TypeScript, which is used as an add-on for code completion purposes only.


koczkatamas · 2017-08-25T15:35:40Z

block chomping controls

If this is about string parsing, then the formats repo currently uses the |- operator. I don't know if it's a mistake or a required feature.

Anything else?

I forgot to mention but I would need a mode where the parser also stores every node's original position (file byte offset, row and column number in an optimal situation), so if I want to select the /types/something/seq/2/encoding node then I could set the cursor position to this node in the text editor.

I can also invert this position-node map: if, e.g., the user is currently at byte offset 153, I'll know they are at the /types/something/seq/2/encoding node, and I can show autocomplete at that position, jump to the parent node if needed, etc.

Or if we ever create source maps then those will store that "this JS code was generated by the /types/something/seq/2 field" and I can - again - select that node if needed.

Currently my AST parser stores every node's exact start and end position (separately even for a map's key, etc). Maybe it's enough to store only the start position, but I wanted to be future-proof.
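The position-to-node lookup described here could look roughly like the sketch below. It is not the actual AST parser from the Web IDE: `PositionedNode`, `Span`, `nodeAtOffset`, `findByPath` and `toLineColumn` are hypothetical names, and offsets are string indices (UTF-16 code units) rather than true file byte offsets.

```typescript
// Hypothetical shapes for illustration only.
interface Span { start: number; end: number } // offsets into the source, end exclusive
interface PositionedNode { path: string; span: Span; children: PositionedNode[] }

// Offset -> node: return the deepest node whose span contains the offset.
function nodeAtOffset(root: PositionedNode, offset: number): PositionedNode | undefined {
  if (offset < root.span.start || offset >= root.span.end) return undefined;
  for (const child of root.children) {
    const hit = nodeAtOffset(child, offset);
    if (hit !== undefined) return hit;
  }
  return root;
}

// Path -> node: depth-first search for an exact path match.
function findByPath(root: PositionedNode, path: string): PositionedNode | undefined {
  if (root.path === path) return root;
  for (const child of root.children) {
    const hit = findByPath(child, path);
    if (hit !== undefined) return hit;
  }
  return undefined;
}

// Offset -> 1-based line and column, by counting newlines before the offset.
function toLineColumn(source: string, offset: number): { line: number; column: number } {
  let line = 1;
  let lineStart = 0;
  for (let i = 0; i < offset && i < source.length; i++) {
    if (source.charCodeAt(i) === 10) { // "\n"
      line++;
      lineStart = i + 1;
    }
  }
  return { line, column: offset - lineStart + 1 };
}
```

With both start and end stored per node, the same tree serves cursor placement (path to span) and autocomplete (offset to path); storing only the start would make the containment test in `nodeAtOffset` impossible without consulting sibling positions.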

GreyCat · 2017-08-25T15:38:00Z

If this is about string parsing, then the formats repo currently uses the |- operator. I don't know if it's a mistake or a required feature.

If that's for doc lines, then probably it's not really needed.

I forgot to mention but I would need a mode where the parser also stores every node's original position (file byte offset, row and column number in an optimal situation)

Makes sense, thanks!

GreyCat · 2018-08-05T13:37:04Z

Found the YAML test suite and its results matrix.

GreyCat · 2022-01-06T00:03:37Z

Updates from early 2022. A few interesting projects sprang up recently, namely:

yamlesque, a pure Scala YAML parser, compatible with Scala 2.12 and Scala 3, across JVM, ScalaJS and Native.
VirtusLab's scala-yaml, another pure Scala YAML parser, aiming to be compatible with ScalaJS and Native.

GreyCat · 2022-01-16T14:20:42Z

I've done a proof-of-concept port to yamlesque, available in the yamlesque branch.

Pros: it generally seems to work, although there are lots of broken tests
Cons: yamlesque seems to not support flow-style YAML, thus many contents specifications will appear to be broken

GreyCat added the enhancement label Aug 25, 2017

GreyCat mentioned this issue Aug 25, 2017

Error message should show .ksy line number #1147

Closed

This was referenced Aug 30, 2017

create if/elif/else syntax for attributes #237

Open

KSC should return not only path to errored properties, but also their location in input stream (line, column) and ids #240

Open

GreyCat mentioned this issue Jan 16, 2018

Offering assisstance #308

Closed

koczkatamas mentioned this issue Feb 13, 2018

Invalid line number while parsing YAML if comments are used kaitai-io/kaitai_struct_webide#62

Closed

KOLANICH mentioned this issue Jul 6, 2018

Inconsistent hex literals in pos: #456

Closed

GreyCat mentioned this issue Jan 25, 2019

Scala Native support #519

Open

GreyCat mentioned this issue Oct 25, 2019

Multiple type declarations behaviour #641

Closed

generalmimon mentioned this issue Jan 10, 2020

Error messages don't have line number #669

Closed

GreyCat mentioned this issue Jan 25, 2020

KSY (YAML) parser support for JSON #693

Closed

generalmimon mentioned this issue Feb 19, 2022

bootstrapping? kaitai for kaitai? #952

Open

milahu mentioned this issue Apr 13, 2023

cryptic error message when confusing value and type #1024

Open

generalmimon mentioned this issue Feb 4, 2024

Replace the yaml.js library with another YAML parser kaitai-io/kaitai_struct_webide#165

Closed

generalmimon mentioned this issue Sep 21, 2024

Unconsistent requirements for enum keys when they represented as numbers and as strings #1132

Open
