Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pratt example #622

Draft
wants to merge 29 commits into
base: main
Choose a base branch
from
Draft

Pratt example #622

wants to merge 29 commits into from

Conversation

39555
Copy link

@39555 39555 commented Nov 17, 2024

An example of the usage of the Pratt parser for parsing a weird cexpr.

The result of the parsing is a nicely formatted ast and the expression in prefix notation.

1 + 2 + 3:

ADD
  ADD
    VAL 1
    VAL 2
  VAL 3

(+ (+ 1 2) 3)

Parser Problems

  • Parsing any complex postfix operator a ? b : c, foo(), foo(1 + 2), a[1 + 2] cannot be done without parsing it inside the operand parser (which means breaking the prefix precedence). Maybe the api should provide the input inside the operator closures. But I'm having strange lifetime troubles with it for now

    EDIT: I made it by switching all callbacks to function pointers fn() ->. I had completely forgotten about it. Now the input is a first argument in closures.

  • Maybe closures should return an error instead of plain value to allow validation of the input e.g dereferencing the literal 1->foo or handling unbalanced delimiter in complex postfixes.

    EDIT: all closures now return PResult. Looks quite ugly..

  • Concept of neither associativity. I don't know yet how it works but the parser could potentially reject a == b == c somehow.

    EDIT: added Assoc::Neither and tests. This should fail: a == b == c, a < b < c.

39555 and others added 14 commits November 12, 2024 02:37
Co-authored-by: Ed Page <[email protected]>
This feature was an overengineering

based on suggestion "Why make our own trait" in
winnow-rs#614 (comment)
…d be

- based on review "Why allow non_snake_case?"

in winnow-rs#614 (comment)

- remove `allow_unused` based on "Whats getting unused?"
winnow-rs#614 (comment)
until we find a satisfactory api

based on
winnow-rs#614 (comment)

> "We are dumping a lot of stray types into combinator. The single-line
summaries should make it very easy to tell they are related to
precedence"
based on "Organizationally, O prefer the "top level" thing going first
and then branching out from there. In this case, precedence is core."

winnow-rs#614 (comment)
the api has an unsound problem. The `Parser` trait is implemented on the
`&Operator` but inside `parse_next` a mutable ref and
`ReffCell::borrow_mut` are used which can lead to potential problems.

We can return to the API later. But for now lets keep only the essential
algorithm and pass affix parsers as 3 separate entities

Also add left_binding_power and right_binding_power to the operators
based on
winnow-rs#614 (comment)
I will write the documentation later
- require explicit `trace` for operators
- fix associativity handling for infix operators:
`1 + 2 + 3` should be `(1 + 2) + 3` and not `1 + (2 + 3)`
@epage epage mentioned this pull request Nov 18, 2024
2 tasks
dispatch! {any;
'!' => empty.value((20, (|_: &mut _, a| Ok(Expr::Fac(Box::new(a)))) as _)),
'?' => empty.value((3, (|i: &mut &str, cond| {
let (left, right) = preceded(multispace0, cut_err(separated_pair(pratt_parser, delimited(multispace0, ':', multispace0), pratt_parser))).parse_next(i)?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having to put multispace0s in here means that we have to successfully parse more of the input before we find that it doesn't match, hurting performance. I assume the way to handle this is to lex into tokens and then run this parser on tokens which will have the multi-space taken care of.

"ge" => empty.value((12, 13, (|_: &mut _, a, b| Ok(Expr::GreaterEqual(Box::new(a), Box::new(b)))) as _)),
"lt" => empty.value((12, 13, (|_: &mut _, a, b| Ok(Expr::Less(Box::new(a), Box::new(b)))) as _)),
"le" => empty.value((12, 13, (|_: &mut _, a, b| Ok(Expr::LessEqual(Box::new(a), Box::new(b)))) as _)),
_ => fail
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this missing indentation?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Thanks. Fixed. It seems like rustfmt is having a hard time formatting macros.

// .parse("1 + 2 * *4^7! + 6")
.parse("foo(1 + 2 + 3) + bar() ? 1 : 2")
.unwrap();
println!("{r}");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With snapbox we can do snapshot testing of the .to_string()

Copy link
Author

@39555 39555 Nov 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just pushed a lot of tests. There is another api complication thing that currently fails. When invoking a recursive parser in a ternary operator, it should know its starting precedence. Consider:
a ? b : c, d
it should parse as (, (? a b c) d). But currently it parses as (? a b (, c d)). The second part after : doesn't know it's part of the ternary operator. 2 solutions are:

  • Allow users to provide a starting binding power e.g precedence(0, ...). The user would call precedence(ternary_precedence+1, ...) inside the ternary operator.
  • Require users to rebuild child parsers excluding operators with precedence lower than the current one.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another API option is for fn precedence(...) -> Precedence (instead of impl Parser) and have a Precedence::initial_power or whatever we want to call it. This would also be a violation of the API guidelines of trailing functions only affecting the return value.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the parser to allow specifying the starting precedence. Now all the tests pass. I will add more error related tests.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When all the missing parts work, we will see the full interface and can consider the most user-friendly design

39555 added a commit to 39555/winnow that referenced this pull request Nov 18, 2024
39555 added a commit to 39555/winnow that referenced this pull request Nov 18, 2024
39555 added a commit to 39555/winnow that referenced this pull request Nov 19, 2024
39555 added a commit to 39555/winnow that referenced this pull request Nov 19, 2024
39555 added a commit to 39555/winnow that referenced this pull request Nov 19, 2024
39555 added a commit to 39555/winnow that referenced this pull request Nov 19, 2024
@39555
Copy link
Author

39555 commented Nov 19, 2024

Well, the example is complete and all the features are there! I assume with #618 it would be the same. I may check it later.

@@ -166,6 +166,8 @@
mod parser;
mod sequence;

pub mod precedence;

Check failure

Code scanning / clippy

missing documentation for a module Error

missing documentation for a module
}

#[derive(Debug, Clone, Copy)]
pub enum Assoc {

Check failure

Code scanning / clippy

missing documentation for an enum Error

missing documentation for an enum

#[derive(Debug, Clone, Copy)]
pub enum Assoc {
Left(i64),

Check failure

Code scanning / clippy

missing documentation for a variant Error

missing documentation for a variant
#[derive(Debug, Clone, Copy)]
pub enum Assoc {
Left(i64),
Right(i64),

Check failure

Code scanning / clippy

missing documentation for a variant Error

missing documentation for a variant
pub enum Assoc {
Left(i64),
Right(i64),
Neither(i64),

Check failure

Code scanning / clippy

missing documentation for a variant Error

missing documentation for a variant
@epage
Copy link
Collaborator

epage commented Nov 19, 2024

@39555 I have to say, I am thoroughly impressed with the dedication you have put to this investigation, having

  • Implemented the initial recursive version
  • Implemented an iterative version out of my concern for stackoverflows
  • Implemented a full C expression parser with it to see what feature are missing, adding features that I'm not seeing in other libraries that provide a generic "pratt" parser

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants