Pratt example #622

39555 · 2024-11-17T20:57:26Z

An example of using the Pratt parser to parse a weird cexpr.

The result of parsing is a nicely formatted AST and the expression in prefix notation.

1 + 2 + 3:

ADD
  ADD
    VAL 1
    VAL 2
  VAL 3

(+ (+ 1 2) 3)
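The two output forms above can be reproduced with a small stand-alone sketch. The `Expr` type here is a hypothetical two-variant AST made up for illustration (the example in this PR has many more variants), and `tree`, `write_tree` and `sample` are illustrative helper names rather than part of the PR.

```rust
use std::fmt;

// Hypothetical AST for illustration only.
#[derive(Debug)]
enum Expr {
    Val(i64),
    Add(Box<Expr>, Box<Expr>),
}

impl Expr {
    // Append an indented tree to `out`, two spaces per nesting level.
    fn write_tree(&self, out: &mut String, depth: usize) {
        let pad = "  ".repeat(depth);
        match self {
            Expr::Val(v) => out.push_str(&format!("{pad}VAL {v}\n")),
            Expr::Add(l, r) => {
                out.push_str(&format!("{pad}ADD\n"));
                l.write_tree(out, depth + 1);
                r.write_tree(out, depth + 1);
            }
        }
    }

    // The "nicely formatted AST" form.
    fn tree(&self) -> String {
        let mut out = String::new();
        self.write_tree(&mut out, 0);
        out
    }
}

// The prefix (Lisp-like) notation form.
impl fmt::Display for Expr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Expr::Val(v) => write!(f, "{v}"),
            Expr::Add(l, r) => write!(f, "(+ {l} {r})"),
        }
    }
}

// The tree a left-associative parse of `1 + 2 + 3` should produce.
fn sample() -> Expr {
    Expr::Add(
        Box::new(Expr::Add(Box::new(Expr::Val(1)), Box::new(Expr::Val(2)))),
        Box::new(Expr::Val(3)),
    )
}
```

Calling `sample().to_string()` gives the prefix form, and `sample().tree()` gives the indented form.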

Parser Problems

Parsing any complex postfix operator (`a ? b : c`, `foo()`, `foo(1 + 2)`, `a[1 + 2]`) cannot be done without parsing it inside the operand parser, which means breaking the prefix precedence. Maybe the API should provide the input inside the operator closures. ~~But I'm having strange lifetime troubles with it for now~~

EDIT: I made it work by switching all callbacks to function pointers (`fn() ->`). I had completely forgotten about them. Now the input is the first argument of the closures.

Maybe closures should return an error instead of a plain value, to allow validation of the input, e.g. rejecting a dereference of a literal (`1->foo`) or handling an unbalanced delimiter in complex postfix operators.

EDIT: all closures now return `PResult`. Looks quite ugly...

The concept of "neither" associativity. ~~I don't know yet how it works but the parser could potentially reject a == b == c somehow~~.

EDIT: added `Assoc::Neither` and tests. These should fail: `a == b == c`, `a < b < c`.
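One common way to reject chained non-associative operators in a Pratt loop is sketched below. This is not necessarily how the PR implements it. The `Assoc` enum mirrors the one added in `src/combinator/precedence.rs`, while the operator table, the whitespace-separated token format, and the `infix`, `expr` and `parse` functions are assumptions made for this sketch.

```rust
#[derive(Debug, Clone, Copy)]
enum Assoc {
    Left(i64),
    Right(i64),
    Neither(i64),
}

// Hypothetical operator table for this sketch.
fn infix(op: &str) -> Option<Assoc> {
    match op {
        "=" => Some(Assoc::Right(1)),
        "==" | "<" => Some(Assoc::Neither(5)),
        "+" => Some(Assoc::Left(10)),
        _ => None,
    }
}

// Parse tokens starting at `pos`, returning the expression in prefix notation.
fn expr(tokens: &[&str], pos: &mut usize, min_power: i64) -> Result<String, String> {
    let operand = tokens.get(*pos).ok_or("expected operand")?;
    if infix(operand).is_some() {
        return Err(format!("expected operand, found `{operand}`"));
    }
    let mut lhs = operand.to_string();
    *pos += 1;

    // Upper bound on the power of the next infix operator. It is lowered
    // after a non-associative operator so the same level cannot repeat.
    let mut max_power = i64::MAX;
    while let Some(&op) = tokens.get(*pos) {
        let assoc = match infix(op) {
            Some(assoc) => assoc,
            None => return Err(format!("unexpected token `{op}`")),
        };
        let (power, rhs_min, next_max) = match assoc {
            Assoc::Left(p) => (p, p + 1, i64::MAX),
            Assoc::Right(p) => (p, p, i64::MAX),
            Assoc::Neither(p) => (p, p + 1, p - 1),
        };
        if power < min_power {
            break;
        }
        if power > max_power {
            return Err(format!("`{op}` is non-associative and cannot be chained"));
        }
        *pos += 1;
        let rhs = expr(tokens, pos, rhs_min)?;
        lhs = format!("({op} {lhs} {rhs})");
        max_power = next_max;
    }
    Ok(lhs)
}

// Entry point: tokens are separated by whitespace to keep the sketch short.
fn parse(src: &str) -> Result<String, String> {
    let tokens: Vec<&str> = src.split_whitespace().collect();
    let mut pos = 0;
    expr(&tokens, &mut pos, 0)
}
```

With this table, `parse("a == b == c")` and `parse("a < b < c")` return an error, while left- and right-associative chains such as `1 + 2 + 3` and `a = b = c` still parse.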

The API has a soundness problem. The `Parser` trait is implemented on `&Operator`, but inside `parse_next` a mutable reference and `RefCell::borrow_mut` are used, which can lead to potential problems. We can return to the API later. For now, let's keep only the essential algorithm and pass the affix parsers as 3 separate entities. Also add left_binding_power and right_binding_power to the operators, based on winnow-rs#614 (comment)

epage · 2024-11-18T17:59:29Z

examples/pratt/parser.rs

+                dispatch! {any;
+                    '!' => empty.value((20, (|_: &mut _, a| Ok(Expr::Fac(Box::new(a)))) as _)),
+                    '?' => empty.value((3, (|i: &mut &str, cond| {
+                        let (left, right) = preceded(multispace0, cut_err(separated_pair(pratt_parser, delimited(multispace0, ':', multispace0), pratt_parser))).parse_next(i)?;


Having to put multispace0s in here means that we have to successfully parse more of the input before we find that it doesn't match, hurting performance. I assume the way to handle this is to lex into tokens and then run this parser on tokens which will have the multi-space taken care of.
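The lex-first idea can be sketched without winnow. The `Token` type, the `lex` function, and the set of punctuation characters below are assumptions made for illustration and are not taken from the PR. The point is only that whitespace is consumed once, up front, so the expression parser never has to skip it before discovering that an operator does not match.

```rust
// Hypothetical token type for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Num(i64),
    Ident(String),
    Punct(char),
}

// Split the source into tokens, dropping all whitespace.
fn lex(src: &str) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c.is_ascii_digit() {
            // Accumulate a decimal integer, failing on overflow.
            let mut n: i64 = 0;
            while let Some(d) = chars.peek().and_then(|c| c.to_digit(10)) {
                n = n
                    .checked_mul(10)
                    .and_then(|n| n.checked_add(i64::from(d)))
                    .ok_or("integer literal too large")?;
                chars.next();
            }
            tokens.push(Token::Num(n));
        } else if c.is_ascii_alphabetic() || c == '_' {
            let mut name = String::new();
            while let Some(&a) = chars.peek() {
                if a.is_ascii_alphanumeric() || a == '_' {
                    name.push(a);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(Token::Ident(name));
        } else if "+-*/?:(),[]!".contains(c) {
            chars.next();
            tokens.push(Token::Punct(c));
        } else {
            return Err(format!("unexpected character `{c}`"));
        }
    }
    Ok(tokens)
}
```

A parser running over `&[Token]` would then match `Token::Punct('?')` and `Token::Punct(':')` directly, with no whitespace handling inside the operator closures.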

epage · 2024-11-18T18:01:06Z

examples/pratt/parser.rs

+                            "ge" => empty.value((12, 13, (|_: &mut _, a, b| Ok(Expr::GreaterEqual(Box::new(a), Box::new(b)))) as _)),
+                            "lt" => empty.value((12, 13, (|_: &mut _, a, b| Ok(Expr::Less(Box::new(a), Box::new(b)))) as _)),
+                            "le" => empty.value((12, 13, (|_: &mut _, a, b| Ok(Expr::LessEqual(Box::new(a), Box::new(b)))) as _)),
+                        _ => fail


Is this missing indentation?

Yes. Thanks. Fixed. It seems like rustfmt is having a hard time formatting macros.

epage · 2024-11-18T18:03:22Z

examples/pratt/parser.rs

+            // .parse("1 + 2 * *4^7! + 6")
+            .parse("foo(1 + 2 + 3) + bar() ? 1 : 2")
+            .unwrap();
+        println!("{r}");


With snapbox we can do snapshot testing of the .to_string()

I just pushed a lot of tests. There is another API complication that currently causes a failure. When a recursive parser is invoked inside a ternary operator, it should know its starting precedence. Consider:
a ? b : c, d
It should parse as (, (? a b c) d), but currently it parses as (? a b (, c d)). The second part, after `:`, doesn't know that it's part of the ternary operator. Two solutions are:

- Allow users to provide a starting binding power, e.g. precedence(0, ...). The user would call precedence(ternary_precedence+1, ...) inside the ternary operator.

- Require users to rebuild child parsers, excluding operators with precedence lower than the current one.
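The effect of the first solution can be sketched with a minimal hand-written Pratt parser. It does not use the winnow API from this PR. The `Parser` struct, the power constants, and the whitespace-separated token format are assumptions for illustration. The `else_start` field is the starting power handed to the recursive call after `:`, so the two behaviours can be compared side by side.

```rust
// Hypothetical binding powers for this sketch.
const COMMA: i64 = 1;
const TERNARY: i64 = 3;
const ADD: i64 = 10;

struct Parser<'a> {
    tokens: Vec<&'a str>,
    pos: usize,
    // Starting power used for the branch after `:`.
    else_start: i64,
}

impl<'a> Parser<'a> {
    fn peek(&self) -> Option<&'a str> {
        self.tokens.get(self.pos).copied()
    }

    fn bump(&mut self) -> Option<&'a str> {
        let t = self.peek();
        if t.is_some() {
            self.pos += 1;
        }
        t
    }

    // Parse an expression, consuming only operators with power >= start_power.
    fn expr(&mut self, start_power: i64) -> Result<String, String> {
        let mut lhs = match self.bump() {
            Some(t) if t.chars().all(|c| c.is_alphanumeric()) => t.to_string(),
            other => return Err(format!("expected operand, found {other:?}")),
        };
        while let Some(op) = self.peek() {
            let power = match op {
                "," => COMMA,
                "?" => TERNARY,
                "+" => ADD,
                _ => break, // e.g. `:` ends the middle part of a ternary
            };
            if power < start_power {
                break;
            }
            self.pos += 1;
            lhs = if op == "?" {
                // The middle part is delimited by `:`, so it starts from 0.
                let then = self.expr(0)?;
                if self.bump() != Some(":") {
                    return Err("expected `:`".to_string());
                }
                let start = self.else_start;
                let els = self.expr(start)?;
                format!("(? {lhs} {then} {els})")
            } else {
                // Left-associative binary operators.
                let rhs = self.expr(power + 1)?;
                format!("({op} {lhs} {rhs})")
            };
        }
        Ok(lhs)
    }
}

fn parse(src: &str, else_start: i64) -> Result<String, String> {
    let mut p = Parser { tokens: src.split_whitespace().collect(), pos: 0, else_start };
    let out = p.expr(0)?;
    match p.peek() {
        None => Ok(out),
        Some(t) => Err(format!("unexpected token `{t}`")),
    }
}
```

`parse("a ? b : c , d", 0)` reproduces the failing shape `(? a b (, c d))`, while `parse("a ? b : c , d", TERNARY + 1)` yields `(, (? a b c) d)`.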

Another API option is for fn precedence(...) -> Precedence (instead of impl Parser) and have a Precedence::initial_power or whatever we want to call it. This would also be a violation of the API guidelines of trailing functions only affecting the return value.

I changed the parser to allow specifying the starting precedence. Now all the tests pass. I will add more error related tests.

When all the missing parts work, we will see the full interface and can consider the most user-friendly design

39555 · 2024-11-19T09:25:38Z

Well, the example is complete and all the features are there! I assume with #618 it would be the same. I may check it later.

src/combinator/mod.rs

@@ -166,6 +166,8 @@
 mod parser;
 mod sequence;

+pub mod precedence;


src/combinator/precedence.rs

+}
+
+#[derive(Debug, Clone, Copy)]
+pub enum Assoc {
+    Left(i64),
+    Right(i64),
+    Neither(i64),


epage · 2024-11-19T19:59:50Z

@39555 I have to say, I am thoroughly impressed with the dedication you have put into this investigation, having:

- Implemented the initial recursive version
- Implemented an iterative version out of my concern for stack overflows
- Implemented a full C expression parser with it to see what features are missing, adding features that I'm not seeing in other libraries that provide a generic "pratt" parser

39555 and others added 14 commits November 12, 2024 02:37

feat: implement Pratt parser

fed8c90

commit suggestion

ee4459d

Co-authored-by: Ed Page <[email protected]>

remove spaces from #[doc(alias = "...")]

4b1499d

remove UnaryOp and BinaryOp in favor of Fn

acf4577

This feature was an overengineering based on suggestion "Why make our own trait" in winnow-rs#614 (comment)

remove redundant trait impl

a816a1c

works without it

remove allow_unused, move allow(non_snake_case) to where it shoul…

2a80e65

…d be - based on review "Why allow non_snake_case?" in winnow-rs#614 (comment) - remove `allow_unused` based on "Whats getting unused?" winnow-rs#614 (comment)

stop dumping pratt into combinator namespace

29fe18d

until we find a satisfactory api based on winnow-rs#614 (comment) > "We are dumping a lot of stray types into combinator. The single-line summaries should make it very easy to tell they are related to precedence"

move important things to go first

5a4f4b4

based on "Organizationally, O prefer the "top level" thing going first and then branching out from there. In this case, precedence is core." winnow-rs#614 (comment)

remove wrong and long doc for now

0273a29

I will write the documentation later

fix: precedence for associativity, remove trace()

f218911

- require explicit `trace` for operators - fix associativity handling for infix operators: `1 + 2 + 3` should be `(1 + 2) + 3` and not `1 + (2 + 3)`

switch from &dyn Fn(O) -> O to fn(O) -> O

3d7ef41

feat: pass Input into operator closures

a6cbc1a

add trace for tests parser

29b64fa

39555 force-pushed the pratt-example branch from b9e799d to c3e18a8 Compare November 17, 2024 22:05

feat: operator closures must return PResult

b31a3a3

39555 force-pushed the pratt-example branch from c3e18a8 to 4d9f2dc Compare November 18, 2024 16:40

epage mentioned this pull request Nov 18, 2024

Pratt parsing support #131

Open

2 tasks

epage reviewed Nov 18, 2024

feat: allow the user to specify starting power

33c82f3

39555 force-pushed the pratt-example branch from d8c74b1 to 211f9de Compare November 18, 2024 20:37

39555 added a commit to 39555/winnow that referenced this pull request Nov 18, 2024

style: fix indentation

ca603d1

winnow-rs#622 (comment)

39555 added a commit to 39555/winnow that referenced this pull request Nov 18, 2024

refactor: remove unnecessarily multispace0

ee6c3b7

winnow-rs#622 (comment)

feat: enum Assoc for infix operators. Add Neither associativity

040dd85

39555 added a commit to 39555/winnow that referenced this pull request Nov 19, 2024

style: fix indentation

7868cda

winnow-rs#622 (comment)

39555 added a commit to 39555/winnow that referenced this pull request Nov 19, 2024

refactor: remove unnecessarily multispace0

70f5c6d

winnow-rs#622 (comment)

39555 force-pushed the pratt-example branch from 45317f9 to 5177b7d Compare November 19, 2024 08:37

39555 added a commit to 39555/winnow that referenced this pull request Nov 19, 2024

style: fix indentation

7672302

winnow-rs#622 (comment)

39555 added a commit to 39555/winnow that referenced this pull request Nov 19, 2024

refactor: remove unnecessarily multispace0

ba35a05

winnow-rs#622 (comment)

39555 force-pushed the pratt-example branch from 5177b7d to 425cd2d Compare November 19, 2024 08:49

39555 added 11 commits November 19, 2024 12:50

fix: switch to i64, fix precedence checking

6d88dff

example: pratt expression parser

8f18fc2

feat: complex postfix operators

a4ad844

- ternary operator - function call - index

pratt_example: operator closures return PResult

54cb315

test: add tests

d6da343

specify the parser start precedence

c1a8535

- fix failing tests related to the ternary operator and commas

style: fix indentation

a85291b

winnow-rs#622 (comment)

refactor: remove unnecessarily multispace0

39cc484

winnow-rs#622 (comment)

fix: failed tests

c52c10d

use Assoc enum. tests for associativity Neither

d3c3d0a

fix: switch to i64

b7b0629

39555 force-pushed the pratt-example branch from 425cd2d to b7b0629 Compare November 19, 2024 08:50

tests ill-formed expressions

5e7fb65

39555 force-pushed the pratt-example branch from dd7e44c to 5e7fb65 Compare November 19, 2024 11:19

github-advanced-security bot found potential problems Nov 19, 2024
