-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pratt example #622
base: main
Are you sure you want to change the base?
Pratt example #622
Conversation
Co-authored-by: Ed Page <[email protected]>
This feature was an overengineering based on suggestion "Why make our own trait" in winnow-rs#614 (comment)
works without it
…d be - based on review "Why allow non_snake_case?" in winnow-rs#614 (comment) - remove `allow_unused` based on "Whats getting unused?" winnow-rs#614 (comment)
until we find a satisfactory api based on winnow-rs#614 (comment) > "We are dumping a lot of stray types into combinator. The single-line summaries should make it very easy to tell they are related to precedence"
based on "Organizationally, O prefer the "top level" thing going first and then branching out from there. In this case, precedence is core." winnow-rs#614 (comment)
the api has an unsound problem. The `Parser` trait is implemented on the `&Operator` but inside `parse_next` a mutable ref and `ReffCell::borrow_mut` are used which can lead to potential problems. We can return to the API later. But for now lets keep only the essential algorithm and pass affix parsers as 3 separate entities Also add left_binding_power and right_binding_power to the operators based on winnow-rs#614 (comment)
I will write the documentation later
- require explicit `trace` for operators - fix associativity handling for infix operators: `1 + 2 + 3` should be `(1 + 2) + 3` and not `1 + (2 + 3)`
examples/pratt/parser.rs
Outdated
dispatch! {any; | ||
'!' => empty.value((20, (|_: &mut _, a| Ok(Expr::Fac(Box::new(a)))) as _)), | ||
'?' => empty.value((3, (|i: &mut &str, cond| { | ||
let (left, right) = preceded(multispace0, cut_err(separated_pair(pratt_parser, delimited(multispace0, ':', multispace0), pratt_parser))).parse_next(i)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having to put multispace0
s in here means that we have to successfully parse more of the input before we find that it doesn't match, hurting performance. I assume the way to handle this is to lex into tokens and then run this parser on tokens which will have the multi-space taken care of.
"ge" => empty.value((12, 13, (|_: &mut _, a, b| Ok(Expr::GreaterEqual(Box::new(a), Box::new(b)))) as _)), | ||
"lt" => empty.value((12, 13, (|_: &mut _, a, b| Ok(Expr::Less(Box::new(a), Box::new(b)))) as _)), | ||
"le" => empty.value((12, 13, (|_: &mut _, a, b| Ok(Expr::LessEqual(Box::new(a), Box::new(b)))) as _)), | ||
_ => fail |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this missing indentation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Thanks. Fixed. It seems like rustfmt is having a hard time formatting macros.
examples/pratt/parser.rs
Outdated
// .parse("1 + 2 * *4^7! + 6") | ||
.parse("foo(1 + 2 + 3) + bar() ? 1 : 2") | ||
.unwrap(); | ||
println!("{r}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With snapbox
we can do snapshot testing of the .to_string()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just pushed a lot of tests. There is another api complication thing that currently fails. When invoking a recursive parser in a ternary operator, it should know its starting precedence. Consider:
a ? b : c, d
it should parse as (, (? a b c) d)
. But currently it parses as (? a b (, c d))
. The second part after :
doesn't know it's part of the ternary operator. 2 solutions are:
- Allow users to provide a starting binding power e.g
precedence(0, ...)
. The user would callprecedence(ternary_precedence+1, ...)
inside the ternary operator. - Require users to rebuild child parsers excluding operators with precedence lower than the current one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another API option is for fn precedence(...) -> Precedence
(instead of impl Parser
) and have a Precedence::initial_power
or whatever we want to call it. This would also be a violation of the API guidelines of trailing functions only affecting the return value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the parser to allow specifying the starting precedence. Now all the tests pass. I will add more error related tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When all the missing parts work, we will see the full interface and can consider the most user-friendly design
- ternary operator - function call - index
- fix failing tests related to the ternary operator and commas
Well, the example is complete and all the features are there! I assume with #618 it would be the same. I may check it later. |
@@ -166,6 +166,8 @@ | |||
mod parser; | |||
mod sequence; | |||
|
|||
pub mod precedence; |
Check failure
Code scanning / clippy
missing documentation for a module Error
} | ||
|
||
#[derive(Debug, Clone, Copy)] | ||
pub enum Assoc { |
Check failure
Code scanning / clippy
missing documentation for an enum Error
|
||
#[derive(Debug, Clone, Copy)] | ||
pub enum Assoc { | ||
Left(i64), |
Check failure
Code scanning / clippy
missing documentation for a variant Error
#[derive(Debug, Clone, Copy)] | ||
pub enum Assoc { | ||
Left(i64), | ||
Right(i64), |
Check failure
Code scanning / clippy
missing documentation for a variant Error
pub enum Assoc { | ||
Left(i64), | ||
Right(i64), | ||
Neither(i64), |
Check failure
Code scanning / clippy
missing documentation for a variant Error
@39555 I have to say, I am thoroughly impressed with the dedication you have put to this investigation, having
|
An example of the usage of the
Pratt
parser for parsing a weird cexpr.The result of the parsing is a nicely formatted ast and the expression in prefix notation.
1 + 2 + 3
:Parser Problems
Parsing any complex postfix operator
a ? b : c
,foo()
,foo(1 + 2)
,a[1 + 2]
cannot be done without parsing it inside theoperand
parser (which means breaking the prefix precedence). Maybe the api should provide the input inside the operator closures.But I'm having strange lifetime troubles with it for nowEDIT: I made it by switching all callbacks to function pointers
fn() ->
. I had completely forgotten about it. Now the input is a first argument in closures.Maybe closures should return an error instead of plain value to allow validation of the input e.g dereferencing the literal
1->foo
or handling unbalanced delimiter in complex postfixes.EDIT: all closures now return
PResult
. Looks quite ugly..Concept of
neither
associativity.I don't know yet how it works but the parser could potentially reject.a == b == c
somehowEDIT: added
Assoc::Neither
and tests. This should fail:a == b == c
,a < b < c
.