Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cucumber-expressions: Use a parser to parse Cucumber Expressions #771

Merged
merged 196 commits into from
Dec 10, 2020

Conversation

mpkorstanje
Copy link
Contributor

@mpkorstanje mpkorstanje commented Oct 24, 2019

Summary

A parser to parse Cucumber Expressions.

Cucumber Expressions currently uses a string rewriting algorithm
to turn Cucumber Expression Strings into regular expressions. This algorithm
has three major problems.

  1. Rewriting optionals interferes with rewriting alternatives as seen in Optional text not working as expected #767.
  2. It is possible to inject regex expressions into Cucumber Expressions as seen in Incorrect Match Step Definitions if Optional text past on Alternative text #601.
  3. It is not possible use alternatives containing a space.
  4. Escaping characters is generally inconsistent.

By using a parser to generate the AST of Cucumber expression and rewriting this
AST to a regular expression we can avoid all these problems. For example:

{int} blind\ rat(s)/mouse/mice

Matches:

  • 3 blind rats
  • 1 blind rat
  • 3 mice
  • 1 mouse

Fixes: #767
Fixes: #601
Fixes: #726
Closes: #770

Details

Grammar

cucumber-expression :=  ( alternation | optional | parameter | text )*
alternation := (?<=left-boundary) + alternative* + ( '/' + alternative* )+ + (?=right-boundary)
left-boundary := whitespace | } | ^
right-boundary := whitespace | { | $
alternative: = optional | parameter | text 
optional := '(' + option* + ')'
option := optional | parameter | text
parameter := '{' + name* + '}'
name := whitespace | .
text := whitespace | ')' | '}' | .

The AST is constructed from the following tokens:

escape := '\'
token := whitespace | '(' | ')' | '{' | '}' | '/' | .
. := any non-reserved codepoint

Note:

  • While parameter is allowed to appear as part of alternative and
    option in the AST, such an AST is not a valid a Cucumber Expression.
  • While optional is allowed to appear as part of option in the AST,
    such an AST is not a valid a Cucumber Expression.
  • ASTs with empty alternatives or alternatives that only
    contain an optional are valid ASTs but invalid Cucumber Expressions.
  • All escaped tokens (tokens starting with a backslash) are rewritten to their
    unescaped equivalent after parsing.

Types of changes

  • Bug fix (non-breaking change which fixes an issue).
  • New feature (non-breaking change which adds functionality).
  • Breaking change (fix or feature that would cause existing functionality to not work as expected).

Checklist:

  • The change has been ported to Java.
  • The change has been ported to Ruby.
  • The change has been ported to JavaScript.
  • The change has been ported to Go.
  • I've added tests for my code.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

@mpkorstanje mpkorstanje changed the title cucumber-expressions: Use pseudo token rewrite parsing algorithm cucumber-expressions: Use operator precedence parser Oct 24, 2019
@mpkorstanje mpkorstanje marked this pull request as ready for review October 27, 2019 12:25
@mpkorstanje mpkorstanje requested review from mxygem, charlierudolph, aslakhellesoy and luke-hill and removed request for mxygem October 27, 2019 12:25
@mpkorstanje
Copy link
Contributor Author

I've implemented the parser in Go and Java. I'm fairly proficient with both, before I continue with JavaScript and Ruby I'd like to have some feedback.

@mpkorstanje
Copy link
Contributor Author

Nevermind. Found the flaws.

Cases not yet covered:

aaa \( bbb (cccc)
aaa (bbb \)ccc)
aaa \(bbb\)  ccc

Ect.

Currently can't catch these because this parser rewrites tokens collected by regex rather implementing a proper tokenizer + LR(1) parser.

Copy link
Contributor

@luke-hill luke-hill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code is all good e.t.c. but

Personal pref, I don't like this (4 escaping \), type of cuke exp. When you've got 4 escaping characters it's starting to get beyond the point that we should support (imo)

cucumber-expressions/README.md Outdated Show resolved Hide resolved
@mpkorstanje
Copy link
Contributor Author

Personal pref, I don't like this (4 escaping ), type of cuke exp. When you've got 4 escaping characters it's starting to get beyond the point that we should support (imo)

We support two escapes at most currently. But if the parser is implemented properly it doesn't matter.

@mpkorstanje mpkorstanje changed the title cucumber-expressions: Use operator precedence parser cucumber-expressions: Use LR parser Oct 31, 2019
@mpkorstanje mpkorstanje changed the title cucumber-expressions: Use LR parser cucumber-expressions: Use a parser to parse Cucumber Expressions Nov 1, 2019
@mpkorstanje
Copy link
Contributor Author

mpkorstanje commented Nov 1, 2019

Okay. I've changed the implementation to a regular recursive descent parser. While not strictly necessary it makes the implementation much cleaner and behaves like all the other parsers people are used too (esp w.r.t escapes).

Additionally I've changed the precedence order. Alternation is now the highest precedent operator. This means that any thing bordering a / without spaces is part of the alternation (it used to be bound by optionals, but that was broken).

To make cucumber expressions easier to validate I've also extended the grammar to allow parameters to be nested in optionals and alternatives and optionals to be nested in alternatives. These are not valid cucumber expressions and will throw an error.

For now I'm going to let this percolate a bit. I'll add some examples (for the docs) later on.

@laeubi
Copy link

laeubi commented Nov 3, 2019

@mpkorstanje thats great 👍
This is something we could definately use/require to support better IDE integration of cucumber (see cucumber/cucumber-jvm#1711) and will greatly improve the further feature development of cucumber-expressions.
As there are ots of changes I'm not completely sure but will it be possible to use acess the different parts of an expression via public API?
What would be required is

  • Get parameters/alternatives and there positions in the origial string
  • Get type of parameters
  • Get alternative values

For your example ({int} blind\ rat(s)/mouse/mice) I would expect the following data:

  • one parameter at (0, 5) of type int
  • one alternative at (13, 30) with values [rat, rats, mouse, mice]

Is this already possible? If not can you add it?

@mpkorstanje
Copy link
Contributor Author

Is this already possible? If not can you add it?

Feel free to send a PR against the tokenize-cucumber-expression branch. The java implementation is pretty much done.

You'll probably want to add start and end indexes to Token and then reuse that information in the Node class to work out the bounds. Once that works we can hammer out the public API for it.

@mpkorstanje
Copy link
Contributor Author

And that is the go implementation of the recursive descent parser done.

@aslakhellesoy
Copy link
Contributor

Great work @mpkorstanje! Looking forward to porting this to TypoeScript. Maybe this cleanup will make it easier to port to C# too? /cc @SabotageAndi @gasparnagy

@aslakhellesoy
Copy link
Contributor

Can't believe I just wrote TypoeScript!

@stale
Copy link

stale bot commented May 6, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs.

@mpkorstanje
Copy link
Contributor Author

mpkorstanje commented Nov 23, 2020

Design choices made:

Make the parser complain about ambiguous structures

Because cucumber expressions are an unknown language implicit behaviour will be harder to understand.

For example given: three (brown/black) mice could depending match either:

three brown/black mice
three mice

Or:

three mice
three brown mice
three black mice

The latter makes sense but cucumber expressions don't support nesting alternatives in an optional. To make sure we can add this in the future and to make sure users understand that this is not a valid Cucumber expression the parser will complain about this.

Note that three (brown/black) mice does actually not make sense to use. For it to match three mice there would have to be two spaces between the words. So supporting alternation in optionals in a friendly manner would either require imploding spaces or a more complex expression such as three( brown\ black) mice. At this point using a regex or duplicating the step definition variation is a better tool for the job.

But don't complain about shorthand

This however is perfectly fine:

three \(brown\) mice
three \(brown) mice

@mpkorstanje mpkorstanje force-pushed the tokenize-cucumber-expression branch from d06e728 to df464a3 Compare November 28, 2020 16:48
@mpkorstanje
Copy link
Contributor Author

mpkorstanje commented Nov 28, 2020

Implementation is done. Before releasing this it would be prudent to release the other unreleased fixes first. This release should result in a new major version because the semantics of Cucumber Expressions have subtly changed.

The documentation here would need a small update w.r.t to escaping: https://cucumber.io/docs/cucumber/cucumber-expressions/

It would also be prudent if some one who's not me would try out some rather interesting Cucumber expressions and see if it all makes sense.

end

def escape(s)
s.gsub(/%/, '%%')
.gsub(/\(/, '\\(')
.gsub(/\{/, '\\{')
.gsub(/{/, '\\{')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is { not a reserved character?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is, but whether it needs to be escaped or not depends on the context. IDEA told me escaping was redundant here, and I trust it :-)

Copy link
Contributor

@aslakhellesoy aslakhellesoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What a tremendous piece of work this is @mpkorstanje. Thanks a million.

Really looking forward to integrating this into Cucumbers.

@aslakhellesoy aslakhellesoy merged commit 3fa9134 into master Dec 10, 2020
@aslakhellesoy aslakhellesoy deleted the tokenize-cucumber-expression branch December 10, 2020 11:38
@laeubi
Copy link

laeubi commented Jan 15, 2021

@mpkorstanje is there any example how to use the parser "standalone"? e.g. I have an expression as a string and like to parse it withe the parser to get the different parts?

@mpkorstanje
Copy link
Contributor Author

Not really. I'd rather not expose that as an API. I'd have to polish the AST even more before I'm confident I won't change it after the first bug we find.

@laeubi
Copy link

laeubi commented Jan 15, 2021

@mpkorstanje thanks for the feedback, I would be happy even with a "EXPERIMENTAL" marked API just to test and maybe give some feedback on this. I think this would is a really great addition to support tools in not firing their own (incomplete or differing) parsing algorithms.

aslakhellesoy pushed a commit to cucumber/cucumber-expressions that referenced this pull request Sep 20, 2021
To make cucumber/common#771 possible we need both mono and berp. Currently mono isn't installed and berp is checked in as a binary executable. Both are less then ideal. 

Currently mono isn't supported on the same version of alpine as .NET Core SDK 2.2.2. To avoid messing around too much with APK which doesn't support mono yet I've changed the base image to `ubuntu:20.04`. This makes it easier to install mono.

Additionally we can now also use the node version manager to manage the node and npm versions. They'll be installed without the need for global permissions. This solves another long standing build headace.

The make file will now use `mono  /var/lib/berp/1.1.1/tools/net471/Berp.exe -g gherkin.berp` which is less out of the box then it was before, but the exact installation instructions for berp and mono can be found in the `Dockerfile`.
mattwynne pushed a commit to cucumber/tag-expressions that referenced this pull request Oct 1, 2021
To make cucumber/common#771 possible we need both mono and berp. Currently mono isn't installed and berp is checked in as a binary executable. Both are less then ideal. 

Currently mono isn't supported on the same version of alpine as .NET Core SDK 2.2.2. To avoid messing around too much with APK which doesn't support mono yet I've changed the base image to `ubuntu:20.04`. This makes it easier to install mono.

Additionally we can now also use the node version manager to manage the node and npm versions. They'll be installed without the need for global permissions. This solves another long standing build headace.

The make file will now use `mono  /var/lib/berp/1.1.1/tools/net471/Berp.exe -g gherkin.berp` which is less out of the box then it was before, but the exact installation instructions for berp and mono can be found in the `Dockerfile`.
aurelien-reeves pushed a commit to cucumber/demo-formatter that referenced this pull request Feb 14, 2022
To make cucumber/common#771 possible we need both mono and berp. Currently mono isn't installed and berp is checked in as a binary executable. Both are less then ideal. 

Currently mono isn't supported on the same version of alpine as .NET Core SDK 2.2.2. To avoid messing around too much with APK which doesn't support mono yet I've changed the base image to `ubuntu:20.04`. This makes it easier to install mono.

Additionally we can now also use the node version manager to manage the node and npm versions. They'll be installed without the need for global permissions. This solves another long standing build headace.

The make file will now use `mono  /var/lib/berp/1.1.1/tools/net471/Berp.exe -g gherkin.berp` which is less out of the box then it was before, but the exact installation instructions for berp and mono can be found in the `Dockerfile`.
davidjgoss pushed a commit to cucumber/json-formatter that referenced this pull request Apr 14, 2022
To make cucumber/common#771 possible we need both mono and berp. Currently mono isn't installed and berp is checked in as a binary executable. Both are less then ideal. 

Currently mono isn't supported on the same version of alpine as .NET Core SDK 2.2.2. To avoid messing around too much with APK which doesn't support mono yet I've changed the base image to `ubuntu:20.04`. This makes it easier to install mono.

Additionally we can now also use the node version manager to manage the node and npm versions. They'll be installed without the need for global permissions. This solves another long standing build headace.

The make file will now use `mono  /var/lib/berp/1.1.1/tools/net471/Berp.exe -g gherkin.berp` which is less out of the box then it was before, but the exact installation instructions for berp and mono can be found in the `Dockerfile`.
aurelien-reeves pushed a commit to cucumber/messages that referenced this pull request Jun 29, 2022
To make cucumber/common#771 possible we need both mono and berp. Currently mono isn't installed and berp is checked in as a binary executable. Both are less then ideal. 

Currently mono isn't supported on the same version of alpine as .NET Core SDK 2.2.2. To avoid messing around too much with APK which doesn't support mono yet I've changed the base image to `ubuntu:20.04`. This makes it easier to install mono.

Additionally we can now also use the node version manager to manage the node and npm versions. They'll be installed without the need for global permissions. This solves another long standing build headace.

The make file will now use `mono  /var/lib/berp/1.1.1/tools/net471/Berp.exe -g gherkin.berp` which is less out of the box then it was before, but the exact installation instructions for berp and mono can be found in the `Dockerfile`.
cukebot pushed a commit to cucumber/messages-go that referenced this pull request Sep 21, 2022
To make cucumber/common#771 possible we need both mono and berp. Currently mono isn't installed and berp is checked in as a binary executable. Both are less then ideal. 

Currently mono isn't supported on the same version of alpine as .NET Core SDK 2.2.2. To avoid messing around too much with APK which doesn't support mono yet I've changed the base image to `ubuntu:20.04`. This makes it easier to install mono.

Additionally we can now also use the node version manager to manage the node and npm versions. They'll be installed without the need for global permissions. This solves another long standing build headace.

The make file will now use `mono  /var/lib/berp/1.1.1/tools/net471/Berp.exe -g gherkin.berp` which is less out of the box then it was before, but the exact installation instructions for berp and mono can be found in the `Dockerfile`.
cukebot pushed a commit to cucumber/gherkin-go that referenced this pull request Nov 9, 2022
To make cucumber/common#771 possible we need both mono and berp. Currently mono isn't installed and berp is checked in as a binary executable. Both are less then ideal. 

Currently mono isn't supported on the same version of alpine as .NET Core SDK 2.2.2. To avoid messing around too much with APK which doesn't support mono yet I've changed the base image to `ubuntu:20.04`. This makes it easier to install mono.

Additionally we can now also use the node version manager to manage the node and npm versions. They'll be installed without the need for global permissions. This solves another long standing build headace.

The make file will now use `mono  /var/lib/berp/1.1.1/tools/net471/Berp.exe -g gherkin.berp` which is less out of the box then it was before, but the exact installation instructions for berp and mono can be found in the `Dockerfile`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
8 participants