Skip to content
This repository has been archived by the owner on May 28, 2019. It is now read-only.

Implement a Gherkin parser in C #73

Closed
thiblahute opened this issue Aug 30, 2015 · 23 comments
Closed

Implement a Gherkin parser in C #73

thiblahute opened this issue Aug 30, 2015 · 23 comments

Comments

@thiblahute
Copy link

I started implementing a Gherkin parser using the GLib and json-glib: gherkin C/glib implementation. It is currently able to generate the AST (in json) for simple scenarios without tags (yet), it supports feature/scenario/Step/tables for now. It is just the beginning but I am opening this task basically to know if you would concider that direction good for you?

Also, I started looking at the acceptance tests but it looks like it is just doing diff between expect results and actual results, but the json outputed by json-glib has minor difference in formating (name: value VS name : value for example) so diffing will just fail, am I missing somthing here?

@tooky
Copy link
Contributor

tooky commented Aug 30, 2015

The acceptance tests use jq to normalise the JSON so that formatting differences are taken care of.

e.g. https://github.com/cucumber/gherkin3/blob/master/ruby/Makefile#L32

@thiblahute
Copy link
Author

Thanks @tooky that works fine and reveal some errors with table handling. I will make sure all the dataset passes and propose a MR.

@thiblahute
Copy link
Author

Also I have this kind of issiues:

-          "keyword": "And ",
+          "keyword": "And",

Why would it requiere a space after the 'And' keyword?

@aslakhellesoy
Copy link
Contributor

The space is part of the keyword because some languages (like Chinese) don't use spaces between words.

@aslakhellesoy
Copy link
Contributor

I've taken a look at your code. It's a good start, but some substantial changes are needed before we can accept it into this project.

In order to make it easy for us to maintain the various implementations, it's important that they all follow the same design. Your code doesn't bear any resemblance to the existing implementations.

Please follow the guidelines for implementing a parser for a new language.

For a language like C I think it is important to have 0 dependencies, since this would make it easier to use in embedded systems. The library itself shouldn't have any runtime dependencies on JSON. The AST should just be a tree of C structs, and the bin/gherkin-generate-ast executable (which is only used for testing) could use a JSON library to turn those structs into JSON.

As for reading in the translations to configure the lexer, it would probably make sense to have a target in the Makefile that converts gherkin-languages.json into a format that the library can read at runtime without relying on JSON.

One last comment - would you be able to / willing to use the existing license for the contribution?

@aslakhellesoy
Copy link
Contributor

I'm sure Meson is very nice, but I think the build system should use only make. The simplicity of this project doesn't warrant the use of an uncommon build tool.

@thiblahute
Copy link
Author

In order to make it easy for us to maintain the various implementations, it's important that they all follow the same design. Your code doesn't bear any resemblance to the existing implementations.
Please follow the guidelines for implementing a parser for a new language.

I read that but was not sure what that means in practice. What design part do you think I should follow?

For a language like C I think it is important to have 0 dependencies, since this would make it easier to use in embedded systems. The library itself shouldn't have any runtime dependencies on JSON. The AST should just be a tree of C structs, and the bin/gherkin-generate-ast executable (which is only used for testing) could use a JSON library to turn those structs into JSON.

I am not sure I agree with that, I think that using the GLib GScanner module is the right thing to do, and on what embeded device does the GLib not run? Also a big advantage of using the GLib is that we can then introspect the library and have bindings for many languages very easily. I could use GNode to represent the AST and json-glib in bin/gherkin-generate-ast.

I'm sure Meson is very nice, but I think the build system should use only make. The simplicity of this project doesn't warrant the use of an uncommon build tool.

You mean I should give the copyright to Cucumber Ltd, Gaspar Nagy, TechTalk not sure it is fare in the case I am the author :)

I'm sure Meson is very nice, but I think the build system should use only make. The simplicity of this project doesn't warrant the use of an uncommon build tool.

I had the impression in java you had a makefile and another build system, would that be OK for you?

Thanks a lot for having a look at my WIP code.

@aslakhellesoy
Copy link
Contributor

What design part do you think I should follow?

In short, translate the existing code, using the same file names, function names, variable names etc. Of course there are idioms that are particular to each language, but the control flow should be the same.

This control flow is documented at a high level in README.md - it explains how the lexer is connected to the parser. For more details of what the "design" is - refer to an implementation in a language you're comfortable with. For instance, the building of the AST should be done with an "ast builder" that is connected to the Berp-generated parser. Currently your AST building logic looks like it is connected directly to the scanner, and there doesn't seem to be a parser at all.

I think that using the GLib GScanner module is the right thing to do.

I'm fine with that if you say so - I'm not very experienced with C.

Also a big advantage of using the GLib is that we can then introspect the library and have bindings for many languages very easily

You mean this could become a libgherkin that could be used in bindings for say, lua or PHP? That would be fantastic!

I could use GNode to represent the AST and json-glib in bin/gherkin-generate-ast.

That sounds like a good idea to me.

You mean I should give the copyright to Cucumber Ltd, Gaspar Nagy, TechTalk?

Sorry - I didn't mean to exclude you from the license. Could you give the copyright to Cucumber Ltd, Saunier Thibault - similar to how it's done for the Python implementation?

I had the impression in java you had a makefile and another build system, would that be OK for you?

The reason Java also has Maven is that it's the most common build system for Java, and not having it would make it difficult for many Java developers to contribute. We aim to use the "standard" build tool for each language, but front everything with make to make it easier to build them all in CI.

What's your motivation for adding Meson?

@thiblahute
Copy link
Author

In short, translate the existing code, using the same file names, function names, variable names etc. >Of course there are idioms that are particular to each language, but the control flow should be the same.

This control flow is documented at a high level in README.md - it explains how the lexer is connected to the parser. For more details of what the "design" is - refer to an implementation in a language you're comfortable with. For instance, the building of the AST should be done with an "ast builder" that is connected to the Berp-generated parser. Currently your AST building logic looks like it is connected directly to the scanner, and there doesn't seem to be a parser at all.

That all makes a lot of sense indeed, currently the code is not properly modularized so I will follow what you describe in there.

Also a big advantage of using the GLib is that we can then introspect the library and have bindings for many languages very easily

You mean this could become a libgherkin that could be used in bindings for say, lua or PHP? That would be fantastic!

More than that, we will have auto generated bindings for all the languages supported by Gobject Introspection (currently python, javascript, lua, experimental bindings for rust, c# (though it is not dynamic), vala...) and you can always write for other languages as well.

Sorry - I didn't mean to exclude you from the license. Could you give the copyright to Cucumber Ltd, Saunier Thibault - similar to how it's done for the Python implementation?

Sure.

The reason Java also has Maven is that it's the most common build system for Java, and not having it would make it difficult for many Java developers to contribute. We aim to use the "standard" build tool for each language, but front everything with make to make it easier to build them all in CI.
What's your motivation for adding Meson?

I am (starting to be) involved in the Meson project and it makes my life very easy to get the build system in place with multiplatform support. Using plain Makefile in C quickly becomes a mess.

@brasmusson
Copy link
Contributor

The advice of @aslakhellesoy to limit the number of dependencies is good. From time to time there are posts on the cukes mailing list from people having problem building Cucumber-cpp, which could turn out is because they are using an old version of Visual Studio, or a miss match between the used version of gcc and the used version of the boost library, or something. So adding a dependency should not be taken lightly.

I am not familiar with GLib or Meson, so I can't comment specifically on their suitability in this case.

I do not understand the reference to the license for the Python implementation. That license file (and all other license files for the current language implementations) are a copy of the license file of the cucumber/gherkin3 project (copied by a rule in the Makefile).

@thiblahute
Copy link
Author

I pushed a few new commit no my branch where I basically factored ou the parser from the scanner and I know have a AstBuilder.
My question is what are the benefits of generating the parser with berp, and how important is it for you to concider my branch?

@aslakhellesoy
Copy link
Contributor

I just realised Meson requires Python 3.4. If it ran on Python 2.7 (what's shipped on OS X) I might be ok with it, but having to install Python just to build this is too much of a burden I think. A lot of pain for little gain. Given the simplicity of this Gherkin3 I don't see why a non-standard build tool is needed.

@aslakhellesoy
Copy link
Contributor

@brasmusson I thought the Python license had your name in it, but now I realise every sub-project gets the license from the root folder.

Ideally I'd like the license to mention just the original creators of the project, and have a separate file listing contributors.

Any objections to that?

@thiblahute
Copy link
Author

I just realised Meson requires Python 3.4. If it ran on Python 2.7 (what's shipped on OS X) I might be ok with it, but having to install Python just to build this is too much of a burden I think. A lot of pain for little gain. Given the simplicity of this Gherkin3 I don't see why a non-standard build tool is needed.

Because I would like to have for example, the API self documented and have the library introspected, I could do that with autotools too, but tbh this would be the burden :)
Also having the build system properly check the dependence etc sounds like the normal thing to do, Makefiles are not thought for that.

Are you against having the thing just compiled with Makefile (if GLib is on the system and uglyly fail if it not the case) and have a proper modern build system for people who want to use the introspection and generate the doc?

@thiblahute
Copy link
Author

Ideally I'd like the license to mention just the original creators of the project, and have a separate file listing contributors.
Any objections to that?

I usually do not like to give away my copyright on the code I write (from scratch, in case of patches etc... it is normal).

@aslakhellesoy
Copy link
Contributor

@thiblahute

My question is what are the benefits of generating the parser with berp, and how important is it for you to concider my branch?

I don't think we'll accept a parser that isn't generated with Berp unless there are good reasons not to do it.

Gherkin3 is currently implemented in 7 languages (if we count c), and I wouldn't be surprised if that number increases to a dozen within a year or two. Very few programmers are intimate with all those languages. Therefore, in order to make it as easy as possible to refactor, fix bugs, add features and release packages it is essential that all implementations have a similar structure.

For example, I don't know go at all, and very little Python. Still, I have been able to fix bugs and refactor the go and python code simply because I know where to find stuff since they follow the same structure.

If one implementation looks completely different, this becomes a huge burden that will slow everything down.

Another reason I prefer all parsers to be generated by Berp is that they are more likely to behave the same that way, especially when they are given grammatically incorrect gherkin. I don't know if you've started working on error messages yet, but it's essential that all implementations report parse errors in the same way.

@aslakhellesoy
Copy link
Contributor

I usually do not like to give away my copyright on the code I write (from scratch, in case of patches etc... it is normal).

Fair enough - I'm happy to add you (and @brasmusson) to /LICENSE since you've both contributed a completely new parser for a new language.

@aslakhellesoy
Copy link
Contributor

Are you against having the thing just compiled with Makefile (if GLib is on the system and uglyly fail if it not the case) and have a proper modern build system for people who want to use the introspection and generate the doc?

That sounds like a good compromise. I don't mind the presence of an additional, better build system at all as long as it can still be built with a standard/mainstram one (make in this case).

@aslakhellesoy
Copy link
Contributor

Stupid question while I'm waiting for a Makefile: How can I install meson? I've tried this:

virtualenv -p python3 py3
./py3/bin/pip3 install meson

That just gives me:

Collecting meson
  Could not find a version that satisfies the requirement meson (from versions: )
No matching distribution found for meson

I wasn't able to find any meson docs that tell me how to install it with pip.

@aslakhellesoy
Copy link
Contributor

I figured out how to install it - looks like meson isn't on PyPI(?!). I just cloned the repo and ran python3 install_meson.py. So far so good.

Now I wanted to build the code. I guessed this is the magic commands:

mkdir build
meson . build

It complained about missing packages, so I guessed I had to do a brew install glib and brew install json-glib. I also had to export PKG_CONFIG_PATH=/usr/local/opt/libffi/lib/pkgconfig.

Is all of this correct? Could you please provide some guidelines in CONTRIBUTING.md? (See one of the other language/CONTRIBUTING.md files for the general structure.

@aslakhellesoy
Copy link
Contributor

Is it true that meson doesn't build anything - it just generates a build file (for ninja)? I tried this:

cd build
ninja 

[5/10] Compiling c object 'gherkin/gherkin-0.1@sha/gherkinastbuilder.c.o'
In file included from ../gherkin/gherkinastbuilder.c:1:
../gherkin/gherkindebug.h:5:19: warning: named variadic macros are a GNU extension [-Wvariadic-macros]
# define DEBUG(x,a...)                 G_STMT_START {  \
                  ^
In file included from ../gherkin/gherkinastbuilder.c:3:
../gherkin/gherkinastbuilder.h:14:22: warning: redefinition of typedef 'GherkinAstBuilder' is a C11 feature [-Wtypedef-redefinition]
G_DECLARE_FINAL_TYPE(GherkinAstBuilder, gherkin_ast_builder, GHERKIN, AST_BUILDER, GObject)
                     ^
/usr/local/Cellar/glib/2.44.1/include/glib-2.0/gobject/gtype.h:1390:35: note: expanded from macro 'G_DECLARE_FINAL_TYPE'
  typedef struct _##ModuleObjName ModuleObjName;                                                         \
                                  ^
../gherkin/gherkinastbuilder.h:10:35: note: previous definition is here
typedef struct _GherkinAstBuilder GherkinAstBuilder;
                                  ^
In file included from ../gherkin/gherkinastbuilder.c:1:
../gherkin/gherkindebug.h:7:33: warning: token pasting of ',' and __VA_ARGS__ is a GNU extension [-Wgnu-zero-variadic-macro-arguments]
    g_message (G_STRLOC ": " x, ##a);      \
                                ^
../gherkin/gherkindebug.h:7:33: warning: token pasting of ',' and __VA_ARGS__ is a GNU extension [-Wgnu-zero-variadic-macro-arguments]
../gherkin/gherkinastbuilder.c:47:16: warning: unused function '_current_node' [-Wunused-function]
static GNode * _current_node (GherkinAstBuilder *self)
               ^
5 warnings generated.
[5/10] Compiling c object 'bin/gherkin-generate-ast@exe/gherkin-generate-ast.c.o'
In file included from ../bin/gherkin-generate-ast.c:3:
../gherkin/gherkinparser.h:14:22: warning: redefinition of typedef 'GherkinParser' is a C11 feature [-Wtypedef-redefinition]
G_DECLARE_FINAL_TYPE(GherkinParser, gherkin_parser, GHERKIN, PARSER, GObject)
                     ^
/usr/local/Cellar/glib/2.44.1/include/glib-2.0/gobject/gtype.h:1390:35: note: expanded from macro 'G_DECLARE_FINAL_TYPE'
  typedef struct _##ModuleObjName ModuleObjName;                                                         \
                                  ^
../gherkin/gherkinparser.h:10:31: note: previous definition is here
typedef struct _GherkinParser GherkinParser;
                              ^
1 warning generated.
[5/10] Compiling c object 'gherkin/gherkin-0.1@sha/gherkinscanner.c.o'
In file included from ../gherkin/gherkinscanner.c:6:
../gherkin/gherkindebug.h:5:19: warning: named variadic macros are a GNU extension [-Wvariadic-macros]
# define DEBUG(x,a...)                 G_STMT_START {  \
                  ^
1 warning generated.
[5/10] Compiling c object 'gherkin/gherkin-0.1@sha/gherkinparser.c.o'
In file included from ../gherkin/gherkinparser.c:4:
../gherkin/gherkindebug.h:5:19: warning: named variadic macros are a GNU extension [-Wvariadic-macros]
# define DEBUG(x,a...)                 G_STMT_START {  \
                  ^
In file included from ../gherkin/gherkinparser.c:6:
../gherkin/gherkinparser.h:14:22: warning: redefinition of typedef 'GherkinParser' is a C11 feature [-Wtypedef-redefinition]
G_DECLARE_FINAL_TYPE(GherkinParser, gherkin_parser, GHERKIN, PARSER, GObject)
                     ^
/usr/local/Cellar/glib/2.44.1/include/glib-2.0/gobject/gtype.h:1390:35: note: expanded from macro 'G_DECLARE_FINAL_TYPE'
  typedef struct _##ModuleObjName ModuleObjName;                                                         \
                                  ^
../gherkin/gherkinparser.h:10:31: note: previous definition is here
typedef struct _GherkinParser GherkinParser;
                              ^
In file included from ../gherkin/gherkinparser.c:7:
../gherkin/gherkinastbuilder.h:14:22: warning: redefinition of typedef 'GherkinAstBuilder' is a C11 feature [-Wtypedef-redefinition]
G_DECLARE_FINAL_TYPE(GherkinAstBuilder, gherkin_ast_builder, GHERKIN, AST_BUILDER, GObject)
                     ^
/usr/local/Cellar/glib/2.44.1/include/glib-2.0/gobject/gtype.h:1390:35: note: expanded from macro 'G_DECLARE_FINAL_TYPE'
  typedef struct _##ModuleObjName ModuleObjName;                                                         \
                                  ^
../gherkin/gherkinastbuilder.h:10:35: note: previous definition is here
typedef struct _GherkinAstBuilder GherkinAstBuilder;
                                  ^
../gherkin/gherkinparser.c:405:1: warning: unused function '_finalize' [-Wunused-function]
_finalize (GObject * object)
^
4 warnings generated.
[9/10] 'Generating Gherkin-0.0.9.gir with a custom command.'
FAILED: 'g-ir-scanner' '../gherkin/gherkinscanner.c' '../gherkin/gherkinparser.c' '../gherkin/gherkinastbuilder.c' '../gherkin/gherkintypes.c' '../gherkin/gherkinscanner.h' '../gherkin/gherkinparser.h' '../gherkin/gherkinastbuilder.h' '../gherkin/gherkintypes.h' '-D_REENTRANT' '-I/usr/local/Cellar/libffi/3.0.13/lib/libffi-3.0.13/include' '-I/usr/local/Cellar/gobject-introspection/1.44.0/include/gobject-introspection-1.0' '-I/usr/local/Cellar/glib/2.44.1/include/glib-2.0' '-I/usr/local/Cellar/glib/2.44.1/lib/glib-2.0/include' '-I/usr/local/opt/gettext/include' '--no-libtool' '--namespace=Gherkin' '--nsversion=0.0.9' '--warn-all' '--output' 'gherkin/Gherkin-0.0.9.gir' '-I/Users/aslakhellesoy/git/cucumber/gherkin3-c/c/gherkin/' '-lgherkin-0.1' '-L/Users/aslakhellesoy/git/cucumber/gherkin3-c/c/build/gherkin' '--include=GLib-2.0' '--include=GObject-2.0' '--include=Json-1.0' '--add-include-path=gherkin/' '-L' '/Users/aslakhellesoy/git/cucumber/gherkin3-c/c/build/gherkin' '--library' 'gherkin-0.1'
g-ir-scanner: compile: cc -Wno-deprecated-declarations -D_REENTRANT -I/usr/local/Cellar/glib/2.44.1/include/glib-2.0 -I/usr/local/Cellar/glib/2.44.1/lib/glib-2.0/include -I/usr/local/Cellar/gettext/0.19.5.1/include -I/usr/local/Cellar/libffi/3.0.13/lib/libffi-3.0.13/include -I/usr/local/Cellar/gobject-introspection/1.44.0/include/gobject-introspection-1.0 -I/usr/local/Cellar/glib/2.44.1/include/glib-2.0 -I/usr/local/Cellar/glib/2.44.1/lib/glib-2.0/include -I/usr/local/Cellar/gettext/0.19.5.1/include -I/Users/aslakhellesoy/git/cucumber/gherkin3-c/c/gherkin -I/usr/local/Cellar/glib/2.44.1/include/gio-unix-2.0 -I/usr/local/Cellar/json-glib/1.0.4/include/json-glib-1.0 -I/usr/local/Cellar/glib/2.44.1/include/glib-2.0 -I/usr/local/Cellar/glib/2.44.1/lib/glib-2.0/include -I/usr/local/Cellar/gettext/0.19.5.1/include -c -o /Users/aslakhellesoy/git/cucumber/gherkin3-c/c/build/tmp-introspect4lTLKu/Gherkin-0.0.9.o /Users/aslakhellesoy/git/cucumber/gherkin3-c/c/build/tmp-introspect4lTLKu/Gherkin-0.0.9.c
g-ir-scanner: link: cc -o /Users/aslakhellesoy/git/cucumber/gherkin3-c/c/build/tmp-introspect4lTLKu/Gherkin-0.0.9 /Users/aslakhellesoy/git/cucumber/gherkin3-c/c/build/tmp-introspect4lTLKu/Gherkin-0.0.9.o -L. -Wl,-rpath=. -Wl,--no-as-needed -lgherkin-0.1 -lgherkin-0.1 -L/Users/aslakhellesoy/git/cucumber/gherkin3-c/c/build/gherkin -Wl,-rpath=/Users/aslakhellesoy/git/cucumber/gherkin3-c/c/build/gherkin -L/Users/aslakhellesoy/git/cucumber/gherkin3-c/c/build/gherkin -Wl,-rpath=/Users/aslakhellesoy/git/cucumber/gherkin3-c/c/build/gherkin -L/usr/local/Cellar/glib/2.44.1/lib -L/usr/local/opt/gettext/lib -lgio-2.0 -lgobject-2.0 -lgmodule-2.0 -lglib-2.0 -lintl
ld: unknown option: -rpath=.
clang: error: linker command failed with exit code 1 (use -v to see invocation)
linking of temporary binary failed: Command '['cc', '-o', '/Users/aslakhellesoy/git/cucumber/gherkin3-c/c/build/tmp-introspect4lTLKu/Gherkin-0.0.9', '/Users/aslakhellesoy/git/cucumber/gherkin3-c/c/build/tmp-introspect4lTLKu/Gherkin-0.0.9.o', '-L.', '-Wl,-rpath=.', '-Wl,--no-as-needed', '-lgherkin-0.1', '-lgherkin-0.1', '-L/Users/aslakhellesoy/git/cucumber/gherkin3-c/c/build/gherkin', '-Wl,-rpath=/Users/aslakhellesoy/git/cucumber/gherkin3-c/c/build/gherkin', '-L/Users/aslakhellesoy/git/cucumber/gherkin3-c/c/build/gherkin', '-Wl,-rpath=/Users/aslakhellesoy/git/cucumber/gherkin3-c/c/build/gherkin', '-L/usr/local/Cellar/glib/2.44.1/lib', '-L/usr/local/opt/gettext/lib', '-lgio-2.0', '-lgobject-2.0', '-lgmodule-2.0', '-lglib-2.0', '-lintl']' returned non-zero exit status 1
ninja: build stopped: subcommand failed.

@thiblahute
Copy link
Author

Concerning generating the parser with Berp, it makes sense, I will do it (haven't had time to had a look at how that works yet, I was just wondering what your standpoint was about it).

Is all of this correct? Could you please provide some guidelines in CONTRIBUTING.md? (See one of the other language/CONTRIBUTING.md files for the general structure.

I will provide an explanation about how to build with meson, sure.

Is it true that meson doesn't build anything - it just generates a build file (for ninja)? I tried this:

That is correct, and it is what most build system do (autotools, cmake, scans, meson...). Meson has the advantage of handling several backends (currently ninja is the most mature one, but there is also one for xcode and visual studio)

Sorry for the build failure, this is all very WIP :) I just fixed the warnings, but it looks like something is introspecting the lib with clang on OS X as it looks like on OS X the linker does not support setting -rpath adding an '='. I will have a look at that. For now you can disable the introspection (with my latest code) doing:

cd build
rm -R *
meson ../ -Ddisable-introspection=true
ninja

but tbh there is not much to see yet :)

@aslakhellesoy
Copy link
Contributor

Thanks for your understanding @thiblahute. I realise it might be a little daunting to have to use .NET/Mono to run the berp parser generator, but it's actually pretty easy to install mono on most OSes, and you won't have to learn any .NET, except for the slightly weird Razor template syntax.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants