COBOL: "simple" crafted parser #2076
Does Geany provide the way to choose one of CobolFree, CobolVariable, or Cobol(Fixed) parsers?
ObjectiveC and Matlab parsers use the selector mechanism. A selector chooses one of them for an input file having ".m" as its extension. I wonder which one is acceptable to Geany. A selector may be acceptable because Geany may not use the selector mechanism; the selector is run in a very early stage of parsing. |
Not ATM. Geany doesn't even have this parser yet :)
That probably makes sense for uctags, as it's basically the kind of options compilers have. However, how would that play with your other suggestion, auto-selection?
That would probably be nice, but I'm not sure how easy it is to properly discriminate those. Free format might be okayish as it allows code in columns 1-6, but in fixed format columns 1-6 are simply discarded, so a silly programmer could put some code there that simply has no effect. Fixed vs. Variable is just whether content after the 72nd (IIRC) column is ignored or not. Not sure whether that can be detected properly. However, the parser as it is now doesn't care much about stuff at the end of lines if it's not for continuation lines -- well, it does care if the interesting part of a line extends past the 72nd column, but that's probably highly unusual. So yeah that'd be nice, but I'm not sure I can think of a solid algorithm for that.
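To make the column rules above concrete, here is a minimal sketch in Python (not the parser's actual C code) of which part of a line each format would consider. The function name and the format names "fixed", "variable" and "free" are hypothetical, the 72-column limit is the one recalled above, and tabs are not handled.

```python
def significant_text(line: str, fmt: str) -> str:
    """Return the part of a source line that a parser would look at.

    fmt is "fixed", "variable" or "free" (hypothetical names for this sketch).
    """
    line = line.rstrip("\r\n")
    if fmt == "free":
        # Free format: code may start in column 1, nothing is discarded.
        return line
    # Fixed and variable formats: columns 1-6 are discarded.
    body = line[6:]
    if fmt == "fixed":
        # Fixed format only: anything after column 72 is ignored,
        # so keep columns 7-72 (66 characters).
        body = body[:66]
    return body
```

Since the three formats only differ in which columns they keep, text in columns 1-6 or past column 72 is the only thing that could tell them apart, which is why auto-detection looks fragile.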
Well, I don't think it's the right question here. So long as the chosen technique would work for ctags-as-a-library, it's fine. Before this PR there was no support for Free and Variable formats anyway, so it's acceptable for us if we don't get this right away. |
Could you give me more time to think about this topic? If there is a consistent and Geany-friendly way, I would like to merge the three parsers into one. If there is no such way, we at least have to provide a kind table for each of the three parsers. I assume that a kind table is not shared between parsers; each parser has its own kind table (and role tables). The kinds of the C and C++ parsers are synchronized but not shared. I don't mind ctags having many parsers. Keeping different things separate is better.
BTW, I have a question about English. |
Sure, take the time you need. I wasn't sure myself as how to expose the three parsers, and kind of expected you'd tell me :)
You are right, that's a typo on my end. And I have no idea what a good wording for this would be. I meant to use "crafted" simply because that's what I was seeing you use for this, and for lack of a better idea I thought it'd be clear in the uctags context at least.
Thank you. I wrote the memorandum because I can use it as an example input when I write an SQL parser that runs as a guest parser on the COBOL parser as its host parser :-)
@masatake I added the test cases you pointed to, thanks! |
MEMO, Ignore this. http://itdoc.hitachi.co.jp/manuals/3000/30003D0800/GD080489.HTM (In Japanese). |
Could you see #2079? |
Pushed 2 small fixups, nothing fancy (see the diff). |
This fixes support for COBOL symbols after the recent breakage of regex parsers, as well as introducing additional features and bug fixes. Also import some of the tests. universal-ctags/ctags#2076 Part of geany#2119.
It seems that KIND_NULL is not used anywhere. I have to remove it in the future. |
Codecov Report
Coverage diff, master vs. #2076:
            master    #2076    +/-
Coverage    86.96%    86.99%   +0.03%
Files       190       190
Lines       40350     40445    +95
Hits        35092     35187    +95
Misses      5258      5258
I'm focusing on this pull request. |
They were not reviewed and were generated using the parser in current master. https://github.com/OpenCobolIDE/OpenCobolIDE/tree/master/test/testfiles
Don't use a real token-based parser because COBOL is too tricky to parse properly without full syntax coverage, so switch to a simpler line-based hybrid parser. This new version also fixes support for continuation lines.
They have special meaning with specific nesting properties.
It's compatible in this case, and improves coverage by including the variable format parser.
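As an illustration of the line-based handling of continuation lines mentioned in the commit messages above, here is a rough Python sketch; it is not the parser's actual C implementation. It assumes fixed format, where column 7 is the indicator area ('-' marks a continuation, '*' or '/' a comment line), and it ignores the special rules for continued string literals. The function name is hypothetical.

```python
def join_continuations(lines):
    """Merge fixed-format continuation lines into logical lines.

    A line with '-' in column 7 continues the previous logical line.
    """
    logical = []
    for raw in lines:
        area = raw.rstrip("\r\n")[6:72]  # columns 7-72
        if not area:
            continue  # nothing beyond the sequence area
        indicator, text = area[0], area[1:]
        if indicator in "*/":
            continue  # comment line
        if indicator == "-" and logical:
            # The first non-blank character of the continuation directly
            # follows the last non-blank character of the previous line.
            logical[-1] = logical[-1].rstrip() + text.lstrip()
        else:
            logical.append(text)
    return logical
```

With such a pre-pass, a name split over two physical lines is seen as one word by whatever matches definitions on the joined line.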
I'm very sorry to be late. |
Don't be, I was inactive for about one and a half years ;)
I don't know, I think it makes some sense from a CLI point of view as they're dialects of the same language, but I'm still unsure how it fits. Anyway, I believe this could be altered later, at least before the next release.
Here is a proposal for a replacement COBOL parser. Disclaimer: I don't know COBOL. But still, this parser seems fairly nicer:
If anybody is COBOL-literate, review of the test cases results is most welcome, as well as additional test cases, or any input.
For the story, the main reason why I wrote that is not a very good one: in the process of synchronizing Geany's tag parsing with U-CTags, we currently don't have the regex based infrastructure, but have 2 parsers that require it, including the COBOL one. So I decided to give rewriting the COBOL parser a shot, taking the opportunity to also get a little closer look at that language. While at it, I figured that it ought to be a significant improvement on the current one to be a reasonable change to merge here, so here we go.
PS: This includes a fully tokenizing parser in the history, but I decided to step back to a simpler line-based approach that fits COBOL not too badly, and is a lot less fooled by COBOL's trickiness.
I can leave it there for future inspiration if anybody wants, or I can squash it away, as preferred.