Ambiguity in mangling grammar around type qualifiers #179
Comments
Both the fully-qualified and unqualified types need to introduce substitution candidates, but we shouldn't do it twice for unqualified types. I'm hesitant to do some major refactor of the tree, but if you can find a clean solution, I'm open to taking it.
Hmm, yes, that is awkward to specify in standard CFG notation, because you really want to define If you defined
and then designate But in plain CFG notation it's pretty awkward to write
and, really, yuck!
Yeah. Unfortunately, I think this is a situation where creating a formally correct and unambiguous grammar is actually in tension with clarity for human readers, which seems like the more important goal. I'm inclined to just close this, but I can leave it open if you'd like to continue thinking about it for a while.
No argument there – of course I agree that clarity for humans is more important. But it would be nice if we could also have a reference parser that verifiably matches the spec. Would you be open to a compromise in which the grammar is refactored as in the first snippet above, but with the RHS reference to
That loses the really ugly part, which is the cumbersome redefinition of I don't know that you could make an LR parser generator accept a grammar in this unusual representation (even if you wrote a custom one), but I do know that the simpler and more brute-force Earley algorithm would cope with it fine. Perhaps that would still be adequately clear to humans, while also permitting at least one parsing technology to handle the grammar?
Yeah, I think a note saying that
I'd prefer just "non-empty", and to handle "maximal" by reorganising the grammar to ensure it can't generate two adjacent
Do you have a suggestion of how to do that which doesn't look awful? |
I thought the suggestion in my last-but-one comment was reasonably minimal, just rearranging the top-level rule or two for (It does mean there's no formal representation of that non-emptiness requirement. But I don't think there'd be any clean way to do that unless you introduced some kind of exciting new syntax for grammar rules, like "lhs ::= rhs1 AND NOT rhs2"!) |
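To make the difference between "non-empty" and "maximal" concrete, here is a small Python sketch. It is my own illustration, not part of the ABI text. It enumerates every way a run of qualifier letters can be split into consecutive non-empty <CV-qualifiers> groups, which is what going round the <type> → <qualified-type> cycle with non-empty <qualifiers> permits. It ignores extended qualifiers and considers only r, V and K, and the function names are hypothetical.

```python
from __future__ import annotations

CV_ORDER = "rVK"  # each letter at most once, in this order


def is_cv_group(group: str) -> bool:
    """True if `group` is a non-empty <CV-qualifiers> run (r, V, K in order, no repeats)."""
    if not group or any(c not in CV_ORDER for c in group):
        return False
    ranks = [CV_ORDER.index(c) for c in group]
    return all(a < b for a, b in zip(ranks, ranks[1:]))  # strictly increasing


def qualifier_splits(run: str) -> list[tuple[str, ...]]:
    """All ways to cut `run` into consecutive non-empty <CV-qualifiers> groups.

    Each group corresponds to one trip round the <qualified-type> cycle.
    """
    if not run:
        return [()]
    result = []
    for cut in range(1, len(run) + 1):
        head = run[:cut]
        if is_cv_group(head):
            for tail in qualifier_splits(run[cut:]):
                result.append((head,) + tail)
    return result
```

With non-emptiness alone, KV is still accepted (as K, then V), and VK has two readings. Keeping only the splits that consist of a single group, which amounts to forbidding two adjacent <qualifiers>, leaves exactly one reading for VK and none for KV.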
How about we stop trying to enforce qualifier ordering in the grammar? We already need non-grammar ordering rules for extended qualifiers, and repeat the ordering rules from the grammar in that same description:
Then:
I'm happy with that.
So am I. I'm sure that my Earley reference parser idea can be adapted easily to implement the ordering constraint (in the same way I could also have adapted it to handle the " |
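If the grammar accepts any sequence of qualifiers, the ordering constraint becomes a separate check on the parsed sequence. The following is a minimal Python sketch. It assumes the qualifiers have already been tokenized, with extended qualifiers as tokens beginning with U (for example, the hypothetical U3foo). It enforces only what the grammar used to enforce: extended qualifiers come first, then r, V and K, each at most once and in that order. It deliberately does not check the relative order of several extended qualifiers.

```python
from __future__ import annotations

CV_RANK = {"r": 0, "V": 1, "K": 2}  # required order of the CV-qualifiers


def qualifiers_in_order(tokens: list[str]) -> bool:
    """Check an already-tokenized qualifier sequence against the ordering rule.

    Assumption: extended qualifiers are tokens starting with "U"; their order
    relative to each other is NOT checked in this sketch.
    """
    last_rank = -1  # rank of the last CV-qualifier seen, -1 if none yet
    for tok in tokens:
        if tok.startswith("U"):
            if last_rank >= 0:
                return False  # extended qualifiers must come before r, V, K
        elif tok in CV_RANK:
            if CV_RANK[tok] <= last_rank:
                return False  # out of order, or repeated
            last_rank = CV_RANK[tok]
        else:
            raise ValueError(f"not a qualifier token: {tok!r}")
    return True
```

Because the check runs after parsing, the grammar itself can stay simple. The check then rejects PKVi's qualifier run ["K", "V"] while accepting ["V", "K"].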
I've been playing around recently with feeding this ABI's mangled-name grammar to an Earley parser, with the aim of using it as a reference decoder for mangled names, in cases where a production demangler is misbehaving or two demanglers disagree on the interpretation of a name.
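For readers unfamiliar with the approach, here is a minimal Earley recognizer sketch in Python. It is not the reference decoder described here, only an illustration. It runs over a hypothetical toy subset of the qualifier grammar: only i and c as builtin types, P for pointers, and <CV-qualifiers> spelled out as three optional letters. Because <qualifiers> is nullable and the grammar is cyclic, it uses the Aycock-Horspool fix, in which predicting a nullable nonterminal also advances the dot past it.

```python
from __future__ import annotations

# Hypothetical toy subset of the mangling grammar. Nonterminals are the dict
# keys; every other symbol is a single-character terminal.
TOY_GRAMMAR: dict[str, list[tuple[str, ...]]] = {
    "<type>": [("<builtin-type>",), ("<qualified-type>",), ("P", "<type>")],
    "<builtin-type>": [("i",), ("c",)],
    "<qualified-type>": [("<qualifiers>", "<type>")],
    "<qualifiers>": [("<opt-r>", "<opt-V>", "<opt-K>")],  # [r] [V] [K]
    "<opt-r>": [(), ("r",)],
    "<opt-V>": [(), ("V",)],
    "<opt-K>": [(), ("K",)],
}


def nullable_set(grammar):
    """Nonterminals that can derive the empty string (fixed-point iteration)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, prods in grammar.items():
            if lhs not in nullable and any(all(s in nullable for s in p) for p in prods):
                nullable.add(lhs)
                changed = True
    return nullable


def earley_recognize(grammar, start, text):
    """Return True if `text` derives from `start`. Items are (lhs, rhs, dot, origin)."""
    nullable = nullable_set(grammar)
    chart = [set() for _ in range(len(text) + 1)]
    chart[0] = {(start, rhs, 0, 0) for rhs in grammar[start]}
    for i in range(len(text) + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            new = []
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:  # predict
                    new = [(sym, r, 0, i) for r in grammar[sym]]
                    if sym in nullable:  # Aycock-Horspool: also skip the nullable symbol
                        new.append((lhs, rhs, dot + 1, origin))
                elif i < len(text) and text[i] == sym:  # scan
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:  # complete: advance every item in chart[origin] waiting on lhs
                new = [(l2, r2, d2 + 1, o2) for (l2, r2, d2, o2) in chart[origin]
                       if d2 < len(r2) and r2[d2] == lhs]
            for item in new:
                if item not in chart[i]:
                    chart[i].add(item)
                    agenda.append(item)
    return any(l == start and d == len(r) and o == 0 for (l, r, d, o) in chart[-1])


def recognize(text):
    return earley_recognize(TOY_GRAMMAR, "<type>", text)
```

Earley items are deduplicated per chart position, so the cycle through <qualified-type> does not make the recognizer loop. The same deduplication also means a plain recognizer cannot report how many parse trees an input has.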
This exercise caused me to spot a couple of bugs in the grammar. One is already reported here (#120), so I won't go into that one further. The other is a cycle in the grammar, related to type qualifiers. The following productions exist in the grammar: <type> ::= <qualified-type> (among the other alternatives for <type>), and <qualified-type> ::= <qualifiers> <type>.
<qualifiers> can derive the empty string, because it expands to a series of things each of which is optional (any number of extended-qualifiers, including 0, then optional r, V and K, in that order). Therefore, a derivation can contain the sequence <type> → <qualified-type> → <qualifiers> <type>, and <qualifiers> can be empty, which leads you back to <type> matching exactly the same sequence of tokens.
This is a problem for algorithmic parsing, because there are multiple formal parse trees for the same input, each going round the same pointless cycle a different number of times (although it's a benign ambiguity, since all those parse trees describe the same semantics).
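Counting parse trees makes the cycle easy to see. The following Python sketch uses a hypothetical toy grammar: builtin types i and c, P for pointers, and <qualifiers> limited to r, V and K. The number of trees is unbounded, so the count is capped by a budget on how many <qualified-type> nodes may be nested along any path. The allow_empty flag is my own addition for comparison.

```python
from functools import lru_cache

BUILTINS = frozenset({"i", "c"})  # hypothetical toy subset of <builtin-type>
# Every <CV-qualifiers> string: r, V, K each optional, in that order (empty first).
CV_RUNS = ("", "r", "V", "K", "rV", "rK", "VK", "rVK")


@lru_cache(maxsize=None)
def count_parses(s: str, budget: int, allow_empty: bool = True) -> int:
    """Count <type> parse trees for exactly s, with at most `budget`
    <qualified-type> nodes nested along any root-to-leaf path."""
    total = 1 if s in BUILTINS else 0      # <type> ::= <builtin-type>
    if s.startswith("P"):                  # <type> ::= P <type>
        total += count_parses(s[1:], budget, allow_empty)
    if budget > 0:                         # <type> ::= <qualified-type>
        runs = CV_RUNS if allow_empty else CV_RUNS[1:]
        for q in runs:                     # <qualified-type> ::= <qualifiers> <type>
            if s.startswith(q):
                total += count_parses(s[len(q):], budget - 1, allow_empty)
    return total
```

With empty <qualifiers> allowed, the count for Ki grows as 1, 3, 6, 10 as the budget goes from 1 to 4 and never settles. Forbidding an empty <qualifiers> inside <qualified-type> makes the count 1 for every budget, which confirms that the empty expansion is what creates the pointless cycle.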
But this cycle in the grammar also causes some unwanted things to be legal, because going round the cycle more than once permits more than one <qualifiers> nonterminal to appear before <type>. For example, my test parser accepts PKVi as a valid type, by going round the cycle twice, with the outer <qualified-type> consuming the K and the inner one consuming the V. The text alongside these grammar productions suggests that this was not intentional, and that PVKi is the only correct description of a pointer to volatile const int.
I think the following redesign eliminates the ambiguity, causing the formal grammar to reflect the intent expressed by the text:
This structure forces exactly one instance of <qualifiers> to appear in a <type>, eliminating the ambiguity and the unwanted reorderings. (But we keep the property that <qualifiers> can be empty, which allows plain i and so on to still work.)
(However, I haven't checked what this change does to the question of what things are substitution candidates.)
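The following Python sketch illustrates the property described above. It is not a transcription of the proposed productions. Every <type> carries exactly one, possibly empty, <qualifiers> in front of an unqualified type, so qualifiers can no longer stack. The sketch uses a hypothetical toy subset (builtin types i and c, P for pointers, qualifiers r, V and K), and <unqualified-type> is a placeholder name of my own.

```python
BUILTINS = frozenset({"i", "c"})  # hypothetical toy subset of <builtin-type>

# Hypothetical restructuring (not the proposal's exact productions):
#   <type>             ::= <qualifiers> <unqualified-type>
#   <unqualified-type> ::= <builtin-type> | P <type>
#   <qualifiers>       ::= [r] [V] [K]


def parse_qualifiers(s: str, pos: int) -> int:
    """Consume an optional r, then V, then K; return the new position."""
    for letter in "rVK":
        if pos < len(s) and s[pos] == letter:
            pos += 1
    return pos


def parse_type(s: str, pos: int = 0) -> int:
    """Parse exactly one (possibly empty) <qualifiers>, then an unqualified type."""
    pos = parse_qualifiers(s, pos)
    if pos < len(s) and s[pos] in BUILTINS:
        return pos + 1
    if pos < len(s) and s[pos] == "P":
        return parse_type(s, pos + 1)  # the pointee gets its own <qualifiers>
    raise ValueError(f"expected a type at offset {pos} in {s!r}")


def is_valid_type(s: str) -> bool:
    try:
        return parse_type(s) == len(s)
    except ValueError:
        return False
```

In this subset, r, V and K can never begin an unqualified type, so the parser can consume qualifiers greedily without backtracking, and each accepted string has a single parse. It accepts PVKi and rejects PKVi and KKi.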