Note that raw string literals are context-sensitive #1185
See https://github.com/rust-lang/rust/blob/5187be620c76a313a19b9b596e1bce3a80a345dd/src/grammar/raw-string-literal-ambiguity.md for the original formal proof of this fact.
A while ago I went looking for that exact document and couldn't find it. Maybe you could link to the proof in the PR changes too?
Sorry, which link do you want where?
Done.
(Sorry, doing this via the GitHub UI; feel free to squash it all.)
I'm curious, is this true even if you consider the number of [...]? I would think that nested comment blocks (which have no limit AFAIK) would be a better example of Rust being non-regular.
This is so irrelevant in practice that I'm not even sure it's something that needs to be highlighted in the docs. The separation into lexer and parser in Rust is pretty strong (because of macros, for example), so details like this do not even belong to the grammar, if we assume that the grammar starts from the parser.
It's an explicit problem for anyone who tries to implement a parser using tools "for" context-free grammars, which is, like, the standard (bad) tooling everyone recommends. I don't see why comments would be problematic; they're just balanced parentheses, no?
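To illustrate the "balanced parentheses" point, here is a minimal sketch of how a hand-written lexer might skip a nested block comment with a single depth counter. The function name `scan_block_comment` is hypothetical and this is not rustc's implementation; it only shows that one counter is enough, which is the kind of matching a context-free grammar can express.

```rust
/// Returns the byte length of the (possibly nested) block comment that
/// starts at the beginning of `input`, or `None` if `input` does not start
/// with `/*` or the comment is never closed.
/// Hypothetical helper for illustration only.
fn scan_block_comment(input: &str) -> Option<usize> {
    if !input.starts_with("/*") {
        return None;
    }
    let bytes = input.as_bytes();
    let mut depth = 0usize; // number of currently open `/*`
    let mut i = 0;
    while i + 1 < bytes.len() {
        match (bytes[i], bytes[i + 1]) {
            (b'/', b'*') => {
                depth += 1;
                i += 2;
            }
            (b'*', b'/') => {
                // depth is at least 1 here, because the input starts with `/*`
                // and we return as soon as depth drops back to 0.
                depth -= 1;
                i += 2;
                if depth == 0 {
                    return Some(i);
                }
            }
            _ => i += 1,
        }
    }
    None // ran out of input with the comment still open
}
```

For example, `scan_block_comment("/* a /* b */ c */ rest")` stops after the outer `*/`, while `scan_block_comment("/* a /* b */")` returns `None` because the outer comment is still open.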
It is relevant in that it proves you can't write a correct syntax highlighter using standard regexes, as is common in most editors.
Yes, though it isn't documented in the reference atm (#1180), the number of [...]. @Gankra Maybe a note about that should be added to the proof?
We reviewed this in the lang-docs call today. We're definitely sympathetic to the sentiment, but we probably agree that it doesn't need to be said here, so we'll close it.
This was originally discussed with a full formal proof in
https://github.com/rust-lang/rust/blob/5187be620c76a313a19b9b596e1bce3a80a345dd/src/grammar/raw-string-literal-ambiguity.md
but that file was removed as part of the "legacy" grammar of Rust. Over the years the file has been linked many times, and the fact it proves is genuinely non-obvious and useful to know (because it means any attempt to express Rust entirely in EBNF is futile). I don't think the full formal proof is actually necessary; we can just appeal to an example and mention the "context-free languages can't compare 3 numbers" rule. To minimize confusion, I'm also including a note that the implications are relatively minor for an actual practical parser.
(Full disclosure I am the original author of the proof and it was my first contribution to Rust, so I am 100% sentimental about it, but also, it is just genuinely useful information.)
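As a rough illustration of why the practical impact is small, here is a minimal sketch of how a hand-written lexer can recognize a raw string literal by counting hashes. The function name `scan_raw_string` is hypothetical and this is not rustc's lexer; it also ignores byte-string prefixes and any limit on the number of hashes. The point is that the closing delimiter depends on a count remembered from the opening delimiter, and every shorter run of hashes after a quote inside the body has to be rejected as a terminator.

```rust
/// Scans a raw string literal such as `r##"..."##` at the start of `input`.
/// Returns the literal's contents and its total length in bytes, or `None`
/// if `input` does not start with a complete raw string literal.
/// Hypothetical helper for illustration only.
fn scan_raw_string(input: &str) -> Option<(&str, usize)> {
    let rest = input.strip_prefix('r')?;

    // Count the hashes in the opening delimiter.
    let hashes = rest.bytes().take_while(|&b| b == b'#').count();
    let body = rest[hashes..].strip_prefix('"')?;
    let body_start = 1 + hashes + 1; // `r`, the hashes, the opening quote

    // The literal ends at the first quote followed by the same number of
    // hashes; a quote followed by fewer hashes is ordinary content.
    let closer = format!("\"{}", "#".repeat(hashes));
    let end = body.find(&closer)?;

    Some((&body[..end], body_start + end + closer.len()))
}
```

For example, `scan_raw_string("r##\"a\"#b\"## tail")` yields the contents `a"#b`: the `"#` in the middle is not a terminator because only one hash follows the quote.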