Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Note that raw string literals are context-sensitive #1185

Closed
wants to merge 3 commits into from

Conversation

Gankra
Copy link
Contributor

@Gankra Gankra commented Mar 25, 2022

This was originally discussed with a full formal proof in

https://github.com/rust-lang/rust/blob/5187be620c76a313a19b9b596e1bce3a80a345dd/src/grammar/raw-string-literal-ambiguity.md

but that file was removed as part of the "legacy" grammar of Rust. Over the years the file has been linked many times and is genuinely non-obvious and useful to know (because it means any attempt to express Rust in entirely EBNF is futile). I don't think the full formal proof is actually necessary, we can just appeal to an example and mention the "context-free languages can't compare 3 numbers" rule. To minimize confusion, I'm also including a note that the practical implications are relatively minor for an actual practical parser.

(Full disclosure I am the original author of the proof and it was my first contribution to Rust, so I am 100% sentimental about it, but also, it is just genuinely useful information.)

@bjorn3
Copy link
Member

bjorn3 commented Mar 25, 2022

A while ago I went looking for that exact document and couldn't find it. Maybe you could link to the proof in PR changes too?

@Gankra
Copy link
Contributor Author

Gankra commented Mar 25, 2022

Sorry which link do you want where?

src/tokens.md Show resolved Hide resolved
@Gankra
Copy link
Contributor Author

Gankra commented Mar 25, 2022

Done.

src/tokens.md Show resolved Hide resolved
@Gankra
Copy link
Contributor Author

Gankra commented Mar 25, 2022

(sorry doing this via github UI, feel free to squash it all)

@ehuss
Copy link
Contributor

ehuss commented Apr 9, 2022

I'm curious, is this true even if you consider the number of # is limited? The discussions I've seen of this (and the linked proof) seem to assume that there is no limit. Couldn't one define a regular grammar with a fixed number of rules, one for each balanced count?

I would think that nested comment blocks (which have no limit AFAIK) would be a better example of Rust being non-regular.

@petrochenkov
Copy link
Contributor

This is so irrelevant in practice that I'm not even sure it's something that needs to be highlighted in the docs.

The separation into lexer and parser in Rust is pretty strong, due to macros for example, so details like this do not even belong to the grammar, if we assume that grammar starts from the parser.

@Gankra
Copy link
Contributor Author

Gankra commented Apr 9, 2022

It's an explicit problem for anyone who tries to implement a parser and tries to use tools "for" context-free-grammars, which is like, the standard (bad) tooling everyone recommends.

I don't see why comments would be problematic, they're just balanced parentheses, no?

@bjorn3
Copy link
Member

bjorn3 commented Apr 9, 2022

This is so irrelevant in practice that I'm not even sure it's something that needs to be highlighted in the docs.

It is relevant in that it proves you can't write a correct syntax highlighter using standard regexes like is common in most editors.

@GrishaVar
Copy link
Contributor

I'm curious, is this true even if you consider the number of # is limited? The discussions I've seen of this (and the linked proof) seem to assume that there is no limit. Couldn't one define a regular grammar with a fixed number of rules, one for each balanced count?

Yes, though it isn't documented in the reference atm (#1180), the number of # is limited. The limit was recently reduced to 255. This should make the raw strings context-free, though the EBNF wouldn't look very good; I reckon you'd need 256 rules, one for each length.

@Gankra Maybe a note about that should be added to the proof?

@traviscross
Copy link
Contributor

We reviewed this in the lang-docs call today. We're definitely sympathetic to the sentimentality, but probably agree that it doesn't need to be said here, so we'll close it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants