Note that raw string literals are context-sensitive #1185

Gankra · 2022-03-25T15:33:33Z

This was originally discussed with a full formal proof in

https://github.com/rust-lang/rust/blob/5187be620c76a313a19b9b596e1bce3a80a345dd/src/grammar/raw-string-literal-ambiguity.md

but that file was removed as part of the "legacy" grammar of Rust. Over the years the file has been linked many times, and the result is genuinely non-obvious and useful to know (because it means any attempt to express Rust entirely in EBNF is futile). I don't think the full formal proof is actually necessary; we can just appeal to an example and mention the "context-free languages can't compare 3 numbers" rule. To minimize confusion, I'm also including a note that the implications are relatively minor for a practical parser.

(Full disclosure: I am the original author of the proof and it was my first contribution to Rust, so I am 100% sentimental about it, but it is also genuinely useful information.)

See https://github.com/rust-lang/rust/blob/5187be620c76a313a19b9b596e1bce3a80a345dd/src/grammar/raw-string-literal-ambiguity.md for the original formal proof of this fact.
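
To illustrate the "relatively minor for a practical parser" point, here is a minimal sketch, assuming a hand-written scanner. The name `scan_raw_string` is hypothetical and this is not rustc's actual lexer; it ignores every other lexical rule. The scanner remembers how many `#` opened the literal and looks for a closing quote followed by the same number. That remembered count is what plain EBNF cannot express for an unbounded number of `#`, yet in code it is a single integer.

```rust
/// Scans a raw string literal at the start of `src`.
/// Returns the literal's contents and its total length in bytes,
/// or `None` if `src` does not start with a complete raw string.
/// Hypothetical helper for illustration only, not rustc's lexer.
fn scan_raw_string(src: &str) -> Option<(&str, usize)> {
    let rest = src.strip_prefix('r')?;
    // Count the opening `#`s. This number has to be remembered until the end.
    let hashes = rest.bytes().take_while(|&b| b == b'#').count();
    // `#` is ASCII, so slicing at `hashes` stays on a char boundary.
    let body = rest[hashes..].strip_prefix('"')?;
    // The literal ends at the first `"` followed by `hashes` many `#`s.
    let terminator = format!("\"{}", "#".repeat(hashes));
    let end = body.find(&terminator)?;
    // 'r' + opening hashes + opening quote + contents + terminator
    let total = 1 + hashes + 1 + end + terminator.len();
    Some((&body[..end], total))
}
```

For example, on the input `r#"a "quoted" word"# tail` this returns the contents `a "quoted" word` and a length of 20 bytes, while an input such as `r##"a"#` (too few closing `#`) yields `None`.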

bjorn3 · 2022-03-25T15:38:31Z

A while ago I went looking for that exact document and couldn't find it. Maybe you could link to the proof in PR changes too?

Gankra · 2022-03-25T16:01:35Z

Sorry which link do you want where?

bjorn3 · 2022-03-25T16:04:18Z

https://github.com/rust-lang/rust/blob/5187be620c76a313a19b9b596e1bce3a80a345dd/src/grammar/raw-string-literal-ambiguity.md

src/tokens.md

Gankra · 2022-03-25T16:10:18Z

Done.

src/tokens.md

Gankra · 2022-03-25T16:25:44Z

(sorry doing this via github UI, feel free to squash it all)

ehuss · 2022-04-09T19:06:59Z

I'm curious, is this true even if you consider the number of # is limited? The discussions I've seen of this (and the linked proof) seem to assume that there is no limit. Couldn't one define a regular grammar with a fixed number of rules, one for each balanced count?

I would think that nested comment blocks (which have no limit AFAIK) would be a better example of Rust being non-regular.

petrochenkov · 2022-04-09T19:17:33Z

This is so irrelevant in practice that I'm not even sure it's something that needs to be highlighted in the docs.

The separation into lexer and parser in Rust is pretty strong, due to macros for example, so details like this do not even belong to the grammar, if we assume that grammar starts from the parser.

Gankra · 2022-04-09T19:19:43Z

It's an explicit problem for anyone who tries to implement a parser and tries to use tools "for" context-free-grammars, which is like, the standard (bad) tooling everyone recommends.

I don't see why comments would be problematic, they're just balanced parentheses, no?
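
A rough sketch of the "balanced parentheses" view, assuming a byte-level scanner (the name `block_comment_len` is hypothetical, not taken from rustc): nested block comments only need one depth counter that goes up on `/*` and down on `*/`. That is beyond a regular expression but well within a context-free grammar, which is the contrast being drawn with raw strings.

```rust
/// Returns the byte length of the (possibly nested) block comment at the
/// start of `src`, or `None` if there is none or it is unterminated.
/// Hypothetical helper for illustration only.
fn block_comment_len(src: &str) -> Option<usize> {
    if !src.starts_with("/*") {
        return None;
    }
    let bytes = src.as_bytes();
    let mut depth = 1usize; // the opening `/*` has already been seen
    let mut i = 2;
    while i + 1 < bytes.len() {
        match (bytes[i], bytes[i + 1]) {
            // A nested opener increases the depth.
            (b'/', b'*') => {
                depth += 1;
                i += 2;
            }
            // A closer decreases it; depth 0 ends the outermost comment.
            (b'*', b'/') => {
                depth -= 1;
                i += 2;
                if depth == 0 {
                    return Some(i);
                }
            }
            _ => i += 1,
        }
    }
    None
}
```

On `/* a /* b */ c */ x` this returns `Some(17)`, the length of the whole outer comment, and on `/* /* */` it returns `None` because the outer comment is never closed.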

bjorn3 · 2022-04-09T19:32:12Z

> This is so irrelevant in practice that I'm not even sure it's something that needs to be highlighted in the docs.

It is relevant in that it proves you can't write a correct syntax highlighter using standard regexes, as is common in most editors.

GrishaVar · 2022-04-25T19:28:28Z

> I'm curious, is this true even if you consider the number of # is limited? The discussions I've seen of this (and the linked proof) seem to assume that there is no limit. Couldn't one define a regular grammar with a fixed number of rules, one for each balanced count?

Yes, though it isn't documented in the reference atm (#1180), the number of # is limited. The limit was recently reduced to 255. This should make the raw strings context-free, though the EBNF wouldn't look very good; I reckon you'd need 256 rules, one for each length.

@Gankra Maybe a note about that should be added to the proof?
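
To make the "one rule per length" idea concrete, here is a small sketch that generates the alternatives. The function name `raw_string_rules` and the rule notation are made up for illustration and are not the reference's grammar syntax. With a limit of 255 `#`, counts 0 through 255 give the 256 rules mentioned above.

```rust
/// Builds one illustrative grammar alternative per hash count in
/// `0..=max_hashes`. The rule notation is invented for this sketch.
fn raw_string_rules(max_hashes: usize) -> Vec<String> {
    (0..=max_hashes)
        .map(|n| {
            let hashes = "#".repeat(n);
            // Each rule hard-codes its own hash count on both sides,
            // so no counting is needed inside any single rule.
            format!(
                "RAW_{n} : 'r{hashes}\"' <any text not containing '\"{hashes}'> '\"{hashes}'"
            )
        })
        .collect()
}
```

Calling `raw_string_rules(255)` produces 256 alternatives. The rule for one `#` reads `RAW_1 : 'r#"' <any text not containing '"#'> '"#'`.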

traviscross · 2024-08-27T22:03:34Z

We reviewed this in the lang-docs call today. We're definitely sympathetic to the sentimentality, but probably agree that it doesn't need to be said here, so we'll close it.

Note that raw string literals are context-sensitive

9f40cee

See https://github.com/rust-lang/rust/blob/5187be620c76a313a19b9b596e1bce3a80a345dd/src/grammar/raw-string-literal-ambiguity.md for the original formal proof of this fact.

Link the old context-sensitive proof.

53e0a8b

fixup newline

05fd5e7

eddyb mentioned this pull request Aug 30, 2022

String interpolation by default leanprover/lean4#407

Open

traviscross closed this Aug 27, 2024
