Add check for possible CStr literals in pre-2021 #118691

chfogelman · 2023-12-07T01:17:40Z

Adds information to errors caused by possible CStr literals in pre-2021.

The lexer separates `c"str"` into two tokens if the edition is earlier than 2021, which later causes an error when parsing. This error now has a more helpful message that directs users to information about editions. However, the user might also have written `c "str"` in a later edition, so to avoid confusing people who are using a recent edition, I also added a note about whitespace.

We could probably figure out exactly which scenario has been encountered by examining spans and editions, but I figured it would be better not to overcomplicate the creation of the error too much.
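For readers unfamiliar with the edition-dependent lexing described above, here is a minimal toy model. It is not rustc's lexer: `Edition`, `Tok`, and `lex_prefixed` are hypothetical names invented for this sketch, and it only handles input of the exact shape `c"..."`.

```rust
// Toy model only; none of these types exist in rustc under these names.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Edition {
    E2015,
    E2018,
    E2021,
}

#[derive(Debug, PartialEq, Eq)]
enum Tok {
    Ident(String),
    Str(String),
    CStr(String),
}

/// Lexes input of the exact shape `c"..."`; returns `None` for anything else.
fn lex_prefixed(src: &str, edition: Edition) -> Option<Vec<Tok>> {
    let body = src.strip_prefix("c\"")?.strip_suffix('"')?;
    if edition >= Edition::E2021 {
        // 2021 and later: the prefix and the string form a single literal token.
        Some(vec![Tok::CStr(body.to_string())])
    } else {
        // Before 2021: an identifier `c` followed by an ordinary string literal.
        Some(vec![Tok::Ident("c".to_string()), Tok::Str(body.to_string())])
    }
}

fn main() {
    for edition in [Edition::E2015, Edition::E2018, Edition::E2021] {
        println!("{:?}: {:?}", edition, lex_prefixed("c\"str\"", edition));
    }
}
```

The two-token result is what the parser later rejects, which is where the extra notes in this PR are attached.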

This is my first code PR and I tried to follow existing conventions as much as possible, but I probably missed something, so let me know!

rustbot · 2023-12-07T01:17:48Z

r? @b-naber

(rustbot has picked a reviewer for you, use r? to override)

fmease · 2023-12-07T11:40:54Z

compiler/rustc_parse/src/parser/diagnostics.rs

@@ -640,6 +640,17 @@ impl<'a> Parser<'a> {
            }
        }

+        // extra info for `c"str"` before 2021 edition or `c "str"` in all editions
+        if self.prev_token.is_ident_named(Symbol::intern("c"))


Suggested change

-        if self.prev_token.is_ident_named(Symbol::intern("c"))
+        if self.prev_token.is_ident_named(sym::c)

You probably also want to check for `cr` (as in `cr""`, raw c-string literals). If I'm not mistaken, “sym::c && StrRaw” only accounts for `c r""` (notice the space).
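A rough sketch of the point about `cr`, under the assumption that the previous identifier and the following token have already been found to be directly adjacent. `Next` and `prefix_matches` are hypothetical names for illustration and are not rustc APIs.

```rust
// Toy classification of the token that directly follows the identifier.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Next {
    Str,    // an ordinary string literal such as `""`
    StrRaw, // a raw string literal such as `r""`
    Pound,  // a `#` token
    Other,
}

/// Returns true when `prev_ident` followed immediately by `next` looks like
/// a c-string literal that was split apart by a pre-2021 lexer.
fn prefix_matches(prev_ident: &str, next: Next) -> bool {
    match (prev_ident, next) {
        // `c""` lexes as the identifier `c` plus a string literal.
        ("c", Next::Str) => true,
        // `cr""` and `cr#""#` lex as the identifier `cr` plus a string or `#`.
        ("cr", Next::Str | Next::Pound) => true,
        // `c` followed by a raw string can only come from `c r""` (with a space).
        _ => false,
    }
}

fn main() {
    println!("c + Str    -> {}", prefix_matches("c", Next::Str));
    println!("cr + Str   -> {}", prefix_matches("cr", Next::Str));
    println!("cr + Pound -> {}", prefix_matches("cr", Next::Pound));
    println!("c + StrRaw -> {}", prefix_matches("c", Next::StrRaw));
    println!("b + Other  -> {}", prefix_matches("b", Next::Other));
}
```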

fmease · 2023-12-07T11:45:18Z

compiler/rustc_parse/src/parser/diagnostics.rs

+        // extra info for `c"str"` before 2021 edition or `c "str"` in all editions
+        if self.prev_token.is_ident_named(Symbol::intern("c"))
+            && let TokenKind::Literal(token::Lit { kind: token::Str | token::StrRaw(..), .. }) =
+                &self.token.kind


You could check if the `c` immediately precedes the string literal via `self.prev_token.span.hi() == self.token.span.lo()` to rule out false positives like `c ""` (notice the space), since I don't know how helpful the c-string notes are for `c ""`. For comparison, we don't provide any special diagnostics for `b ""` and `r ""`.

While you noted

> We could probably figure out exactly which scenario has been encountered by examining spans and editions, but I figured it would be better not to overcomplicate the creation of the error too much.

it should be as simple as `self.prev_token.span.eq_ctxt(self.token.span) && self.token.span.edition < Edition::Edition2021`.

The eq_ctxt check prevents the notes about cstr literals from misfiring on:

// edition: <2021
macro_rules! m { ($x:ident) => { $x"" } }
fn main() { m!(c) }

Your current implementation fires here (I think, unless I'm misremembering and we have an uninterpolated NtIdent 🤔).

chfogelman · 2023-12-08

I'm fixing this up, but using this macro as an example, `self.prev_token.span.eq_ctxt(self.token.span) == true`. The previous token's span appears to be that of the `$x` token in the macro expansion body, rather than the `c` argument to the macro, so they have the same context. Is this correct, or am I missing something?
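As an illustration of the span-based conditions discussed in this thread, the sketch below models only adjacency and edition, which are the two checks that appear in the later revision of the diff. The `eq_ctxt` condition is left out because the question about it above is not answered in the thread. `Span`, `Edition`, and `adjacent_pre_2021` are simplified stand-ins written for this example, not rustc's types.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Edition {
    E2015,
    E2018,
    E2021,
}

// Simplified span: byte offsets plus the edition the token was written in.
#[derive(Clone, Copy, Debug)]
struct Span {
    lo: u32,
    hi: u32,
    edition: Edition,
}

impl Span {
    fn at_least_rust_2021(self) -> bool {
        self.edition >= Edition::E2021
    }
}

/// True when `cur` starts exactly where `prev` ends (no whitespace between
/// them) and the code is in an edition older than 2021.
fn adjacent_pre_2021(prev: Span, cur: Span) -> bool {
    prev.hi == cur.lo && !cur.at_least_rust_2021()
}

fn main() {
    let e = Edition::E2018;
    // `c"str"`: identifier at 0..1, string literal at 1..6.
    let glued = (Span { lo: 0, hi: 1, edition: e }, Span { lo: 1, hi: 6, edition: e });
    // `c "str"`: identifier at 0..1, string literal at 2..7.
    let spaced = (Span { lo: 0, hi: 1, edition: e }, Span { lo: 2, hi: 7, edition: e });
    println!("glued, 2018:  {}", adjacent_pre_2021(glued.0, glued.1));
    println!("spaced, 2018: {}", adjacent_pre_2021(spaced.0, spaced.1));

    for edition in [Edition::E2015, Edition::E2021] {
        let prev = Span { lo: 0, hi: 1, edition };
        let cur = Span { lo: 1, hi: 6, edition };
        println!("glued, {:?}: {}", edition, adjacent_pre_2021(prev, cur));
    }
}
```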

fmease · 2023-12-07T11:57:50Z

compiler/rustc_parse/src/parser/diagnostics.rs

+                &self.token.kind
+        {
+            err.note("you may be trying to declare a CStr literal");
+            err.note("`CStr` literals require edition 2021");


Suggested change

-            err.note("`CStr` literals require edition 2021");
+            err.note("c-string literals require at least edition 2021");

fmease · 2023-12-07T11:59:26Z

compiler/rustc_parse/src/parser/diagnostics.rs

+            && let TokenKind::Literal(token::Lit { kind: token::Str | token::StrRaw(..), .. }) =
+                &self.token.kind
+        {
+            err.note("you may be trying to declare a CStr literal");


You could think about making this diagnostic translatable. https://rustc-dev-guide.rust-lang.org/diagnostics/translation.html#writing-translatable-diagnostics

You could also think about moving this check above the creation of `err`, or alternatively `.cancel()` the `err` and provide a specialized error message instead of "expected one of [symbol soup]", as proposed in #118654: "c-string literals are not supported in Rust 2018 and older".

chfogelman · 2023-12-08T05:40:36Z

I've gone ahead and removed the ambiguity with whitespace, leaving that case as a standard "expected one of" error for consistency with `b"str"` and others. I added some additional checks to ensure that what we have is really intended to be a c-string literal, and also to pick up raw c-string literals and eliminate false positives in macro expansion. I also changed the error message itself to remove the operator soup and just say that the string token itself was unexpected.

This could still misfire on the characters cr# if they are not the start of a c-string, but I'm not sure that's worth trying to get perfect. At this point, the lexer has already turned this into the tokens [cr, #,...], so this would essentially require us to re-lex the rest of the statement to be 100% sure we're looking at something that would be a cstr literal in later editions.

fmease · 2023-12-14T15:05:40Z

r? fmease

fmease

Thanks, that's great! Sorry for taking so long. Your code looks good, I just have some nitpicks about the comments and the diagnostics.

Lastly, could you squash your commits into one? After that, it should be ready to go!

fmease · 2023-12-19T18:26:27Z

compiler/rustc_parse/src/parser/diagnostics.rs

@@ -640,6 +640,34 @@ impl<'a> Parser<'a> {
            }
        }

+        // Extra info for `c"str"` before 2021 edition or `c "str"` in all editions. The heuristic


Suggested change

-        // Extra info for `c"str"` before 2021 edition or `c "str"` in all editions. The heuristic
+        // Try to detect an attempt by the user to write a c-string literal before the 2021 edition. The heuristic

or something like that (it gives a better summary imo). Note that your comment still mentions `c "str"` (with a space), which is no longer accurate.

fmease · 2023-12-19T18:35:28Z

compiler/rustc_parse/src/parser/diagnostics.rs

+        // edition where c-string literals are not allowed. There is the slight possibility of a
+        // false positive for a `cr#` that wasn't intended to start a c-string literal, but the
+        // lexer was greedy and didn't preserve whether the `r#` on its own would have started a
+        // valid raw string literal.


This sentence is fine but you could also think about expanding upon it with something along the lines of “and we don't want to perform unbounded lookahead here to check if we have a sequence of hashes followed by a string literal”.
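To make the trade-off concrete, the sketch below contrasts a cheap one-token check with the unbounded scan that would be needed to be fully accurate. It works on a toy token slice rather than rustc's token stream. `Tok`, `cheap_check`, and `hashes_then_str` are hypothetical names for illustration.

```rust
// Toy tokens that may follow the identifier `cr` in pre-2021 code.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tok {
    Pound, // `#`
    Str,   // a string literal
    Other, // anything else
}

/// One-token heuristic: accept a string literal or a single `#` after `cr`.
/// It can misfire when the `#` does not actually start a raw string.
fn cheap_check(after_cr: &[Tok]) -> bool {
    matches!(after_cr.first(), Some(Tok::Str | Tok::Pound))
}

/// Fully accurate variant: skip every leading `#`, then require a string
/// literal. The number of tokens inspected is not bounded in advance.
fn hashes_then_str(after_cr: &[Tok]) -> bool {
    let mut rest = after_cr.iter().skip_while(|t| **t == Tok::Pound);
    matches!(rest.next(), Some(Tok::Str))
}

fn main() {
    // Models `cr##"x"##` split apart by a pre-2021 lexer.
    let raw = [Tok::Pound, Tok::Pound, Tok::Str, Tok::Pound, Tok::Pound];
    // Models `cr` followed by a `#` that does not begin a raw string.
    let stray = [Tok::Pound, Tok::Other];
    println!("raw:   cheap={} exact={}", cheap_check(&raw), hashes_then_str(&raw));
    println!("stray: cheap={} exact={}", cheap_check(&stray), hashes_then_str(&stray));
}
```

The `stray` case is the false positive mentioned in the comment: the cheap check accepts it, while the exact scan rejects it at the cost of looking arbitrarily far ahead.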

fmease · 2023-12-19T18:41:06Z

compiler/rustc_parse/src/parser/diagnostics.rs

+            && self.prev_token.span.hi() == self.token.span.lo()
+            && !self.token.span.at_least_rust_2021()
+        {
+            err.cancel();


Actually, since this subdiagnostic can have false positives, it might be wiser to keep the “token soup” error, not sure. When I suggested canceling the error, I assumed the diagnostic would be 100% accurate and we could replace the main message of the diagnostic similar to error[E0670]: `async fn` is not permitted in Rust 2015.

Your call though. If you'd like to keep the shorter message, I'd go with "unexpected token" (dropping the "found") for the message, and with "unexpected token" rather than "found here" for the label, since the current phrasing isn't used in any of rustc's other error messages, or only very rarely in the case of the latter.

fmease · 2023-12-19T18:44:41Z

compiler/rustc_parse/src/parser/diagnostics.rs

+            let mut err =
+                self.struct_span_err(self.token.span, format!("found unexpected token {descr}"));
+            err.span_label(self.token.span, "found here".to_owned());
+            err.note("you may be trying to declare a c-string literal");


Suggested change

-            err.note("you may be trying to declare a c-string literal");
+            err.note("you might have meant to write a c-string literal");

Not a native speaker but the word declare doesn't feel right to me in this context.

fmease · 2023-12-19T18:48:22Z

compiler/rustc_parse/src/parser/diagnostics.rs

+                self.struct_span_err(self.token.span, format!("found unexpected token {descr}"));
+            err.span_label(self.token.span, "found here".to_owned());
+            err.note("you may be trying to declare a c-string literal");
+            err.note("c-string literals require edition 2021 or later");


Suggested change

-            err.note("c-string literals require edition 2021 or later");
+            err.note("c-string literals require the 2021 edition or later");

Suggested change

-            err.note("c-string literals require edition 2021 or later");
+            err.note("c-string literals require Rust 2021 or later");

(minor) Preexisting diagnostics always seem to be using one of the suggested expressions.

chfogelman · 2023-12-19T21:35:41Z

Okay, reverted to token soup because I agree it's more appropriate for this case; we really just want a hint here, not a whole new error (also saves me the trouble of being too creative with error messages). Also changed declare->write, and edition->Rust, but left the other phrasing as-is. And squashed. Thanks for the reviews!
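A small sketch of the outcome described in this comment: the original "expected one of" error is kept, and two notes are attached when the heuristic fires. `Diag` and `build` are hypothetical stand-ins, not rustc's diagnostic API, and the note wording is inferred from the comment above (declare->write, edition->Rust) rather than copied from the merged commit.

```rust
// Toy diagnostic: a main message plus a list of attached notes.
#[derive(Debug)]
struct Diag {
    message: String,
    notes: Vec<String>,
}

/// Builds the usual "expected one of" error and, when `maybe_c_str` is true,
/// adds the c-string hints as notes instead of replacing the message.
fn build(expected_one_of: &str, found: &str, maybe_c_str: bool) -> Diag {
    let mut diag = Diag {
        message: format!("expected one of {expected_one_of}, found `{found}`"),
        notes: Vec::new(),
    };
    if maybe_c_str {
        // Wording assumed from the thread, not verified against the commit.
        diag.notes.push("you may be trying to write a c-string literal".to_string());
        diag.notes.push("c-string literals require Rust 2021 or later".to_string());
    }
    diag
}

fn main() {
    let diag = build("`;` or an operator", "\"str\"", true);
    println!("error: {}", diag.message);
    for note in &diag.notes {
        println!("  = note: {note}");
    }
}
```

Keeping the main message unchanged means a false positive only adds two unneeded notes rather than hiding the real parse error.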

fmease · 2023-12-19T22:18:53Z

Thanks! :)

@bors r+ rollup

bors · 2023-12-19T22:18:55Z

📌 Commit 2c96025 has been approved by fmease

It is now in the queue for this repository.

…mpiler-errors Rollup of 7 pull requests

Successful merges:

- rust-lang#118691 (Add check for possible CStr literals in pre-2021)
- rust-lang#118973 (rustc_codegen_ssa: Don't drop `IncorrectCguReuseType` , make `rustc_expected_cgu_reuse` attr work)
- rust-lang#119071 (-Znext-solver: adapt overflow rules to avoid breakage)
- rust-lang#119089 (effects: fix a comment)
- rust-lang#119096 (Yeet unnecessary param envs)
- rust-lang#119118 (Fix arm64e-apple-ios target)
- rust-lang#119134 (resolve: Feed visibilities for unresolved trait impl items)

r? `@ghost`
`@rustbot` modify labels: rollup

…iaskrgr Rollup of 7 pull requests

Successful merges:

- rust-lang#118691 (Add check for possible CStr literals in pre-2021)
- rust-lang#118973 (rustc_codegen_ssa: Don't drop `IncorrectCguReuseType` , make `rustc_expected_cgu_reuse` attr work)
- rust-lang#119071 (-Znext-solver: adapt overflow rules to avoid breakage)
- rust-lang#119089 (effects: fix a comment)
- rust-lang#119094 (Add function ABI and type layout to StableMIR)
- rust-lang#119102 (Add arm-none-eabi and armv7r-none-eabi platform-support documentation.)
- rust-lang#119107 (subtype_predicate: remove unnecessary probe)

Failed merges:

- rust-lang#119135 (Fix crash due to `CrateItem::kind()` not handling constructors)
- rust-lang#119141 (Add method to get instance instantiation arguments)

r? `@ghost`
`@rustbot` modify labels: rollup

Rollup merge of rust-lang#118691 - chfogelman:improve-cstr-error, r=fmease Add check for possible CStr literals in pre-2021 Fixes [rust-lang#118654](rust-lang#118654) Adds information to errors caused by possible CStr literals in pre-2021. The lexer separates `c"str"` into two tokens if the edition is less than 2021, which later causes an error when parsing. This error now has a more helpful message that directs them to information about editions. However, the user might also have written `c "str"` in a later edition, so to not confuse people who _are_ using a recent edition, I also added a note about whitespace. We could probably figure out exactly which scenario has been encountered by examining spans and editions, but I figured it would be better not to overcomplicate the creation of the error too much. This is my first code PR and I tried to follow existing conventions as much as possible, but I probably missed something, so let me know!

rustbot assigned b-naber Dec 7, 2023

rustbot added the S-waiting-on-review and T-compiler labels Dec 7, 2023

fmease requested changes Dec 7, 2023

rustbot assigned fmease and unassigned b-naber Dec 14, 2023

fmease approved these changes Dec 19, 2023

fmease added the S-waiting-on-author label and removed the S-waiting-on-review label Dec 19, 2023

Improve compiler error for c-strings in pre-2021

2c96025

chfogelman force-pushed the improve-cstr-error branch from c6729bd to 2c96025 Compare December 19, 2023 21:30

bors added the S-waiting-on-bors label and removed the S-waiting-on-author label Dec 19, 2023

compiler-errors mentioned this pull request Dec 19, 2023

Rollup of 7 pull requests #119143

Closed

matthiaskrgr mentioned this pull request Dec 20, 2023

Rollup of 7 pull requests #119156

Merged

bors merged commit f3f9b30 into rust-lang:master Dec 20, 2023
11 checks passed

rustbot added this to the 1.76.0 milestone Dec 20, 2023

matthiaskrgr mentioned this pull request Mar 7, 2024

ICE failed while formatting fluent string parse_help_set_edition_standalone #122130

Closed
