[EXPERIMENT] Disallow all literal suffixes except the standard numeric ones #103872

nnethercote · 2022-11-02T07:05:13Z

Partly out of curiosity, and partly because this would significantly simplify parts of the lexer and parser.

r? @ghost


nnethercote · 2022-11-02T07:06:19Z

@bors try

bors · 2022-11-02T07:06:29Z

⌛ Trying commit 1d0b161 with merge 8330aa5198c56eb493d06cd9c4cc91d00d70e3b6...

nnethercote · 2022-11-02T07:07:08Z

The code is very rough, but should be good enough for the experiment.

bors · 2022-11-02T09:12:13Z

☀️ Try build successful - checks-actions
Build commit: 8330aa5198c56eb493d06cd9c4cc91d00d70e3b6 (8330aa5198c56eb493d06cd9c4cc91d00d70e3b6)

nnethercote · 2022-11-02T09:15:10Z

@craterbot run mode=check-only

craterbot · 2022-11-02T09:15:16Z

👌 Experiment pr-103872 created and queued.
🤖 Automatically detected try build 8330aa5198c56eb493d06cd9c4cc91d00d70e3b6
🔍 You can check out the queue and this experiment's details.

ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

craterbot · 2022-11-02T09:15:18Z

🚧 Experiment pr-103872 is now running


craterbot · 2022-11-03T07:10:50Z

🎉 Experiment pr-103872 is completed!
📊 253 regressed and 11 fixed (247083 total)
📰 Open the full report.

⚠️ If you notice any spurious failure please add them to the blacklist!

nnethercote · 2022-11-03T08:54:49Z

Plenty of imaginative suffix use in macro-based DSLs. So disallowing them is a non-starter.

nnethercote · 2022-11-05T11:56:12Z

To summarize the crater results: dozens of crates use literal suffixes in all sorts of interesting ways. Here are some examples from many (but not all) of the crates that were broken by the crater run disallowing arbitrary suffixes:

https://github.com/ThatsNoMoon/evalvana: r"[^a-z0-9\-_]"i // see regex! from lazy-regex crate
https://github.com/TheKK/gemuboi: 8bits, 16bits
https://github.com/daniel5151/analog_literals: 1D, 2D_TOP, 2D_MID, 2D_BOTTOM, 3D_TOP_l_h, 3D_TOP_l_BOTTOM_l, 3D_MID_w, 3D_BOTTOM_h
https://github.com/farmaazon/sixtyfps-bugs: 20px, 100px
https://github.com/fenhl/lore-seeker-discord: :2B:624281333740339210, :2G:624281333769961514, :2R:624281334948298752, :2U:624281333350268929, :2W:624281333824356379
https://github.com/josh65536/tsurust: <stop offset="0%" stop-color=("#"{color.x;02x}{color.y;02x}{color.z;02x})/>
https://github.com/kriogenia/bresenham_zip: 3D:X, 3D:Y, 3D:Z, 2D:X, 2D:Y
https://github.com/lonelyhentai/rusty-leetcode: 1e-64f
https://github.com/maekoos/aar: 10t, 10x, 11n, 21s, 21h, 31i, 51l, 21c
https://github.com/qiuchengxuan/ascii-osd-hud: -1.0i8, 0.0i8
https://github.com/qiuchengxuan/bmp280: 16bit, 20bit
https://github.com/qiuchengxuan/max7456: 20ns, 100ns
https://github.com/recmo/uint: 2_U512, 0x80000000000000000000000000000000_U512
https://github.com/rpadaki/bio: 1s, 100px, #4444aa, #00bb88
https://github.com/stavenko/gerb-view-spa: 800px, 1fr
https://github.com/tjwilson90/goat: 88777S, 42H
https://github.com/vrmiguel/negate: compile_error!("Expected names (in name-value pairs) are either "docs" or "name", but a different name was found.") // oh no
https://github.com/woodgear/simple-replace-templete-engine: "hello _t_name_t_ _t_sec_name_t_"_ // #103914 ("Make underscore_literal_suffix a hard error") will disallow all _ suffixes
anything 0.1.3: btu^2J^2
1ws-nitro-enclaves-attestation-ffi 0.1.0: 614967200ULL
bitcoin-script 0.1.2: 12g34
bmp280-core 0.2.0: 16bit, 0.0050dC
cocogitto 5.2.0: 35B66CC21AEBFC9B0E8C89F1FD753A01E06E05D7
cyfs-base-derive: 0u256
dashu-float 0.2.1: 0x81p-6, -0x817p-10, 0x915b1p-18, 0x1p4
fomat-macros 0.3.1: fomat!({=13:05b} ".")
freebsd-kpi-13-1 0.1.4: 1U, 0xffff000000000000L
guid-parser 0.1.0: 0cf00d
hexlit 0.5.5: 0A, 0B, 0C, 0d, 0A_0B_0C
hobo_css 0.3.0: easy_enum! {transform-style flat preserve-3d}
if_rust_version 1.0.0: 1010u543
ifmt 0.3.3: .3s, 11.3S, 420;#06x,
mpu6000 0.3.0: +/-16g, 2048LSB/g, +/-2000dps
onenote_parser 0.3.0: guid!({1A5A319C-C26B-41AA-B9C5-9BD8C44E07D4})
seq-macro 0.3.1: 0X09..0X10 // ?!
smallnum 0.4.1: size_of()::<128TypeNeg>() // ?!
starship 1.11.0: 2018-01-01T00:00:00Z
test-with 0.8.0: 999GB
typ 0.1.1: 0u, 1u, 2u

Use cases:

Lots of custom units
Some coordinates
Some custom numeric suffixes, like _U256
Emulating syntax of other languages/formats, like C, HTML, CSS, UUIDs, timestamps
A few bizarre cases that possibly aren't doing what the author thinks they're doing

Correctly handling some of these on the macro side must be quite the task.

Note that arbitrary literal suffixes go back to #19103 RFC 463, where the idea was introduced to "futureproof" Rust for the possibility of fancier suffixes, e.g. for different kinds of literals. One of the unresolved questions in that RFC was:

Should it be the parser or the tokenizer rejecting invalid suffixes? This is effectively asking if it is legal for syntax extensions to be passed the raw literals? That is, can a foo procedural syntax extension accept and handle literals like foo!(1u2)?

The answer chosen by the implementation was "the parser", which allowed arbitrary suffixes to be used as macro inputs. This arguably broke the futureproofing. Can new suffixes be reasonably added, given that existing macros can (and do) effectively define their own? I'm honestly not sure.
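To make the "macros effectively define their own suffixes" point concrete, here is a minimal sketch using a declarative macro rather than a proc macro. The names `token_text!` and `split_suffix` are hypothetical and invented for this illustration; the suffix splitting is deliberately naive and only handles plain decimal digits followed by a suffix.

```rust
// Hypothetical helper: capture one token tree and return its source text.
// The suffixed literal is consumed here, so the "only numeric suffixes after
// expansion" rule is never triggered.
macro_rules! token_text {
    ($t:tt) => {
        stringify!($t)
    };
}

// Naive split of a stringified literal into leading decimal digits and the
// remaining suffix. A real DSL would need to handle hex, floats, etc.
fn split_suffix(text: &str) -> (&str, &str) {
    let end = text
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(text.len());
    text.split_at(end)
}

fn main() {
    for text in [token_text!(100px), token_text!(20ns), token_text!(8bits)] {
        let (digits, suffix) = split_suffix(text);
        println!("{text}: digits={digits}, suffix={suffix}");
    }
}
```

Because the macro decides what `px`, `ns`, or `bits` mean, any suffix the language later claims for itself could collide with a meaning some existing macro already assigns to it.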

Integers with arbitrary suffixes are allowed as inputs to proc macros. A number of real-world crates use this capability in interesting ways, as seen in rust-lang#103872. For example:
- Suffixes representing units, such as `8bits`, `100px`, `20ns`, `30GB`
- CSS hex colours such as `#7CFC00` (LawnGreen)
- UUIDs, e.g. `785ada2c-f2d0-11fd-3839-b3104db0cb68`

The hex cases may be surprising.
- `#7CFC00` is tokenized as a `#` followed by a `7` integer with a `CFC00` suffix.
- `785ada2c` is tokenized as a `785` integer with an `ada2c` suffix.
- `f2d0` is tokenized as an identifier.
- `3839` is tokenized as an integer literal.

A proc macro will immediately stringify such tokens and reparse them itself, and so won't care that the token types vary. All suffixes must be consumed by the proc macro, of course; the only suffixes allowed after macro expansion are the numeric ones like `u8`, `i32`, and `f64`.

Currently there is an annoying inconsistency in how integer literal suffixes are handled, which is that no suffix starting with `e` is allowed, because that is interpreted as a float literal with an exponent. For example:
- Units: `1eV` and `1em`
- CSS colours: `#90EE90` (LightGreen)
- UUIDs: `785ada2c-f2d0-11ed-3839-b3104db0cb68`

In each case, a sequence of digits followed by an 'e' or 'E' followed by a letter results in an "expected at least one digit in exponent" error. This is an annoying inconsistency in general, and a problem in practice. It's likely that some users haven't realized this inconsistency because they've gotten lucky and never used a token with an 'e' that causes problems. Other users *have* noticed; it's causing problems when embedding DSLs into proc macros, as seen in rust-lang#111615, where the CSS colours case is causing problems for two different UI frameworks (Slint and Makepad).

We can do better. This commit changes the lexer so that, when it hits a possible exponent, it looks ahead and only produces an exponent if a valid one is present. Otherwise, it produces a non-exponent form, which may be a single token (e.g. `1eV`) or multiple tokens (e.g. `1e+a`).

Consequences of this:
- All the proc macro problem cases mentioned above are fixed.
- The "expected at least one digit in exponent" error is no longer possible. A few tests that only worked in the presence of that error have been removed.
- The lexer requires unbounded lookahead due to the presence of '_' chars in exponents. E.g. to distinguish `1e+_______3` (a float literal with exponent) from `1e+_______a` (previously invalid, but now tokenised as `1e`, `+`, `_______a`).

This is a backwards compatible language change: all existing valid programs will be treated in the same way, and some previously invalid programs will become valid. The tokens chapter of the language reference (https://doc.rust-lang.org/reference/tokens.html) will need changing to account for this. In particular, the "Reserved forms similar to number literals" section will need updating, and grammar rules involving the SUFFIX_NO_E nonterminal will need adjusting.

Fixes rust-lang#111615.
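The lookahead described above can be sketched roughly as follows. This is not the rustc lexer code; `starts_with_valid_exponent` is a hypothetical standalone function, assuming an exponent has the shape `e` or `E`, an optional sign, any number of `_`, and then at least one decimal digit.

```rust
/// Hypothetical helper: `rest` is the source text immediately after the
/// digits of a decimal literal. Returns true only if a well-formed exponent
/// starts there, so the caller can fall back to treating `e...` as a suffix.
fn starts_with_valid_exponent(rest: &str) -> bool {
    let mut chars = rest.chars().peekable();

    // An exponent must start with 'e' or 'E'.
    match chars.next() {
        Some('e') | Some('E') => {}
        _ => return false,
    }

    // Optional sign.
    if matches!(chars.peek().copied(), Some('+') | Some('-')) {
        chars.next();
    }

    // Skip any run of underscores; this is the unbounded part of the lookahead.
    while chars.peek() == Some(&'_') {
        chars.next();
    }

    // A valid exponent needs at least one decimal digit at this point.
    matches!(chars.peek().copied(), Some(c) if c.is_ascii_digit())
}

fn main() {
    for rest in ["e+_______3", "e+_______a", "eV", "em", "E-5"] {
        println!("{rest:?} -> {}", starts_with_valid_exponent(rest));
    }
}
```

In this sketch `1eV` and `1em` fail the check and so could be lexed as an integer with a suffix, while `1e+_______3` passes only after scanning past every underscore.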

[EXPERIMENT] Disallow all literal suffixes except the standard numeri…

1d0b161

…c ones. Partly out of curiosity, and partly because this would significantly simplify parts of the lexer and parser.

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Nov 2, 2022

nnethercote mentioned this pull request Nov 2, 2022

Use token::Lit in ast::ExprKind::Lit. #102944

Merged

craterbot removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 2, 2022

craterbot added the S-waiting-on-crater Status: Waiting on a crater run to be completed. label Nov 2, 2022

craterbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-crater Status: Waiting on a crater run to be completed. labels Nov 3, 2022

nnethercote closed this Nov 3, 2022

nnethercote mentioned this pull request May 16, 2023

Allow integer suffixes starting with e. #111628

Closed
