Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Macros 2.0: Negative numbers as a single token #48889

Closed
alexcrichton opened this issue Mar 9, 2018 · 4 comments · Fixed by #49545
Closed

Macros 2.0: Negative numbers as a single token #48889

alexcrichton opened this issue Mar 9, 2018 · 4 comments · Fixed by #49545
Labels
A-macros-2.0 Area: Declarative macros 2.0 (#39412)

Comments

@alexcrichton
Copy link
Member

@ngg ran into some weird bugs and made an awesome reduction which boils down to:

#[proc_macro]
pub fn buggy_macro(_: TokenStream) -> TokenStream {
    TokenNode::Literal(Literal::i32(-1)).into()
}

It turns out that when expanding this rustc will print out:

error: int literal is too large

Originating from here I think that this is actually a misdiagnosed error in the sense that the parse is failing because the string is "-1" which isn't parseable as a u128, but the current code assumes that it can only fail due to overflow. Generally this is correct because the parser never feeds minus signs to this function!

So the broader issue here I believe is that Literal::i32 can't actually exist. Our parser, as-is today, (and tokenizer?) dictates that the source string -1 is lexed as two tokens, the minus and then the 1. This means that Literal::i32 pretending it can only be one token isn't actually right!

Should we remove Literal::i32 and all other signed functions? If so should we also panic on negative floats? If not, should we refactor the AST to support literal tokens with negative integers?

cc @jseyfried, @dtolnay, @nrc

@alexcrichton alexcrichton added the A-macros-2.0 Area: Declarative macros 2.0 (#39412) label Mar 9, 2018
alexcrichton added a commit to alexcrichton/quote that referenced this issue Mar 9, 2018
Looks like upstream in rust-lang/rust negative integers are represented as two
tokens instead of one token (and it looks like proc_macro may erroneously (?)
accept negative integers as literals, see rust-lang/rust#48889). As a result
tweak the `ToTokens` impls for signed integers to maybe put a `-` token out in
front. Similar treatment is applied to f32/f64 as well.

Special treatment is required, however, for the `iNN::min_value()` constants.
The actual integral portion isn't actually representable as a positive integer
literal (as it'd overflow back to negative) so to handle this case everything is
just represented as a u64 literal cast to the right type.
@alexcrichton
Copy link
Member Author

I opened dtolnay/quote#65 as a workaround for the quote crate for now, but it also made me realize that removing functions like Literal::i32 isn't actually possible but we may still want to change them! Specifically i32::min_value() makes this pretty tricky. So today we have Literal::i32(arg: i32), but it turns out we've actually got no method to do represent i32::min_value() as a literal via the proc_macro API!

Given that information I think I'd propose the following change:

  • Change Literal::iNN from taking iNN to taking uNN. For example Literal::i32 would take a u32 argument. This argument would not be validate to fit within the range of valid i32 integer values. It's stored internally as a stringified u128 effectively.
  • Change Literal::{f32, f64} to panic if the input is negative (like today it panics if it's NaN or infinite).

I think with that change we should be able to put all integers into literals (with suffixes) and we'd also be able to be clear that the negative tokens don't go into the literal but rather need to get emitted as a separate token.

alexcrichton added a commit to alexcrichton/quote that referenced this issue Mar 9, 2018
Looks like upstream in rust-lang/rust negative integers are represented as two
tokens instead of one token (and it looks like proc_macro may erroneously (?)
accept negative integers as literals, see rust-lang/rust#48889). As a result
tweak the `ToTokens` impls for signed integers to maybe put a `-` token out in
front. Similar treatment is applied to f32/f64 as well.

Special treatment is required, however, for the `iNN::min_value()` constants.
The actual integral portion isn't actually representable as a positive integer
literal (as it'd overflow back to negative) so to handle this case everything is
just represented as a u64 literal cast to the right type.
alexcrichton added a commit to alexcrichton/rust that referenced this issue Mar 9, 2018
This commit tweaks the public api of the `proc_macro::Literal` structure to
solve rust-lang#48889, an issue about negative integers and their representation in
`proc_macro` tokens. Currently the compiler assumes that all integer tokens are
*positive* integers rather than embedding a negative sign. The negative sign is
a separate token from the integer itself.

Currently there's a function like:

    impl Literal {
        pub fn i32(i: i32) -> Literal;
    }

but unfortunately this doesn't work for negative integers. When called with
negative integers weird errors end up getting thrown later during the parsing
process.

This function tweaks the definitions to instead use:

    impl Literal {
        pub fn i32(i: u32) -> Literal;
    }

This construction makes it more clear that negative arguments are not supported.
This additionally allows literals like `-128i8` where `128`, the positive
integer part of this literal, isn't representable in `i8`. By taking an unsigned
number in each constructor we can allow constructing the minimum literal for
each signed type as well.

The final tweak of this commit is to panic in `Literal::{f32, f64}` if the input
number is negative. This is similar to how infinite/nan literals are handled
today (they panic) and constructing a literal should be possible by negating a
float and then separately passing in a literal.

Closes rust-lang#48889
@nrc
Copy link
Member

nrc commented Mar 11, 2018

iirc, we do quite a few 'tricks' with tokens, that is the way we lex source code is not necessarily how the parser treats the tokens. The interpolated tokens from macros are an obvious example which can't be written, only produced from code. I think << is handled in some tricky way too, and negative numbers might be included in this category (that is, maybe they should be if they are not currently).

This is a long-way of saying that perhaps the TokenStream as seen by a proc macro does not need to match exactly what the lexer outputs and that the Rust parser should be able to cope with Tokens created by proc macros which are not able to be created elsewhere.

@dtolnay
Copy link
Member

dtolnay commented Mar 12, 2018

👍 for what I think Nick is saying which is Literal::i32(-1) should be a perfectly legitimate single token.

@ngg
Copy link
Contributor

ngg commented Mar 12, 2018

+1 Even if quote! can apply some workarounds, it would be easier for macros that want to manipulate token streams in some other ways.

@dtolnay dtolnay changed the title Macros 2.0: Negative integers aren't actually one token Macros 2.0: Negative numbers as a single token Mar 28, 2018
bors added a commit that referenced this issue Apr 1, 2018
proc_macro: Tweak doc comments and negative literals

This commit tweaks the tokenization of a doc comment to use `#[doc = "..."]`
like `macro_rules!` does (instead of treating it as a `Literal` token).
Additionally it fixes treatment of negative literals in the compiler, for
exapmle `Literal::i32(-1)`. The current fix is a bit of a hack around the
current compiler implementation, providing a fix at the proc-macro layer rather
than the libsyntax layer.

Closes #48889
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-macros-2.0 Area: Declarative macros 2.0 (#39412)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants