-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve print_tts
#114571
Improve print_tts
#114571
Conversation
This comment has been minimized.
This comment has been minimized.
@petrochenkov: you suggested to me the idea of using jointness during pretty-printing, this is a draft attempt.
|
de764f4
to
bbdb7cc
Compare
I am working on a fix for the #76399 issue. It's going well and I will put it up soon. In the meantime, don't look at the old code. |
bbdb7cc
to
fb04eb0
Compare
New code is up and ready for review.
These questions are still relevant.
I have fixed this.
I have done this now, in the final commit.
I have done this too. The new type is called |
fb04eb0
to
0a92053
Compare
I posted a doc update for stable public-facing |
I don't think The "jointness" is an inherent property attached to a single token (tree) rather than to a sequence of token trees. I think the matklad's scheme in #76285 was fine - when parsing a token stream we just need to track whether a token is immediately followed by something or not, and then "censor" that information at proc macro boundary to match the public facing behavior documented in #114617. I'd ideally first land a version of #76285 first, without any pretty-printing changes or renamings, but with regressions fixed. |
As for pretty-printing streams from proc macros, I think we can extend the internal spacing to |
You haven't answered the following questions yet.
It would be helpful if you did.
What is the right model? You've made two suggestions:
This is a PR mostly about pretty-printing of parsed code, but your suggestions don't relate to that.
I don't understand how
Arbitrarily modified how? Token streams can be modified in proc macros but
It caused a regression.
That's exactly what this PR does. The three-value
The problem in #76285 was the addition of code in My original code had a similar check, causing the same problem. I fixed it by moving the
What about pretty-printing streams from parsed code? The whole point of this PR is to improve that case. #76285 didn't affect that case. Your original suggestion on Zulip was "it would be preferable to preserve jointness for all tokens and use it instead during pretty-printing." That is what this PR does. You said that
|
I'll return to this in a few days. In the meantime a question - how does this PR deal with the |
The same way the current code does.
No. The |
No :D |
|
There's a possibility, at least. It's better to run the changes through crater in any case, because even the |
There are a few logical changes here that I would prefer to review and land separately.
|
The current |
The issue in #76399 is that the token stream constructed purely through proc macro interface (
UPD: Ah, I see, #114571 (comment) talks about the regression reason, but I don't like the way in which it's fixed in this PR. |
🎉 Experiment
|
I looked at the most recent full run. There are two new errors, each repeated a few times. I think neither is significant. vgtkVarious errors like this, coming from the
openbook-v2Various errors like this, coming from
This is from ConclusionBased on this and the previous analysis, I think this is ok to be merged. The only follow-up would be to modify @petrochenkov: what do you think? |
let this_spacing = if next_tok.is_punct() { | ||
Spacing::Joint | ||
} else if next_tok.kind == token::Eof { | ||
Spacing::Alone |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this exception fixes cases like this?
// EOF immediately after string, so spacing is joint-hidden
"string"
The string token is then put into a token stream before an identifier.
"string"suffix
After pretty-printing and re-lexing the lexical structure is broken, "string"suffix
is interpreted as 1 token instead of 2.
Spacing::Alone
for EOF doesn't address the root issue here.
"string" can still get joint spacing in cases like "string"+
and be printed as "string"suffix
later.
This case needs a special case in pretty-printer.
Then special treatment for EOF in this file won't be necessary.
That's not an urgent issue though, it can be addressed separately later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The one crate for which this was a problem is described in this comment above:
DarumaDocker.reactor-wasm
Uses wasmedge_bindgen macro from wasmedge-bindgen-macro-0.1.13.
Weird one... this:
let gen = quote! {
#[no_mangle]
pub unsafe extern "C" fn #func_ident(params_pointer: *mut u32, params_count: i32) {
...
}
somehow becomes this:
#[no_mangle] pub unsafe extern "C"fn run(params_pointer : * mut u32, params_count : i32) {
The lack of space after "C" is a legitimate problem. Hmm.
Somehow, Eof is involved. If we change things so that tokens followed by
Eof have Alone spacing instead of JointHidden spacing, that fixes the
problem. I don't understand why.
So it is a string suffix issue like you suggested. A follow-up sounds good, it only affected one crate in all the crater runs.
r=me after the style fixes. |
`tokenstream::Spacing` appears on all `TokenTree::Token` instances, both punct and non-punct. Its current usage: - `Joint` means "can join with the next token *and* that token is a punct". - `Alone` means "cannot join with the next token *or* can join with the next token but that token is not a punct". The fact that `Alone` is used for two different cases is awkward. This commit augments `tokenstream::Spacing` with a new variant `JointHidden`, resulting in: - `Joint` means "can join with the next token *and* that token is a punct". - `JointHidden` means "can join with the next token *and* that token is a not a punct". - `Alone` means "cannot join with the next token". This *drastically* improves the output of `print_tts`. For example, this: ``` stringify!(let a: Vec<u32> = vec![];) ``` currently produces this string: ``` let a : Vec < u32 > = vec! [] ; ``` With this PR, it now produces this string: ``` let a: Vec<u32> = vec![] ; ``` (The space after the `]` is because `TokenTree::Delimited` currently doesn't have spacing information. The subsequent commit fixes this.) The new `print_tts` doesn't replicate original code perfectly. E.g. multiple space characters will be condensed into a single space character. But it's much improved. `print_tts` still produces the old, uglier output for code produced by proc macros. Because we have to translate the generated code from `proc_macro::Spacing` to the more expressive `token::Spacing`, which results in too much `proc_macro::Along` usage and no `proc_macro::JointHidden` usage. So `space_between` still exists and is used by `print_tts` in conjunction with the `Spacing` field. This change will also help with the removal of `Token::Interpolated`. Currently interpolated tokens are pretty-printed nicely via AST pretty printing. `Token::Interpolated` removal will mean they get printed with `print_tts`. Without this change, that would result in much uglier output for code produced by decl macro expansions. With this change, AST pretty printing and `print_tts` produce similar results. The commit also tweaks the comments on `proc_macro::Spacing`. In particular, it refers to "compound tokens" rather than "multi-char operators" because lifetimes aren't operators.
This is an extension of the previous commit. It means the output of something like this: ``` stringify!(let a: Vec<u32> = vec![];) ``` goes from this: ``` let a: Vec<u32> = vec![] ; ``` With this PR, it now produces this string: ``` let a: Vec<u32> = vec![]; ```
Because the spacing-based pretty-printing partially preserves that.
6826e47
to
940c885
Compare
I fixed the style nits. @bors r=petrochenkov |
☀️ Test successful - checks-actions |
Finished benchmarking commit (a9cb8ee): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDNext Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)This benchmark run did not return any relevant results for this metric. CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 668.008s -> 669.249s (0.19%) |
Pkgsrc changes: * Adapt checksums and patches. Upstream chnages: Version 1.76.0 (2024-02-08) ========================== Language -------- - [Document Rust ABI compatibility between various types] (rust-lang/rust#115476) - [Also: guarantee that char and u32 are ABI-compatible] (rust-lang/rust#118032) - [Warn against ambiguous wide pointer comparisons] (rust-lang/rust#117758) Compiler -------- - [Lint pinned `#[must_use]` pointers (in particular, `Box<T>` where `T` is `#[must_use]`) in `unused_must_use`.] (rust-lang/rust#118054) - [Soundness fix: fix computing the offset of an unsized field in a packed struct] (rust-lang/rust#118540) - [Soundness fix: fix dynamic size/align computation logic for packed types with dyn Trait tail] (rust-lang/rust#118538) - [Add `$message_type` field to distinguish json diagnostic outputs] (rust-lang/rust#115691) - [Enable Rust to use the EHCont security feature of Windows] (rust-lang/rust#118013) - [Add tier 3 {x86_64,i686}-win7-windows-msvc targets] (rust-lang/rust#118150) - [Add tier 3 aarch64-apple-watchos target] (rust-lang/rust#119074) - [Add tier 3 arm64e-apple-ios & arm64e-apple-darwin targets] (rust-lang/rust#115526) Refer to Rust's [platform support page][platform-support-doc] for more information on Rust's tiered platform support. Libraries --------- - [Add a column number to `dbg!()`] (rust-lang/rust#114962) - [Add `std::hash::{DefaultHasher, RandomState}` exports] (rust-lang/rust#115694) - [Fix rounding issue with exponents in fmt] (rust-lang/rust#116301) - [Add T: ?Sized to `RwLockReadGuard` and `RwLockWriteGuard`'s Debug impls.] (rust-lang/rust#117138) - [Windows: Allow `File::create` to work on hidden files] (rust-lang/rust#116438) Stabilized APIs --------------- - [`Arc::unwrap_or_clone`] (https://doc.rust-lang.org/stable/std/sync/struct.Arc.html#method.unwrap_or_clone) - [`Rc::unwrap_or_clone`] (https://doc.rust-lang.org/stable/std/rc/struct.Rc.html#method.unwrap_or_clone) - [`Result::inspect`] (https://doc.rust-lang.org/stable/std/result/enum.Result.html#method.inspect) - [`Result::inspect_err`] (https://doc.rust-lang.org/stable/std/result/enum.Result.html#method.inspect_err) - [`Option::inspect`] (https://doc.rust-lang.org/stable/std/option/enum.Option.html#method.inspect) - [`type_name_of_val`] (https://doc.rust-lang.org/stable/std/any/fn.type_name_of_val.html) - [`std::hash::{DefaultHasher, RandomState}`] (https://doc.rust-lang.org/stable/std/hash/index.html#structs) These were previously available only through `std::collections::hash_map`. - [`ptr::{from_ref, from_mut}`] (https://doc.rust-lang.org/stable/std/ptr/fn.from_ref.html) - [`ptr::addr_eq`](https://doc.rust-lang.org/stable/std/ptr/fn.addr_eq.html) Cargo ----- See [Cargo release notes] (https://github.com/rust-lang/cargo/blob/master/CHANGELOG.md#cargo-176-2024-02-08). Rustdoc ------- - [Don't merge cfg and doc(cfg) attributes for re-exports] (rust-lang/rust#113091) - [rustdoc: allow resizing the sidebar / hiding the top bar] (rust-lang/rust#115660) - [rustdoc-search: add support for traits and associated types] (rust-lang/rust#116085) - [rustdoc: Add highlighting for comments in items declaration] (rust-lang/rust#117869) Compatibility Notes ------------------- - [Add allow-by-default lint for unit bindings] (rust-lang/rust#112380) This is expected to be upgraded to a warning by default in a future Rust release. Some macros emit bindings with type `()` with user-provided spans, which means that this lint will warn for user code. - [Remove x86_64-sun-solaris target.] (rust-lang/rust#118091) - [Remove asmjs-unknown-emscripten target] (rust-lang/rust#117338) - [Report errors in jobserver inherited through environment variables] (rust-lang/rust#113730) This [may warn](rust-lang/rust#120515) on benign problems too. - [Update the minimum external LLVM to 16.] (rust-lang/rust#117947) - [Improve `print_tts`](rust-lang/rust#114571) This change can break some naive manual parsing of token trees in proc macro code which expect a particular structure after `.to_string()`, rather than just arbitrary Rust code. - [Make `IMPLIED_BOUNDS_ENTAILMENT` into a hard error from a lint] (rust-lang/rust#117984) - [Vec's allocation behavior was changed when collecting some iterators] (rust-lang/rust#110353) Allocation behavior is currently not specified, nevertheless changes can be surprising. See [`impl FromIterator for Vec`] (https://doc.rust-lang.org/nightly/std/vec/struct.Vec.html#impl-FromIterator%3CT%3E-for-Vec%3CT%3E) for more details. - [Properly reject `default` on free const items] (rust-lang/rust#117818)
By slightly changing the meaning of
tokenstream::Spacing
we can greatly improve the output ofprint_tts
.r? @ghost