Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import sorting does not handle uppercase/lowercase consistently #5269

Closed
Tracked by #4991
joshtriplett opened this issue Mar 17, 2022 · 19 comments
Closed
Tracked by #4991

Import sorting does not handle uppercase/lowercase consistently #5269

joshtriplett opened this issue Mar 17, 2022 · 19 comments

Comments

@joshtriplett
Copy link
Member

This may be related to #4648 , but I don't know if it's the same issue.

rustfmt sorts imports, and braced groups within imports, in the following order:

use a;
use aA;
use aa;
use Aa;
use WoRD;
use Word;
use A;
use AA;
use WORD;

use brace_group_sort_test::{a, aA, aa, Aa, WoRD, Word, A, AA, WORD};

This isn't alphabetical, ASCIIbetical, or any other sort order that I can identify.

The style guide says to sort ascii-betically here, in both cases (top-level and within braced lists).

@ytmimi
Copy link
Contributor

ytmimi commented Mar 17, 2022

Thanks for the report!

Confirming that I'm able to reproduce this on rustfmt 1.4.38-nightly (b4de150d 2022-03-12). I'll try to set aside some time later to look a little deeper into this.

@joshtriplett
Copy link
Member Author

@ytmimi Thank you!

@ytmimi
Copy link
Contributor

ytmimi commented Mar 17, 2022

I did some digging and I think the formatting you're seeing is caused by the implementation of Ord for UseSegment. As highlighted by the comment (line 776) we're trying to preserve the order based on snake_case < CamelCase < UPPER_SNAKE_CASE, before taking into account the ASCIIbetical order.

rustfmt/src/imports.rs

Lines 761 to 780 in 1bb85bd

impl Ord for UseSegment {
fn cmp(&self, other: &UseSegment) -> Ordering {
use self::UseSegment::*;
fn is_upper_snake_case(s: &str) -> bool {
s.chars()
.all(|c| c.is_uppercase() || c == '_' || c.is_numeric())
}
match (self, other) {
(&Slf(ref a), &Slf(ref b))
| (&Super(ref a), &Super(ref b))
| (&Crate(ref a), &Crate(ref b)) => a.cmp(b),
(&Glob, &Glob) => Ordering::Equal,
(&Ident(ref ia, ref aa), &Ident(ref ib, ref ab)) => {
// snake_case < CamelCase < UPPER_SNAKE_CASE
if ia.starts_with(char::is_uppercase) && ib.starts_with(char::is_lowercase) {
return Ordering::Greater;
}
if ia.starts_with(char::is_lowercase) && ib.starts_with(char::is_uppercase) {

@ytmimi
Copy link
Contributor

ytmimi commented Mar 17, 2022

It also looks like we're not doing an explicit "CamelCase" check, and that might explain why aA comes before aa. I think when comparing that case we're following the ascii order

@calebcartwright
Copy link
Member

Acknowledge seeing this and share the initial reaction of puzzlement. Obviously any changes here, bug fix or otherwise, can have a wide blast radius of formatting changes, so we'll need to be extremely thorough and thoughtful.

The corresponding code hasn't been touched in a long time, so I'd also suggest going back through the original style guide issue to see if perhaps there were any discussion of relevant behavior that perhaps never made its way into the final codified result.

rust-lang/style-team#24

Though somewhat tangential, #5075 and rust-lang/style-team#143 may also be of interest if the door is being reopened

@digama0
Copy link

digama0 commented Mar 18, 2022

It looks like #1785 was the PR that introduced this change.

@joshtriplett
Copy link
Member Author

joshtriplett commented Mar 18, 2022

@calebcartwright I do understand that rustfmt has to be cautious about introducing changes that will cause churn. If rustfmt were sorting in a plausible order that just disagreed with the style guide, I would absolutely suggest we keep that until an edition boundary or similar, to avoid churn. However, in this case, it seems like this is producing an unpredictable order that people can't easily replicate, and I think this may be close enough to a bugfix to be worth considering.

(That's separate from any future changes to do version-sorting or similar, which I'd love to see but which does seem like 2.0 material.)

This won't arise in every import list, and in the ones where it does arise, I think the inconsistency and inability for a human to produce the same order as rustfmt before running rustfmt provide a sufficiently compelling motivation for this change.

That said, I also acknowledge that any such change will incur a cost, even if we think that cost is worthwhile. And whether or not this can be fixed in rustfmt 1, I'd still like to see it fixed in the next generation.

@calebcartwright
Copy link
Member

@joshtriplett - to be clear, I have not said nor am I implying that we do nothing. It deserves consideration and careful inspection, both to determine/confirm there's divergence from prescribed behavior and if so then when/how a fix/change would be rolled out.

I am stressing the need for caution and thoroughness, however, as there's been more than a few occasions where something seemed like an obvious bug at first blush but which later turned out to not be the case.

I realize that if there were anyone that would know/remember the specifics it would be folks like yourself and Nick, but on our end we do still need to go back through relevant discussions to confirm, especially given the age and thoroughness of those threads.

@calebcartwright
Copy link
Member

I'm marking this as a p-high, as I'd really like to get imports_granularity stabilized, with at least a few of the variants stabilized (IIRC one or two may not be ready due to bugs/relative nascency).

@yaahc - this could potentially be a pretty gnarly one, but also potentially a decent one if you wanted to hack on the rustfmt codebase a little.

Subsequent to some of my prior comments, @ytmimi has developed some really sweet automation that we can use to evaluate changes for idempotency across multiple codebases (kinda like our own mini formatting crater run) that helps reduce the review burden of having to mentally conjure potential formatting impacts

@tgross35
Copy link
Contributor

tgross35 commented Aug 23, 2023

Is this issue as simple as redoing the section @ytmimi linked to not check for cases and instead do ascii sorting? Or are there further decisions to make here, with impact being the biggest issue? If impact is an issue, could sorting just become style edition dependent? I am assuming that would be fairly easy too if edition is stored in an atomic somewhere.

Btw @joshtriplett your style guide link in the original post is dead

@ytmimi
Copy link
Contributor

ytmimi commented Aug 23, 2023

@tgross35 UseSegment was updated a while ago to store the version.

rustfmt/src/imports.rs

Lines 104 to 108 in 16db2a4

#[derive(Clone, Eq, PartialEq)]
pub(crate) struct UseSegment {
pub(crate) kind: UseSegmentKind,
pub(crate) version: Version,
}

That should make it easier to change the Ord impl for Version::Two and get a version gated fix out for this.

@trevyn
Copy link
Contributor

trevyn commented Nov 30, 2023

TLDR: The ALL_CAPS vs CamelCase vs snake_case sorting came about in order to separate constants from types in list imports (e.g. use self::libc::{access, c_char, c_int, W_OK};). Is it incorrectly getting applied to groups of imports separated by whitespace? Also it appears this sort method got general consensus, but didn't end up in the style guide.

Btw @joshtriplett your style guide link in the original post is dead

Updated style guide link: https://github.com/rust-lang/rust/blob/master/src/doc/style-guide/src/items.md#ordering-of-imports

Timeline of rust-lang/style-team#24 as it pertains:

Dec 16, 2016:

  • ssokolow expresses desire for "all-caps things like simple integer constants" in list imports to be at the end of the sort order, e.g. use self::libc::{access, c_char, c_int, W_OK};, support is expressed for the general idea of constants appearing together.

Mar 28, 2017:

Mar 29, 2017:

  • nrc makes first mention of grouping by ALL_CAPS vs CamelCase vs snake_case, which gets general support.

May 9, 2017:

  • nrc summarizes consensus as "Within each import, we sort in groups: snake_case, CamelCase, SCREAMING_SNAKE_CASE, within each group, sort alphabetically"

July 12, 2017:

  • topecongiro opens PR for implementation in rustfmt; merged two days later.

July 13, 2017:

Sort names imported from a module into the following groups in order, if present; within groups, sort alphabetically:

* `self`

* `CamelCase` names (e.g. types)

* `snake_case` names (e.g. functions, modules)

* `ALL_CAPS` (e.g. constants)

Apr 14, 2018:

  • nrc summarizes thread, but without mention of camel/snake case. This appears to be the source of what ended up in the style guide.

@kniteli
Copy link

kniteli commented Mar 12, 2024

Using casing conventions as a stand-in for types seems like the main flaw here. To achieve consistency you need one of the following

  1. Ignore casing and use the actual type to achieve the sorting groups mentioned
  2. Another formatting rule that enforces the casing conventions by type as defined above, which is just point 1 achieved in a roundabout way.
  3. Separate rules for sorting groups and lexicographic sort method within groups. <-- probably should have been the approach in the first place, as conflating the two causes even more difficult to deal with problems if you ever want to have a hope of moving beyond ASCII.

Point 3 then makes a demand for an extra rule (whether that rule is exposed or not), which determines the sorting order of the groups themselves. So I believe the correct approach requires three definitions, which may or may not be exposed as configurable, but must be defined nonetheless:

  1. A definition for sorting groups, whether those are grouped by type as discussed (and how the types are grouped), or by some other parameter.
  2. A defined sorting order for the groups. ie types come before functions which come before constants which come before etc etc.
  3. Finally, lexicographic sorting within each group, which would be dependent on the locale. This can use the asciibetical rule as currently prescribed, but it makes it at least possible to offer unicode collation should the Rust Style Guide ever decide to take a more internationally aware approach.

@ytmimi
Copy link
Contributor

ytmimi commented Mar 12, 2024

@kniteli Unfortunately there's no way to know what the type of a value is when it's being imported. The type of each identifier isn't represented in the AST that rustfmt uses to format your program. rustfmt has no idea whether you're importing a struct, enum, trait, function, constant, macro, etc.

@kniteli
Copy link

kniteli commented Mar 12, 2024

I assumed that might be the case. I'd imagine rustfmt is probably designed to work on single files with no cross-project context necessary?

Yes, I wasn't necessarily trying to prescribe a course of action, I was more trying to block out the boundaries that any solution has to adhere to, which leads us to: fundamentally I don't think there's any way to for this issue to be closed without performing the intended grouping of language constructs with the proper tools rather than using a casing convention. There's just... not enough info for it to solve correctly, and there never will be based on characters and casing.

It's either drop the the groups, somehow get access to the language types and group that way, or live with the current reality of "usually acceptable but still incorrect" sorting.

I think long term the only solutions are dropping the groups or getting access to the language constructs. First class unicode support will invariably come to rust sooner or later, and that's just incompatible with this current approach.

I suppose another approach would be to just make the groups entirely configurable by some user defined patterns, the default of which is the current setup, but that sounds complicated and error prone, both for the end user and rustfmt development. It does deflect the unicode issue though.

@kniteli
Copy link

kniteli commented Mar 12, 2024

There is another possibility, just to be exhaustive. I mentioned this approach earlier:

Another formatting rule that enforces the casing conventions by type as defined above, which is just point 1 achieved in a roundabout way.

I assume you have the construct types on the source file side (unlike during an import), so this approach should be achievable. It would solve the immediate problem, but doesn't solve for unicode (if that's important to you), and it makes a coherent sort dependent on another rule being active.

@ytmimi
Copy link
Contributor

ytmimi commented Mar 12, 2024

@kniteli sorry, I'm not sure I understand what you mean. Any solution that involves name resolution isn't going to work. At the time rustfmt is ready to rewrite / sort an import it can only reliably rely on the information present in the ast::UseTree node.

Trying to infer the type of an identifier based on source code might seem doable, but it isn't a very robust solution. Among other issues, it could only be applied for locally defined types and then we'd end up with inconsistencies around sorting imports from external dependencies.

@kniteli
Copy link

kniteli commented Mar 12, 2024

Trying to infer the type of an identifier based on source code might seem doable, but it isn't a very robust solution. Among other issues, it could only be applied for locally defined types and then we'd end up with inconsistencies around sorting imports from external dependencies.

Right, makes sense. So we come back around to the main problem outlined: you can't robustly group using convention, especially because the imports might be from a library that uses a different convention, so even the idea I mentioned about user configurable strategies doesn't work.

Any solution that involves name resolution isn't going to work.

That's the only solution (beside dropping the groups, or ignoring the issue), that's what I'm trying to highlight. You're not going to find the necessary extra information to do this "group and sort" robustly without it. So if you're saying that it's impossible, then the only two choices left are 1. stop grouping or 2. ignore the issue. Make that decision and then #4991 has a path to stabilization. I suspect I'm simply retreading ground that's already been discussed out-of-band, but no decision has been made because it's an unsatisfying answer.

@calebcartwright
Copy link
Member

I'm going to close because from the rustfmt team perspective this is now resolved with #6284 completed

The original topic here was a report about compatibility with the sorting algorithm prescribed in the Style Guide (specifically, Editions 2015-2021 of the Style Guide). That can't be modified within the context of those older Style editions due to rustfmt's stability guarantee, so there's nothing to be done there.

The Style Team provided new sorting prescriptions in the 2024 Style edition (which I'll note is intended to also cover unicode), and that will be the rustfmt behavior for 2024 onwards.

As such, everything related to the original report is resolved and there's no more action to be taken. Some of the subsequent discussion broached into how sorting could effect other options, but I am still going to close this issue as those discussions should happen on their respective issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants