-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rustdoc URL conflict resolution #3097
rustdoc URL conflict resolution #3097
Conversation
text/0000-rustdoc-url-conflicts.md
Outdated
# Drawbacks | ||
[drawbacks]: #drawbacks | ||
|
||
It'll make the URL harder to read. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this break almost every link, like https://doc.rust-lang.org/std/string/struct.String.html
? Seems like that should be listed as a drawback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is my primary concern here. We'd be breaking a lot of URLs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add it.
# Guide-level explanation | ||
[guide-level-explanation]: #guide-level-explanation | ||
|
||
The idea here is to replace every capitalized letter in the item with `-[minimized capital]`. So for example, `enum.FooBar.html` will become `enum.-foo-bar.html`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems really unfortunate that this would make even incredibly common things like struct.Vec.html
have URLs that are materially worse (in my opinion).
Could it at least use a slightly more complicated scheme to take advantage of the usual rust naming conventions? Like have struct vec
be the one that gets struct.-vec.html
, since it's normal for struct
s to start with an upper-case letter, and let struct Vec
continue to be struct.Vec.html
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I don't think it's good that common types get affected by this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That wouldn't help with cases where the case differs mid-identifier and could be valid either way struct FooBar; struct FoObar;
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand your point @Nemo157. It would be -fo-obar
. It would prevent name conflict, which is the primary issue here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was referring to @scottmcm's alternative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh my bad. Then yes, I completely agree with you. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Nemo157 Yes, that's not a complete fix, as it'd still change struct.FooBar.html
to struct.Foo-Bar.html
or similar.
But I don't think we should break non-agglutinative names just because we have to break agglutinative ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I guess my explanations are really bad... All capital letters are replaced by a dash followed by the lowercase version. So FooBar
will become struct.-foo-bar.html
. With this, no more conflicts.
text/0000-rustdoc-url-conflicts.md
Outdated
# Motivation | ||
[motivation]: #motivation | ||
|
||
This is one of the oldest rustdoc issue and has been problematic for years and a lot of users. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you provide more details about those places?
My first instinct here is that having two structs in the same module that differ only by case is something that people should never be doing, and thus this should never come up. TBH I'd be inclined to make a deny-by-default lint for it, and say that if you do it and the docs don't work I don't really care. But maybe there's an important situation I'm missing, so having that called out here could easily change my mind about it.
The places where things differ in only case that don't bother me are things like fn.default.html
and trait.Default.html
, where the item-kind disambiguator is already making the URL unique.
(Then we could special-case keyword.*.html
to make it work. Or just make one doc for both of them.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this seems like a major change for very very little benefit in a case people should not be doing at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But as soon as we enter into the "binding zone", it's just conflicts everywhere. What to do then? Rename items? That could work, but that doesn't sound like a good idea... This is one of the oldest rustdoc issue for a good reason after all. I think it's a really good change because it keeps a recognizable URL scheme while preventing name conflicts. But well, debates are what RFCs are for after all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, special case things is rarely a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GuillaumeGomez what do you mean by the binding zone?
I do not think using dashes randomly is a recognizeable URL scheme.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not random (If the explanation in the RFC isn't good, please tell meso I can update it!).
It replaces all capitalized letters by adding a dash before them and then putting the non-capital letter behind. So hELoPeople
becomes h-e-lo-people
.
For the "binding zone", I'll take a const example:
/usr/include/openssl/crypto.h
49:# define SSLeay_version OpenSSL_version
51:# define SSLEAY_VERSION OPENSSL_VERSION
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, by "randomly" I meant "not in a way that is intuitively predictable"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, special case things is rarely a good idea.
While I agree in general, I see no issue with special-casing keyword
, since it's already a magic special case anyway. It's not like users can add keyword yolo;
items in their crates that we need to document.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They actually can. :p
The idea behind doc(keyword)
is to allow proc-macro crates to add documentation for their own keywords if they introduce new ones.
Perhaps a reasonable alternative might be to not allow multiple pages on rustdoc to differ by name exclusively by case (perhaps via some sort of compatibility lint) and then offer some attribute that lets you change the name that's used in the URL to circumvent this. Also as a lint, we can also turn it off when this level of compatibility isn't needed, but keep it on for stdlib. For example, we could do something like |
@clarfonthey There is no way I see where we can do that. :-/ |
What I care about (as
One alternative could be:
If we instead go for a renaming approach such as that described in the original RFC here by Guillaume then
However we go about this we MUST solve this as it's causing pain now that we generate |
rustdoc could generate mangled version only in the unlikely event a conflict actually occurs for this name. When that happens, it could generate a "disambiguation" page for the un-mangled name that would have links to the different options. (similar to wikipedia's disambiguation page) and that does not require JS. (As I mentioned https://internals.rust-lang.org/t/pre-rfc-stable-rustdoc-urls/13099/12 ) So for example, with |
@ogoffart I explained in the RFC why this approach doesn't work, I even linked my PR which took this approach. It simply doesn't work. Another important point: it requires JS to work, or we support non-JS browsing and I'd prefer if we kept it this way. |
That PR doesn't take the approach i describe at all, as it still seem to keep the URL the same in case of conflict. The idea would be that rustdoc would not generate link to One would, in fact, not need And this |
It seems I misunderstood your comment, my bad. But then there is no coherence and no way to know which |
We're hoping to do this as a short-term workaround for the standard library itself (rust-lang/rust#83678) so there's less urgency to get some form of RFC in quickly. However this would be a pretty big burden for crates outside the standard library. It would need user action to decide what the new name should be for every item that has a conflict. For autogenerated crates like windows-rs or stm32 that would be non-trivial to fix. It also expands I do think it would be nice in the meantime to add an allow-by-default lint on names that overlap, but I don't think the long-term plan should be to have users fix it manually. |
Another use case where this would be hard to fix: |
If anyone wants to look into more detail on situations in which existing crates are hitting issues with this overlap I previously compiled a list of all overlapping filenames in docs.rs (using data from around March 2020 IIRC), this has ~700k overlapping pairs, out of a total of ~308M files, so ~0.45% affected. |
I don't think it's a good idea either. It's not up to the users to go around rustdoc limitations. Just like you said, renaming conflicted items could be a huge burden for them. Lints are for best practices, and I'm pretty sure it's not something we want rustdoc users to do. ;) |
That's a good point. Perhaps the |
That remains a breaking change in the URL scheme. Another issue with that is if a conflict appears afterward because you add a new item, it means the links to the original item won't work anymore. Last element (for the rustdoc internals): that requires rustdoc to know in advance for the URL conflicts to be able to generate the correct URL for intra-doc links. |
I think we can special case this one, and we can do that without an RFC. It's already being patched up in rust-lang/rust#83678 Here's a proposal: For outside the stdlib, we can have rustdoc detect situations with clashes, arbitrarily pick a numbering based on lexicographic sorting (struct.foo.html, struct.Foo-1.html, struct.FOO-2.html), and show a warning that it did this. We can also provide This is great: we do not break unbroken docs, but broken docs get "fixed" in a way that is suboptimal but allows installation, and provides a path to be fixed. Users can then make their own choices on whether they wish to continue having these clashes -- which lead to unstable URLs if anything new is added -- or fix with When fixing broken stuff it is good to as much as possible limit the impact to the broken stuff itself. The current proposal has multiple downsides:
It introduces a new encoding scheme, and that is not to be taken lightly. Browser URL bars for example do not know anything about "rustdoc's bespoke case encoding scheme" and will not apply any normalization to autocomplete. My proposal here has this downside:
This is still better than breaking all the docs (where everyone gets unstable links), and it's better than the status quo of "you have no docs for one of these" for crates with clashes. I am very much strongly against anything that changes the URL scheme for existing crates that do not have this problem, especially something which introduces a new encoding scheme. |
# Rationale and alternatives | ||
[rationale-and-alternatives]: #rationale-and-alternatives | ||
|
||
The other alternatives require either JS to be enabled all the time or just don't work on both case-sensitive and case-insensitive file systems with the same generation process, forcing us to differentiate between both and make documentation generation case sensitivity-related. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please actually list these other alternatives and why they require JS to be enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I'll describe them a bit more and add yours.
You really didn't explain in the RFC at all; it's really hard to follow the background here. Just a couple links to PRs is not enough. It is entirely possible to do this without JS. Just generate two files, and link to them directly as needed. The disambiguation page need not be fancy JS, it can literally just list the two options. I don't understand why JS is necessary at all, and I've gone through that PR. |
[drawbacks]: #drawbacks | ||
|
||
* It'll make the URL harder to read. | ||
* It'll change the existing URLs (this is mostly an issue for blogs and external content not using rustdoc or old rust documentation not using intra-doc links) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a huge huge issue and I'm surprised we're taking this so lightly. I link to rustdoc pages in my writing at least once a week and usually once a day, there are millions of links out there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be linked using rust version: https://doc.rust-lang.org/1.48.0/std/index.html
Fixing a 6 years problem seems more important to me. The important thing will be to communicate about this a lot.
This is not a solution, this is a hack. And a horrible one at that to fix the current rustup issue until we reach a conclusion.
But by doing this, we introduce incoherence because an item URL can changed because of some unrelated change in the code.
I don't see how "hard to type from memory" can be true here. It requires to know how it works, but that's it. Then it's just back to how it was. Also, to be fair, it doesn't remove the ability to urlcomplete, it simply makes it look less good. But the advantage is that the URL scheme will then be permanent.
Which is why we're having this RFC.
It can also change URL if an another item name enter in conflict with another one, introducing incoherence.
This is your opinion, I don't agree with you on this but again, this is why we have RFC. However, it's not "everyone gets unstable links", it's a new URL scheme, which unlike the one you proposed won't have incoherences.
That is the biggest downside of this RFC, indeed. I think it's worth it.
You mean containing both items in the disambiguation page? That would make it double the size and would require some way from the browser to know which one to display. Otherwise, I don't see how you can do it. Anoter problem then will be on case-sensitive FS: you need to make all URLs go to the disambiguation page, and for that you need link files ( If we choose to simply not handle the original URLs and directly go to both correct files, it can work, but then we go back to the incoherence I strongly disagree with. |
I mean, sure, whatever, it fixes rustup's immediate issues. I'm trying to remove the sense of urgency from this RFC.
These users have already broken docs, this is already true for them. My proposal does not automatically fix the instability, but we lint about it, provide a sensible default, and provide these users a path to not having this instability.
No, I mean the disambiguation page would just contain two links. That's it. You don't need to do anything fancy to load docs onto the page. There's absolutely no need to do that in a disambiguation page; it's a disambiguation page its purpose is to disambiguate, not to show you everything.
We should simply not do this. We do not need to make all URLs go to the disambiguation page, just one of them. We don't even have to make a disambiguation page in the first place, the solution still works. The whole point of my proposal is that we explicitly declare such clashes to be things we do not support well, and provide users with a route to avoiding such clashes via You seem to have latched on to a particular solution of this kind, noticed that it's infeasible, and declared that that is why such solutions do not work. Please don't; please actually engage with this solution as stated instead of assuming it's the same as previous proposals.
We break half the links out there. This is not perpetually unstable, but it's unstable in that we break a ton of stuff. That's instability.
I do not think people will learn how this works |
I see. yes, there is no urgency. The goal here is to fix the URL issue definitely.
So we would expect from rustdoc users to go around rustdoc limitations. This is not a great user experience...
Oh I see. But that wouldn't work either like I explained on top. All conflict URLs have to be covered, which wouldn't be the case on case sensitive FS.
Except it doesn't work.
The rustdoc URL scheme isn't stable yet. We're not proposing this out of nowhere. We tried multiple times to fix this issue and nothing worked because we tried to change as little as possible the URL scheme. Unfortunately it's a problem.
That is a fair argument. |
This is a limitation imposed on us by the filesystem, either we work around it (as we're doing so in this RFC), or we give the users some choice in the matter
Because we lint about it we don't have to be perfect about how it's handled, just do a sensible default. |
No, it's a limitation we created by picking this URL scheme. The FS should have been taken into account before that.
But then the issue isn't solved, you just hacked something on top of it which didn't solve the original issue. |
…nflict, r=jyn514 Fix Self keyword doc URL conflict on case insensitive file systems (until definitely fixed on rustdoc) This is just a hack to allow rustup to work on macOS and windows again to distribute std documentation (hopefully once rust-lang/rfcs#3097 or an equivalent is merged). Fixes rust-lang#80504. Prevents rust-lang#83154 and rust-lang/rustup#2694 in future releases. cc `@kinnison` r? `@jyn514`
…nflict, r=jyn514 Fix Self keyword doc URL conflict on case insensitive file systems (until definitely fixed on rustdoc) This is just a hack to allow rustup to work on macOS and windows again to distribute std documentation (hopefully once rust-lang/rfcs#3097 or an equivalent is merged). Fixes rust-lang#80504. Prevents rust-lang#83154 and rust-lang/rustup#2694 in future releases. cc ``@kinnison`` r? ``@jyn514``
Another solution based on a disambiguator was suggested. To make things more smooth, this would only be applied on case insensitive file systems. I wrote a comparison between the two propositions there if anyone is interested: https://gist.github.com/GuillaumeGomez/a9d5e0d92987599a7966f5363543f81f |
Closing this RFC as for now, another solution would be preferred rather than change all URLs. |
…nflict, r=jyn514 Fix Self keyword doc URL conflict on case insensitive file systems (until definitely fixed on rustdoc) This is just a hack to allow rustup to work on macOS and windows again to distribute std documentation (hopefully once rust-lang/rfcs#3097 or an equivalent is merged). Fixes rust-lang#80504. Prevents rust-lang#83154 and rust-lang/rustup#2694 in future releases. cc ``@kinnison`` r? ``@jyn514``
cc @rust-lang/rustdoc
Rendered