Feature request: rebuild documentation for old crates #464

Closed
tspiteri opened this issue Nov 2, 2019 · 11 comments · Fixed by #2645
Labels
A-builds Area: Building the documentation for a crate E-hard Effort: This will require a lot of work S-needs-design Status: There's a problem here, but no obvious solution; or the solution raises other questions

Comments

@tspiteri
Contributor

tspiteri commented Nov 2, 2019

When crates are not updated for a long time, their style drifts from the latest style (or the latest style drifts away from them), and consequently they miss out on the latest rustdoc features. I think it would be nice if crates that were last updated one or two years ago were rebuilt and their docs regenerated with the latest rustdoc.

This could run at low priority so that it does not slow down documentation of new crates.

If compilation fails for whatever reason, the old docs should of course be retained.

@jyn514
Member

jyn514 commented Nov 6, 2019

Hmm, I don't mind this idea, but it seems expensive to rerun the build on every version that has ever been published... This would only affect style sheets and such, right? What should we do if building with a newer version of rustdoc leads to broken links?

@jyn514
Member

jyn514 commented Nov 8, 2019

Related: #301

@jyn514
Member

jyn514 commented Dec 5, 2019

#295 also mentioned allowing crate owners to re-queue builds for their own crates, e.g. if the build succeeds but the JavaScript throws a runtime error because of a bad nightly. However, this does not seem feasible, at least not from the website, because we currently allow arbitrary JavaScript from built crates. See #301 (comment)

@jyn514
Member

jyn514 commented Aug 25, 2020

The reason I originally thought this wasn't feasible is that rustdoc puts out releases very frequently (either daily or every 6 weeks, depending on how you count it). But nothing says we have to rebuild for every release: if we rebuilt, say, once per edition, that would still fix most of the rustdoc bugs while making it feasible to finish all crates.

Another thing we could do is only rebuild the crates that are used the most: say, the latest version and most recent release of every crate. That would bring the number of builds down from ~220k to somewhere between 40k and 80k.

This shouldn't be implemented until after #1004 to avoid very large re-upload costs (@pietroalbini estimates the current cost of reuploading every crate at $2k even if there are no changes in the documentation).

@Kixiron
Member

Kixiron commented Aug 26, 2020

I think with good culling parameters we could significantly reduce the number of crates that need to be built & uploaded:

  • Only the most recent release, pretty much required for feasibility
  • Download/Visit thresholds. If we set cutoffs on crates that haven't been touched in years, no one really loses anything
  • Don't rebuild the past few months of releases. The most recent builds will look pretty much the same, bumping a few nightlies shouldn't affect anything. It'll also let us avoid the "edition rush" of crates that are preparing for the new edition
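A minimal JavaScript sketch of these three culling rules. The release record shape (`isLatest`, `recentVisits`, `builtAt`) and the thresholds are hypothetical, chosen for illustration; they are not the docs.rs database schema or any agreed-upon cutoff.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical record shape (not the docs.rs schema):
// { crate: "foo", version: "1.2.3", isLatest: true, recentVisits: 1200, builtAt: Date }
function selectRebuildCandidates(releases, { now, minVisits, minAgeDays }) {
  return releases.filter(
    (r) =>
      r.isLatest && // only the most recent release of each crate
      r.recentVisits >= minVisits && // skip crates nobody visits
      now - r.builtAt.getTime() >= minAgeDays * DAY_MS // leave recent builds alone
  );
}

const now = Date.parse("2020-08-26T00:00:00Z");
const picked = selectRebuildCandidates(
  [
    { crate: "a", version: "1.0.0", isLatest: true, recentVisits: 500, builtAt: new Date("2018-03-01") },
    { crate: "a", version: "0.9.0", isLatest: false, recentVisits: 500, builtAt: new Date("2017-01-01") },
    { crate: "b", version: "0.1.0", isLatest: true, recentVisits: 2, builtAt: new Date("2017-05-01") },
    { crate: "c", version: "2.0.0", isLatest: true, recentVisits: 900, builtAt: new Date("2020-07-01") },
  ],
  { now, minVisits: 100, minAgeDays: 90 }
);
console.log(picked.map((r) => `${r.crate}@${r.version}`)); // [ 'a@1.0.0' ]
```

Each rule is a plain predicate, so adding or dropping a cutoff is a one-line change.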

I really want to propose some sort of documentation hashing/checksumming to find out whether the docs currently uploaded differ from the ones we build, and skip the delete & reupload if nothing changed, but I doubt that's possible due to the unstable nature of rustdoc's output.
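For the hashing idea, a sketch using Node's built-in `crypto` module: hash every output file in a fixed order and only reupload when the digest changes. The `{ path, contents }` file shape is an assumption. As the comment says, rustdoc output is not stable (rustdoc embeds its own version in generated pages, for instance), so in practice digests may rarely match unless such details are normalized first.

```javascript
const crypto = require("crypto");

// Digest a whole documentation build. Files are sorted by path so the
// digest does not depend on the order in which they were produced.
function digestDocs(files) {
  const hash = crypto.createHash("sha256");
  const sorted = [...files].sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0));
  for (const { path, contents } of sorted) {
    // Separators keep "ab" + "c" from hashing the same as "a" + "bc".
    hash.update(path).update("\0").update(contents).update("\0");
  }
  return hash.digest("hex");
}

// Only delete & reupload when the fresh build actually differs.
function needsReupload(storedDigest, freshFiles) {
  return storedDigest !== digestDocs(freshFiles);
}

const build = [
  { path: "foo/index.html", contents: "<h1>foo</h1>" },
  { path: "foo/struct.Bar.html", contents: "Bar" },
];
const stored = digestDocs(build);
console.log(needsReupload(stored, [...build].reverse())); // false
console.log(needsReupload(stored, [build[0], { path: "foo/struct.Bar.html", contents: "Bar<'a>" }])); // true
```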

@jyn514 jyn514 changed the title Feature request: update docs for old crates Feature request: rebuild documentation for old crates Aug 27, 2020
@jyn514
Member

jyn514 commented Aug 27, 2020

(changing the title because I got tired of never being able to find it when searching 😆 )

@shepmaster
Member

> • Download/Visit thresholds. If we set cutoffs on crates that haven't been touched in years, no one really loses anything

This was my thought. If a crate gets a lot of visits, then it's worth rebuilding the docs. In my particular instance, the futures crate's docs appear to suffer from a bug where the `<'a>` was left off a trait method, which makes it look like the lifetime might come from the trait instead.

You could bucket things up, as well. I don't know the actual limits in place, but something like:

Get the number of visits to a crate's docs between stable Rust release dates.

For the top X% of crates, regenerate them if they haven't been naturally generated in the last Y stable releases.

| X (top % of crates by visits) | Y (stable releases) |
|---|---|
| 10 | 1 |
| 50 | 3 |
| 90 | 6 |
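One way to read this table as code, sketched in JavaScript. The crate record shape and the `releasesSinceBuild` counter (how many stable Rust releases have shipped since the docs were last built, with 0 meaning the current release cycle) are assumptions for illustration.

```javascript
// (X, Y) pairs from the table above.
const BUCKETS = [
  { topPercent: 10, releases: 1 },
  { topPercent: 50, releases: 3 },
  { topPercent: 90, releases: 6 },
];

// crates: hypothetical [{ name, visits, releasesSinceBuild }]
function cratesToRegenerate(crates) {
  const ranked = [...crates].sort((a, b) => b.visits - a.visits);
  const names = [];
  ranked.forEach((crate, index) => {
    // Integer arithmetic: the top crate out of 10 lands exactly at 10%.
    const percentile = ((index + 1) * 100) / ranked.length;
    const bucket = BUCKETS.find((b) => percentile <= b.topPercent);
    // Crates outside the top 90% have no bucket and are never regenerated.
    if (bucket && crate.releasesSinceBuild >= bucket.releases) names.push(crate.name);
  });
  return names;
}

const sample = [1, 2, 0, 0, 3, 5, 0, 0, 6, 50].map((releasesSinceBuild, i) => ({
  name: `c${i}`,
  visits: 100 - 10 * i,
  releasesSinceBuild,
}));
console.log(cratesToRegenerate(sample)); // [ 'c0', 'c4', 'c8' ]
```

Here `c1` (2 releases behind, Y = 3) and `c5` (5 behind, Y = 6) are left alone, and `c9` falls outside the top 90% entirely.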

@jyn514
Copy link
Member

jyn514 commented Dec 16, 2020

Stable releases seem like an arbitrary metric, since crates are always built with nightly. We could use 6 weeks to sync with the release schedule, but I'm a little worried people will be confused if a new bug has been introduced on nightly or something.

In any case, I would prefer not to make major changes here until #1004 is fixed.

@ShadowJonathan
Copy link
One other thing I'd like to suggest is rebuild requests: a way to manually request a rebuild of a crate's (very) out-of-date documentation.

I just encountered this with anymap. Automatically rebuilding old documentation on every major release, or after some amount of "rotting" time (say, longer than one or two years), with the most popular crates first, would be a good approach, but I think the best heuristic would be to let users request a rebuild (with the button visible and active only when a build is X amount of time old).

In a good-faith case, someone clicks this when they see that the documentation style is outdated and think it'd be "good" to have it up to date.

In a bad-faith case, someone goes around queuing this for every crate. I personally see this as alright, but it depends on how much the build system can handle, and whether it's possible to queue these at a very low priority.
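A rough sketch of the two pieces this suggestion needs: an age check deciding whether the rebuild button is shown, and a queue where user requests only run when no regular build is waiting, with duplicate requests dropped. The one-year threshold, the job shape, and `BuildQueue` itself are hypothetical, not existing docs.rs behavior.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;
const MIN_BUILD_AGE_DAYS = 365; // hypothetical value for "X amount of time"

// Show the rebuild button only for sufficiently old builds.
function canRequestRebuild(lastBuiltAt, now = Date.now()) {
  return now - lastBuiltAt.getTime() >= MIN_BUILD_AGE_DAYS * DAY_MS;
}

const jobKey = (job) => `${job.crate}@${job.version}`;

// User-requested rebuilds only run when no regular build is waiting.
class BuildQueue {
  constructor() {
    this.regular = [];
    this.requested = [];
    this.pending = new Set(); // requested jobs not yet handed out
  }
  pushRegular(job) {
    this.regular.push(job);
  }
  requestRebuild(job) {
    if (this.pending.has(jobKey(job))) return false; // duplicate request, ignore
    this.pending.add(jobKey(job));
    this.requested.push(job);
    return true;
  }
  next() {
    if (this.regular.length > 0) return this.regular.shift();
    const job = this.requested.shift();
    if (job) this.pending.delete(jobKey(job));
    return job; // undefined when both queues are empty
  }
}

const now = Date.parse("2021-09-09T00:00:00Z");
console.log(canRequestRebuild(new Date("2019-01-01"), now)); // true
console.log(canRequestRebuild(new Date("2021-06-01"), now)); // false

const q = new BuildQueue();
console.log(q.requestRebuild({ crate: "old-crate", version: "0.1.0" })); // true
console.log(q.requestRebuild({ crate: "old-crate", version: "0.1.0" })); // false
q.pushRegular({ crate: "new-crate", version: "1.0.0" });
console.log(jobKey(q.next()), jobKey(q.next())); // new-crate@1.0.0 old-crate@0.1.0
```

Deduplicating pending requests keeps the bad-faith case bounded: at most one queued rebuild per crate version, always behind regular builds.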

@jyn514
Copy link
Member

jyn514 commented Sep 9, 2021

@ShadowJonathan see also #301

rrbutani added a commit to rrbutani/rust-search-extension that referenced this issue Feb 11, 2022
Crates with hyphens in their names have their version extracted incorrectly from the DOM (when viewing the latest version of a crate and adding it to the extension's index).

This in turn causes the extension to produce invalid docs.rs links.

----

[This snippet](huhu@4e84385#diff-dc9969d9ec58ceb09765359c0caa6852a087b462d98bb9a7e45f1ac75c79b066L12-R14) (which itself addressed [fallout](huhu@7483ba3#diff-dc9969d9ec58ceb09765359c0caa6852a087b462d98bb9a7e45f1ac75c79b066R12-R15) from `rustdoc` [changing its version output](rust-lang/rust@6a5f8b1#diff-40a0eb025da61717b3b765ceb7fab21d91af3012360e90b9f46e15a4047946faL1768-L1776)) is the problematic bit.

Updating the logic linked above to take the _last_ element after splitting on `-` instead of the second fixes this case but I think this leaves _other_ edge cases unhandled.

For example, `cargo` and friends allow [pre-release versions, which may contain hyphens](https://semver.org/#spec-item-9) (e.g. `0.0.1-my-extremely-unstable-release`). While it's unlikely that the docs.rs "latest" link for a crate will redirect to one of these, it is still possible: `docsrs` will [search stable, unyanked releases _first_ but *will* fall back to pre-releases](https://github.com/rust-lang/docs.rs/blob/dad5863093535004623df9e7d3789a11502313a5/src/web/mod.rs#L341-L368). The [`wasi` crate](https://docs.rs/wasi/latest/wasi/) is one such example (no "stable" releases as of this writing, and its version has hyphens in it: `0.11.0+wasi-snapshot-preview1`).
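To make the failure modes concrete, a small JavaScript illustration. It assumes the extracted string has the shape `<crate-name>-<version>` (an assumption; the exact string the extension parsed is not shown here) and adds a hypothetical `versionAfterName` helper that strips a known crate name instead of guessing where the version starts.

```javascript
// Assumed input shape: "<crate-name>-<version>".
const secondPart = (s) => s.split("-")[1]; // original logic
const lastPart = (s) => s.split("-").pop(); // "take the last element" fix

// Hypothetical alternative: if the crate name is already known
// (e.g. from the page URL), strip it as a prefix and keep the rest.
function versionAfterName(s, crateName) {
  const prefix = `${crateName}-`;
  return s.startsWith(prefix) ? s.slice(prefix.length) : null;
}

console.log(secondPart("serde-1.0.136")); // 1.0.136
console.log(secondPart("foo-bar-0.3.0")); // bar (wrong)
console.log(lastPart("foo-bar-0.3.0")); // 0.3.0
console.log(lastPart("foo-0.0.1-my-extremely-unstable-release")); // release (wrong)
console.log(lastPart("wasi-0.11.0+wasi-snapshot-preview1")); // preview1 (wrong)
console.log(versionAfterName("wasi-0.11.0+wasi-snapshot-preview1", "wasi")); // 0.11.0+wasi-snapshot-preview1
```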

Reverting to the previous method (grabbing the version from the sidebar) and changing the query to `'nav.sidebar .version'` is, I think, general enough to support pages generated before and after the `rustdoc` version change without being _too_ general (and potentially picking up things in user-added HTML snippets). This is the change I have implemented.
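A sketch of the sidebar approach as a function that works with any object exposing `querySelector` (in the extension that would be `document`). The exact text inside the `.version` element has varied across rustdoc releases, so stripping an optional leading `Version ` is an assumption here.

```javascript
// `root` is anything with querySelector(), e.g. `document` in a content script.
function versionFromSidebar(root) {
  const el = root.querySelector("nav.sidebar .version");
  if (!el) return null; // e.g. rustdoc output from before the sidebar showed a version
  // Assumption: the text is either "1.2.3" or "Version 1.2.3".
  const version = el.textContent.trim().replace(/^Version\s+/, "");
  return version || null;
}

// Stand-in for a DOM document, so the sketch runs outside a browser.
const fakeDoc = (text) => ({
  querySelector: (sel) =>
    sel === "nav.sidebar .version" && text !== null ? { textContent: text } : null,
});

console.log(versionFromSidebar(fakeDoc("\n  Version 0.11.0+wasi-snapshot-preview1\n"))); // 0.11.0+wasi-snapshot-preview1
console.log(versionFromSidebar(fakeDoc(null))); // null
```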

The downside to this approach is that it doesn't work on `rustdoc` output that predates the addition of the version in the sidebar; since docs.rs [doesn't rebuild docs for older releases](rust-lang/docs.rs#464) this can be a real concern for older stable crates that haven't had a release in a while.

---

Another approach is to snoop through some of the relative links on the page and to extract the version from the relative URLs there. There doesn't seem to be an obvious thing in the DOM to go after and we're definitely still susceptible to changes in `rustdoc` this way; I'm not sure if this is worth doing.
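A sketch of that idea. It assumes, without checking every rustdoc or docs.rs version, that some link on the page has a path shaped like `/crate/<name>/<version>` or `/<name>/<version>/...`; in the extension the hrefs could come from `document.querySelectorAll("a")`, with `document.baseURI` as the base. The version-shaped check on the path segment keeps plain relative links such as `../foo/index.html` and the `latest` alias from being mistaken for versions.

```javascript
// Look for a link whose path starts with /crate/<name>/<version> or /<name>/<version>.
function versionFromHrefs(hrefs, crateName, base = "https://docs.rs/") {
  for (const href of hrefs) {
    const segments = new URL(href, base).pathname.split("/").filter(Boolean);
    const i = segments[0] === "crate" ? 1 : 0;
    const candidate = segments[i] === crateName ? segments[i + 1] : undefined;
    // Skip "latest" and anything that doesn't start like a version number.
    if (candidate && /^\d+\.\d+\.\d+/.test(candidate)) {
      return decodeURIComponent(candidate);
    }
  }
  return null;
}

console.log(
  versionFromHrefs(
    ["../foo/index.html", "/crate/wasi/latest", "/crate/wasi/0.11.0+wasi-snapshot-preview1"],
    "wasi"
  )
); // 0.11.0+wasi-snapshot-preview1
console.log(versionFromHrefs(["/foo-bar/0.3.0/foo_bar/"], "foo-bar")); // 0.3.0
```

As the commit message notes, this is still tied to URL layouts that may change.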

Yet another option is to pick an approach based on the `rustdoc` version in [`rustdoc-vars`](https://github.com/rust-lang/rust/blob/502d6aa47b4118fea1e326529e71b25a99b0d6c5/src/librustdoc/html/templates/page.html#L147) (i.e. `document.querySelector("#rustdoc-vars").getAttribute("data-rustdoc-version")`). This could help a little but it's worth noting that it itself is a relatively recent addition to the `rustdoc` HTML output, I think.
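A sketch of choosing an extraction strategy from the `data-rustdoc-version` attribute, with a fallback when the attribute is missing. The attribute value format and the `CUTOFF` release are assumptions; the rustdoc release that actually changed the version output would need to be looked up.

```javascript
// Assumption: the attribute value begins with something like "1.60.0-nightly (...)".
function parseRustdocVersion(text) {
  const m = /(\d+)\.(\d+)\.(\d+)/.exec(text || "");
  return m ? m.slice(1).map(Number) : null;
}

function compareVersions(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

const CUTOFF = [1, 58, 0]; // hypothetical placeholder, not the real release

// Returns one of the caller-supplied strategies.
function pickExtractor(root, { newer, older, unknown }) {
  const vars = root.querySelector("#rustdoc-vars");
  const parsed = vars ? parseRustdocVersion(vars.getAttribute("data-rustdoc-version")) : null;
  if (!parsed) return unknown; // no rustdoc-vars: output predates it, or the format changed
  return compareVersions(parsed, CUTOFF) >= 0 ? newer : older;
}

// Stand-in for a DOM document, so the sketch runs outside a browser.
const fakeDoc = (version) => ({
  querySelector: (sel) =>
    sel === "#rustdoc-vars" && version !== null
      ? { getAttribute: (name) => (name === "data-rustdoc-version" ? version : null) }
      : null,
});
const strategies = { newer: "newer-output", older: "older-output", unknown: "fallback" };
console.log(pickExtractor(fakeDoc("1.60.0-nightly (abc 2022-02-10)"), strategies)); // newer-output
console.log(pickExtractor(fakeDoc("1.50.0 (def 2021-02-08)"), strategies)); // older-output
console.log(pickExtractor(fakeDoc(null), strategies)); // fallback
```

Because the attribute itself is a fairly recent addition, the `unknown` branch is the one most old crates would hit.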
rrbutani added a commit to rrbutani/rust-search-extension that referenced this issue Feb 11, 2022
rrbutani added a commit to rrbutani/rust-search-extension that referenced this issue Feb 14, 2022
@ShadowJonathan

@GuillaumeGomez @syphar thanks a lot!

5 participants