Deprecate crate categories and allow metadata on keywords #3488
imo there is value in having both closed and open vocabularies for finding content. I'd hate to deal with a proliferation of terms just to find cargo plugins.
@epage I feel like this example is actually one in favour of removing the existing closed vocabularies. For example, right now, there's one category for cargo plugins, which contains over 350 crates. This gives some indication that instead of cargo being a single category, it should be multiple. But with the existing vocabulary, this isn't an option, since we can't retroactively remove categories. The other option is to make categories have arbitrary nesting, which can be quite messy. So, instead, crates can choose to add to the existing cargo plugins subtag ( Sure, we could just spend hours fiddling with the categories as examples like this show up. But the point is that they will always show up, and I think that it'd be more ideal to decouple this fiddling from the actual technical standard, rather than it being an inherent part of it. |
While the existing |
Right. The point here is that this is suggesting that alternative. Changing vocabulary is something that cannot and should not happen by decree. It has to happen organically in stages. Namely:
Since a crate can have multiple categories, this is something that's entirely doable without losing the old categorization. The difference here is that by making the canonicity of a category an emergent property (modified on the side of crates.io), and not one explicitly labelled (in
Or this bad option:
This feels obvious to me, based upon the text of the RFC. Is it not? Have you read it?
This is not suggesting what I feel is needed: guard rails to help in working towards de facto standardized tags. The problem is with discoverability for crate authors. The RFC mentions noteworthy tags, which helps with some of the problem but not all. We're asking crate authors to do SEO, giving them a limited tool that is managed by a small group of people (crate authors with sway and crates.io), and putting it outside of their workflow. I think it'd be good to do an analysis of how we are doing on categories and keywords and find ways to improve those, as a testing bed for whether we can handle an exclusively open vocabulary.
The current state of categories and keywords in crates is poor, but OTOH I don't think the root cause is in what they're called, or in which toml field they're specified. When you have a mix of blessed and arbitrary tags, you still have categories and keywords, only merged into a single namespace, and you lose control over the category namespace. The current categories are an odd bunch, with some large gaps, and easily confused siblings.
Few crates actually select categories, so browsing by category on crates.io gives an incomplete picture. However, keywords are also a mess! They're also pretty sparse. Dealing with synonyms and normalization is laborious. There's

I've tried using StackOverflow's tag data, but they actually have very little data for tag synonyms, and their perspective is weirdly specific to their site, focused on SEO, .Net and webdev product names (webform -> winforms, jws -> java-web-start, ctrl-c -> copy-paste, span -> html). They maintain descriptions/wikis for their tags, so they can afford having ambiguous tag names.

Making good categories based on keywords is surprisingly hard. I thought it'd be easy to have everything categorized on lib.rs, but it's been an enormous time sink and endless whack-a-mole. "geo" is for geometry or geography, sometimes both. A crate tagged

So overall I think this change is just rearranging fields, and improvement of categorization needs other work, that doesn't really require tags.
I don't know how relevant it is, but the tag curation effort that immediately came to mind whilst reading this is the AO3 tag wranglers, as they seem to have a "moderation team" style group dedicated to the curation of aliases and hierarchy. I realise now that there's probably something more equivalent in the Wikipedia categorisation moderation space, which might be more applicable due to the public nature of Wikipedia governance? |
I have a couple of suggestions but overall I'm very much in favor of this RFC.
I think combining a list of blessed (aka. "noteworthy") keywords with a system that allows users to create/use new keywords without interaction and discussion with the crates.io team will be very beneficial for everyone.
Having both concepts in the same package ecosystem has always confused me, and I'm very happy that we finally have a proposal to get rid of the confusion! :)
text/0000-cargo-tags.md
* It's now possible to add a single double-colon inside a tag to indicate a parent tag. This has the effect of adding two tags to a crate: for example, adding the `development-tools::testing` tag adds the crate to both the `development-tools` tag and the `development-tools::testing` tag. The part after the double-colon is called a subtag.
* On each side of the double-colon, the length of text may be up to 25 characters. That means that, including the double-colon, tags can be up to 52 characters long. (This is to accommodate the largest category before the unification, `development-tools::procedural-macro-helpers`, although rounding up to a nice number for the actual limit.)
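For illustration, the parent-tag rule in this hunk could be sketched roughly as below. This is a hypothetical helper, not code from cargo or crates.io: the names `expand_tag` and `MAX_SEGMENT_LEN` are made up for the example, and the 25-character limit per side is simply taken from the RFC text quoted above.

```rust
/// Maximum length of the text on either side of `::` (value from the RFC text).
const MAX_SEGMENT_LEN: usize = 25;

/// Expands a tag into every tag a crate would be listed under.
/// Returns `None` for an empty or over-long segment, or more than one `::`.
/// Lengths are counted in bytes, which assumes tags are ASCII.
fn expand_tag(tag: &str) -> Option<Vec<String>> {
    let segments: Vec<&str> = tag.split("::").collect();
    // Reject empty, over-long, or stray-colon segments.
    if segments
        .iter()
        .any(|s| s.is_empty() || s.len() > MAX_SEGMENT_LEN || s.contains(':'))
    {
        return None;
    }
    match segments.as_slice() {
        // A plain tag only adds itself.
        [single] => Some(vec![single.to_string()]),
        // `parent::subtag` adds both the parent and the full tag.
        [parent, _subtag] => Some(vec![parent.to_string(), tag.to_string()]),
        // More than one double-colon is not allowed.
        _ => None,
    }
}
```

Under these assumptions, `expand_tag("development-tools::testing")` yields both `development-tools` and `development-tools::testing`, while a tag such as `a::b::c` is rejected.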
I'm not convinced that we necessarily need the colon support in tags. If we attach that kind of meaning to it, it would probably have to be supported on the crates.io backend, and I see that as potentially challenging if we allow users to invent new tags.
I also don't really see the need for it since `#development-tools::testing` could be replaced by `#development-tools #testing` and most likely still work equally well.
I feel like nesting is helpful for knowing what is an intentional refinement of a tag when browsing blessed tags, like when browsing the development-tools category.
text/0000-cargo-tags.md
crates.io can now add metadata to tags according to this policy. Immediately after the adoption of this RFC, this would likely start by converting the `categories.toml` configuration into a `tags.toml` configuration, with PRs affecting this file subject to the policy.

Tags with the same name as a crate may be automatically marked as "crate tags" if the majority of the crates with the tag have the tagged crate as a non-development dependency, optional or otherwise. At the discretion of the crates.io team, tags may be explicitly marked as non-crate tags, to account for cases of popular crates with generic names. This can help avoid marking a particular crate as "canonical" just because it's popular and has a generic name.
I'm not sure I completely understand this part. Could you add an example for this?
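One possible reading of the "crate tag" rule, as a rough sketch only: the type `TaggedCrate`, the function `is_crate_tag`, and the choice to interpret "majority" as strictly more than half are all assumptions made for this example, not something the RFC text pins down.

```rust
/// A crate carrying the tag, reduced to the one fact the rule needs.
struct TaggedCrate {
    /// True if the crate lists the tag's namesake crate as a
    /// non-development dependency (optional or not).
    depends_on_namesake: bool,
}

/// Returns true if the tag would be automatically marked as a "crate tag".
/// `explicitly_non_crate` models the crates.io team's manual override.
fn is_crate_tag(tagged: &[TaggedCrate], explicitly_non_crate: bool) -> bool {
    if explicitly_non_crate || tagged.is_empty() {
        return false;
    }
    let dependents = tagged.iter().filter(|c| c.depends_on_namesake).count();
    // Assumption: "majority" means strictly more than half of the tagged crates.
    dependents * 2 > tagged.len()
}
```

Under this reading, a tag carried by three crates, two of which depend on the crate of the same name, would become a crate tag, while a popular crate with a generic name could be opted out through the override flag.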
text/0000-cargo-tags.md
Tags with the same name as a crate may be automatically marked as "crate tags" if the majority of the crates with the tag have the tagged crate as a non-development dependency, optional or otherwise. At the discretion of the crates.io team, tags may be explicitly marked as non-crate tags, to account for cases of popular crates with generic names. This can help avoid marking a particular crate as "canonical" just because it's popular and has a generic name.

Tags that are deemed "noteworthy" can have metadata added to them. A tag is noteworthy if it:
crates.io choosing which tags are "noteworthy" doesn't seem all that different from crates.io choosing which categories can be created.
To me this seems to be mainly a matter of creating a policy and a process, whether you call that categories or noteworthy tags.
New categories could be picked based on popularity of keywords.
Right, the main issue with that model is that if new categories are picked based upon keywords, users now have to upload new versions of their crates that use those categories. If keywords themselves can be treated as categories, those changes are applied retroactively.
Only so far as they picked the "winners". With all of the different cases, etc., I still see all but a select few influential packages having to make a change and do a new release.
Co-authored-by: teor <[email protected]>
doc/reference/manifest: Adjust `keywords` description

This adjusts the naming rules for keywords to match the implemented reality: https://github.com/rust-lang/crates.io/blob/aab95692baa0dd2374a2ab5cb2cb2d89d7b2a2eb/src/models/keyword.rs#L56-L64

see also:
- rust-lang/rfcs#3488 (comment)
- rust-lang/rfcs#3488 (comment)
we talked through this RFC in the crates.io team meeting today. I'll try to summarize:
@rust-lang/crates-io I hope I summarized this correctly. If not, please let me know :) |
Thank you for the detailed feedback! I'll try to address these over the weekend and remove the new "tags" in favour of just keeping keywords. In terms of migrating categories over, my thought process was that categories would be implicitly treated as keywords, but we can work out the details on that after the revisions. |
Okay, the big revision is done, and tags are no more, replaced with keywords. Please let me know if anything seems off; I may have overzealously search-replaced "tags" with "keywords" in ways that might not make sense. Subtags are now "commonly paired" keywords, which allow the crates.io team to create some kind of keyword structure without there actually having to be one. There are probably still details to flesh out, but that's what the RFC process is for. |
Recently, [a discussion](https://github.com/rust-lang/crates.io/discussions/6762) was opened in the crates.io repository on whether the cryptocurrencies category should be removed from crates.io, due to the plethora of issues surrounding them. This should not be treated as a reason for adopting the RFC (although it was a motivation to write it), but instead as something that brought up the fact that categories cannot be removed by policy.

Because categories cannot be removed, they also cannot be renamed or otherwise curated by the community. However, by switching to keywords, we can effectively solve this problem; community members can simply start publishing their crates under different keywords and older, unsupported crates wouldn't have any issue remaining published under their older versions.
Categories can be removed and renamed. crates.io could ignore existence of slugs it doesn't want, and create aliases for category slugs it wants to rename. On lib.rs I've completely deleted external-ffi-bindings, and renamed math to science::math.
(moved here to avoid it getting lost in the noise)
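As a rough sketch of what registry-side aliasing and removal could look like: the `Rule` enum and `normalize_categories` function below are hypothetical names invented for this example, and the two sample rules are just the lib.rs changes mentioned in the comment above, not anything crates.io is known to implement.

```rust
use std::collections::HashMap;

/// What the registry does with a category slug it no longer wants as-is.
enum Rule {
    /// Treat the old slug as an alias for a new one.
    Rename(&'static str),
    /// Ignore the slug entirely.
    Remove,
}

/// Applies rename/remove rules to the slugs a crate declared.
/// Unknown slugs pass through unchanged; duplicates after renaming are dropped.
fn normalize_categories(declared: &[&str], rules: &HashMap<&str, Rule>) -> Vec<String> {
    let mut normalized: Vec<String> = Vec::new();
    for slug in declared {
        let resolved = match rules.get(*slug) {
            Some(Rule::Rename(new_slug)) => Some(new_slug.to_string()),
            Some(Rule::Remove) => None,
            None => Some(slug.to_string()),
        };
        if let Some(resolved) = resolved {
            if !normalized.contains(&resolved) {
                normalized.push(resolved);
            }
        }
    }
    normalized
}
```

With a rule set that renames `math` to `science::math` and removes `external-ffi-bindings`, a crate that declared both would end up listed only under `science::math`, without needing to publish a new version.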
Right, although crates.io still ultimately has to accept categories that were valid in the past as valid in the future, meaning they're not really removed. Like, I don't think it's very good UX to just passively ignore a category when uploading to crates.io, since the user added that for a reason and we shouldn't just get rid of it. And if we actually stopped builds because of it, that'd be breaking backwards compatibility.
More so my point is that this motivation doesn't seem valid. The RFC doesn't address the alternative that @kornelski mentions: of having category merges, renames, and removals through a normalization process.
Like, I don't think it's very good UX to just passively ignore a category when uploading to crates.io, since the user added that for a reason and we shouldn't just get rid of it
That doesn't mean it can't be done, but it might be a downside to this alternative. That is something for the approving team to weigh when reviewing your RFC, not something for you to evaluate and leave out. Personally, I would be in favor of such an approach, though it isn't my call.
I tend to agree with @kornelski and @epage. While the crates.io team has never deleted a category so far, it is technically possible with some effort to do so.
Because categories cannot be removed, they also cannot be renamed or otherwise curated by the community. However, by switching to keywords, we can effectively solve this problem; community members can simply start publishing their crates under different keywords and older, unsupported crates wouldn't have any issue remaining published under their older versions. |
Probably easiest to just remove this part :)
Edit: looking at the surrounding diff that suggestion might not be the best..... 😅
# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

Categories in `Cargo.toml` are now deprecated. Setting `categories` will now trigger a warning and suggest to use `keywords` instead. To ensure that crates still build correctly in these cases, the provided `categories` are implicitly converted into `keywords`.
To ensure that crates still build correctly in these cases, the provided `categories` are implicitly converted into `keywords`.
This makes it sound like by adding a deprecation warning, crates using categories will no longer be able to build, so cargo must move categories to keywords.
However, a warning should not affect building. I can see crates.io doing the conversion on their side, since they have to deal with a variety of cargo versions. I can also see us removing support for `package.categories` in an edition, with `cargo fix` automatically migrating people. Those are both a little different from how I read this.
Right, so, I definitely over-simplified in the guide-level explanation, and will try and take a look at it later to see how I can reword it. It should hopefully be clear from the reference explanation.
I don't really think we should ever remove the categories from the manifest, since the edition system would force us to parse it anyway, and it's not like we really gain anything by forcing crates past a certain edition to not use it. We could make the theoretical lint that triggers deny-by-default, but we also don't need an edition bump to do that either.
I do think that the potential to `cargo fix` is on the table, though, assuming we go through with this.
Ultimately, we don't want to break old builds, so, the best we can do is implicitly convert to keywords and tell people to not use categories. Breaking builds would be breaking backwards compatibility, after all, and that's not allowed.
It's fine for the guide to simplify things; it's just that the guide made it sound like this was the only place explaining a part of what is happening.
I don't really think we should ever remove the categories from the manifest, since the edition system would force us to parse it anyway, and it's not like we really gain anything by forcing crates past a certain edition to not use it. We could make the theoretical lint that triggers deny-by-default, but we also don't need an edition bump to do that either.
If we are completely changing the interpretation of `categories` and deprecating it (warning + maybe cargo fix), then I think it is worth considering migrating people to avoid confusion ("I have the categories field, where does it show up?", "I'm looking at categories, but what does it mean?").
Ultimately, we don't want to break old builds, so, the best we can do is implicitly convert to keywords and tell people to not use categories. Breaking builds would be breaking backwards compatibility, after all, and that's not allowed.
Except what is build-breaking that you are avoiding? That was mentioned in the RFC and in the reply, but there isn't any explanation of what you are referring to. Are you saying we can't remove the field altogether? Or something else? What made this more confusing is that the concern for "build breaking" is tied to the registries implicitly converting.
# Summary
[summary]: #summary

Categories for crates are now deprecated and implicitly added as keywords instead. A new set of policies is added to allow the crates.io team to curate the way keywords are presented, replacing features such as the "Popular Categories" list on crates.io with a "Popular Keywords" instead.
I'm wondering if we should split this RFC up into two RFCs... 🫣
The current one is conflating 1) crates.io no longer showing any categories on the webpage and ignoring them for published crates, and 2) deprecating the `categories` field in `Cargo.toml` for all registries.
The first one is more specific to crates.io, while the second has larger consequences also for third-party registries and cargo.
While there might be some overlap I'm wondering if the first part would be easier to get consensus on, and then the second part may need more discussion, also with e.g. some third party registry maintainers.
Disclaimer: I'm not saying that we absolutely have to split the RFC up, but I'm wondering what others think about that. It might also resolve the "what team is responsible for this RFC" question, with crates.io being responsible for the first part and the cargo team for the second part?
For me, if we split it, it would be for procedural reasons, but the two should still be approved hand-in-hand.
I feel that approving one ahead of the other puts an innate pressure on the other team to approve something.
Using an extreme to illustrate my point: Say the cargo team put up and approved a RFC for removing categories from manifests all by itself. In a way that would compel the crates.io team to do something.
1. Instead of being gatekept by a PR to the crates.io repository, categories can organically be adopted by community members in the form of keywords. Which keywords are most popular and useful can be decided organically.
2. Keywords can still be given descriptions and other metadata on crates.io, although no distinction between these "special" keywords and other keywords is made in cargo itself. This allows making changes to the way crates are presented without having to worry about backwards compatibility.
3. Adding, removing, and modifying the curated set of keywords is no longer a technical choice, but a cultural one.
we might want to include some of the points I mentioned in #3488 (comment) as motivations too
Co-authored-by: Clar Fon <[email protected]>
Expand "Merge categories and keywords" RFC
Thanks to @Turbo87 we now have a completely revised version of this RFC which adds historical context and some more analysis. It's still probably going to be changed before the final version, but hopefully things are a lot clearer now! |
to the one the user is currently looking at can help with this though.

- **Clarity:** In a flat namespace the chance of a single keyword being used for
  multiple purposes is slightly increased. With nested hierarchies, the parent
Personally, this reads as having a lot of what look like opinions that bias the reader to the opinion of the author.
I'm not quite sure what you mean here. It is difficult to be truly neutral in the analysis (especially since the RFC is in favour of a particular solution) but I do think that if we can change the wording to be more neutral, that's a good idea.
anymore. Eventually the crates.io server will also start to ignore the `categories`
field for new uploads and remove the existing categories data from the database.
Ignoring categories is a silent breaking behavior change.
This is mostly why I originally suggested treating the categories as keywords in the original RFC, but after @Turbo87 did some research, this would most likely just lead to a lot of duplicates since users who use categories often use keywords, and they don't necessarily use the same ones for each.
That is a strict change in behavior though, although I'm not sure if it's considered a breaking change since we haven't really defined what a breaking change means in these cases. Do you simply classify it that way, or are there specific cases where this can cause crate builds to break?
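To make the duplication concern a bit more concrete, here is a minimal sketch of what an implicit merge might look like. The function `merge_into_keywords` is hypothetical, and both the splitting of category slugs on `::` and the lowercasing are assumptions made for the example rather than behaviour specified anywhere in the RFC.

```rust
/// Merges category slugs into a keyword list, skipping exact duplicates.
/// Assumption: each `::`-separated segment of a category becomes a keyword.
fn merge_into_keywords(keywords: &[&str], categories: &[&str]) -> Vec<String> {
    // Existing keywords come first, followed by the category segments.
    let mut candidates: Vec<String> = keywords.iter().map(|k| k.to_string()).collect();
    for category in categories {
        candidates.extend(category.split("::").map(|segment| segment.to_string()));
    }

    let mut merged: Vec<String> = Vec::new();
    for word in candidates {
        let word = word.to_ascii_lowercase();
        // Only exact matches are collapsed; near-synonyms survive.
        if !merged.contains(&word) {
            merged.push(word);
        }
    }
    merged
}
```

For a crate with keywords `cli`, `test`, `testing` and categories `development-tools::testing` and `command-line-utilities`, only the exact match `testing` is collapsed. Near-duplicates such as `cli` next to `command-line-utilities`, or `test` next to `testing`, all remain in the merged list.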
- Should the `categories` field be deprecated in cargo too?
  - Are third-party registries using categories?
  - How much usage of cargo is with crates.io vs. third party registries?
  - Would users be confused if it was only deprecated on crates.io but not in cargo?
imo if we move forward with this RFC, we should deprecate it, and that should be in an adjoining RFC that gets approved together. We can transition out `categories` on an Edition boundary, having `cargo fix --edition` do the merge in `Cargo.toml`.
I think that moving the true "removal" of categories (in the sense that Cargo no longer sends them) to an edition boundary would be okay, although I'm not quite sure how editions would work here since registries aren't really controlled by editions, and crates.io also isn't controlled by editions. Since crates.io would be effectively ignoring categories down the line, this would mostly just require users to use an older cargo edition to use categories on their registries, which feels like a weird thing to support.
In terms of splitting the RFC: which parts exactly do you think would be better suited to an adjacent RFC? The main reason for combining the two is that, if the plan is to coordinate merging both at the same time, there doesn't seem to be much benefit to splitting them, but I'm also not 100% sure what split you're thinking of.
Regardless, I think that the deprecation of categories should at least be followed with an increase in the number of allowed keywords at the same time, to ensure that users have the ability to adequately tag their crates. We could potentially discuss some of the other features separately, though.
Personally, I think this is still premature and we should be trying out some of the ideas mentioned here to vet them out before we commit to relying on them to make up for the loss of a closed vocabulary. |
I definitely think that there's room for doing a "soft" rollout to these features after the RFC is merged, since they by their nature depend on the community using them to work. I don't think that even if we accepted this RFC as-is that we'd be locked into the approach, although I'm also not sure what the general policies are for the cargo and crates.io teams to experiment before an RFC is accepted. I think that experimentation is especially important here since we don't know 100% how the community will respond to the changes, but the primary idea (replacing a closed vocabulary with a curated open one) is still there. What kind of vetting do you think should be required to make a change like this? |
Categories for crates are now deprecated and implicitly added as keywords instead. A new set of policies is added to allow the crates.io team to curate the way keywords are presented, replacing features such as the "Popular Categories" list on crates.io with a "Popular Keywords" instead.
Rendered
Special thanks to @Turbo87 from the crates.io team for cleaning up this RFC and adding extra context.