-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve docs on some char boolean methods #61794
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @dtolnay (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I agree that documenting char::is_lowercase as "Returns true if this char is lowercase" is silly.
I think this change is mostly fine but I would like to try to improve clarity of the all-important first sentence on each method. First sentences are supposed to stand alone to the extent that if you only read the first sentence of each method you still get a reasonable understanding of the point of the method. Documentation writers commonly get this wrong by having the first sentence summarize the rest of the documentation of the method; if the summary can only make sense after reading and understanding the rest of the documentation then the first sentence fails to stand alone.
My feeling is that many of these new first sentences only make sense if the reader already understands the point of the method.
For example you have:
Returns
true
if thischar
has theWhite_Space
property.
This sentence means nothing until after the reader learns what "White_Space property" refers to by reading the rest of the documentation.
As I see it there are 2 key things that a first sentence would need to communicate to characterize this method:
- You (as the reader) know roughly what is meant by the word "whitespace". This method involves that type of whitespace as you would expect; it isn't some jargon.
- But the details are complicated and have something to do with Unicode.
I think the sentence as written isn't effective at communicating either of these things. We can try to brainstorm some alternatives; I feel that something more in this direction would work better:
Returns true if this is a whitespace character as classified by Unicode's
White_Space
property.
Another dimension to keep in mind about first sentences is that they ideally communicate the point of the method to the intended target audience for the method 1 in terms that makes sense for that audience. The set of people who will want to call is_whitespace() is different and has different background knowledge compared to the set of people who will need to call something like is_grapheme_extended(), so there will tend to be differences in the character of the documentation as a result.
1 Occasionally we extend this to include also the set of people who would be most likely mislead into thinking that they want to call a method they shouldn't, i.e. "this is probably not what you want"-style documentation, but this is uncommon.
@dtolnay Thank you for the enthusiastic response. I'm more than happy to brainstorm alternative approaches. I generally agree with your summary of the goals of the first sentence. I concede that I may have erred more on the side of brevity than necessary. I left out “Unicode” since I think it's understood that Considering your specific example:
I like the usage of “classified” better than the existing “satisfies” or my “has.” I would amend it slightly to:
I think it's better to avoid the possessive form when it's not necessary, and “the Unicode ... property” helps to imply that there's only one and that it's standardized. Another option would be to re-emphasize the code point aspect:
I feel that Also, you left out “this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, “the Unicode ... property” works for me. "Code point" instead of "character" is fine too, will leave that up to you. I would lean toward "character" because the type's documentation introduces it as "The char
type represents a single character" and talks about "code point" only to clarify what character means on a technical level. I don't think it would be a benefit in clarity to make that distinction again in all of these methods.
I think Unicode is important to mention by name in some form. I would not expect readers to deduce that "the White_Space
property" by itself refers to some Unicode thing. Searching for White_Space property brings up information about CSS white-space
before anything about Unicode.
“this char
” is intentionally omitted. In documentation I typically recommend sticking to English words instead of code blocks where English is sufficiently concise and unambiguous, as in "returns true if ..." rather than "returns true
if ...". Code blocks are formatted deliberately to disrupt the reader's flow with a context switch and are only worth using if that is called for.
@JohanLorenzo Yes. Apologies for the delay. I got caught up with something else, and now I'm away from the computer for a week. I'll get to it next week. |
Hey there 🙂 Thanks for the ping. Although, you probably meant to at-mention @JohnCSimon 😃 |
@JohanLorenzo Oops! Sorry! I didn't pay close enough attention to GitHub's completion. It usually works so well. |
☔ The latest upstream changes (presumably #62902) made this pull request unmergeable. Please resolve the merge conflicts. |
@spl Can you please address the merge issues? This is so close! Thanks. |
@spl Ping again from triage ... |
hello from triage
|
Improve docs on some char boolean methods simple revival of rust-lang#61794 (also rustfmt on rest of file :) Documentation for `is_xid_start()` and `is_xid_continue()` couldn't be improved since both methods got remove from this repository r? @dtolnay cc @JohnCSimon
This is an attempt to improve some of the documentation on
char
methods that reference Unicode aspects. My goals are:latest
https://www.unicode.org links to always point to the latest standard and documentation rather than a specific versionI believe the uniformity and wording is improved, but perhaps not everyone agrees. You will find, for example, that I replace “is an alphabetic code point” with “has the
Alphabetic
property” and give a reference for theAlphabetic
property. Since the method is namedis_alphabetic()
, it is slightly redundant to say that the meaning is “is alphabetic,” and it also doesn't provide more information, whereas, the Unicode documentation does. However, you may prefer, in addition to the reference, something like a copy of the Unicode documentation, e.g. “TheAlphabetic
property is a derived informative property of the primary units of alphabets and/or syllabaries, whether combining or noncombining.” I'm not yet convinced that adding this is useful because it tends to involve a lot of unfamiliar vocabulary for the uninitiated. Perhaps some middle ground could be found here, but I'm not sure what that looks like at the moment.I link to the latest Unicode documentation since that seems easiest to maintain. However, it is not necessarily the standard implemented. Right now,
unicode::tables::UNICODE_VERSION
is 11.0.0, but the latest standard is 12.1.0. But if you want to link to the current version implemented, then those documented links should be updated every time the version is updated. Unless there is some automated mechanism for handling the version update, I think it's an acceptable minor error to link to the latest version.