-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SPEC 12: Formatting mathematical expressions #326
base: main
Are you sure you want to change the base?
Conversation
cc @mdhaber @stefanv @jarrodmillman @j-bowhay and add yourself as co-authors of course 😉 |
spec-0012/index.md
Outdated
-a | ||
``` | ||
|
||
- Within a group, if operators with different priorities are used, add whitespace around the operators with the lowest priority(ies). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the definition of "group" above, it is not possible for operators within a group to have different priority.
I believe the definition of "group" was supposed to be something more like a sequence of operations that relies on implicit order of operations rules. Examples include:
- a logical line
- operations within parentheses
- the expression and for/if clauses of a list comprehension
spec-0012/index.md
Outdated
We define a _group_ as a collection of operators having the same priority. | ||
e.g. `a + b + c` is a single group, `a + b * c` is composed of two groups `a` | ||
and `b * c`. A group is also a collection delimited with parenthesis. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spec-0012/index.md
Outdated
( | ||
a/b | ||
+ c*d | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should show an example where splitting the line makes sense. We would not want to suggest that a / b + c*d
should be split across four lines. Consider referring to PEP8 "Should a Line Break Before or After a Binary Operator".
I'm not sure if the use of the term "logical block" is correct. This is a single logical line split across multiple physical lines.
spec-0012/index.md
Outdated
|
||
``` | ||
a/d*b**c | ||
a*(b**c)/d |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't appear to follow the rule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How so?
spec-0012/index.md
Outdated
If this is technically an issue (e.g. restriction on the AST), add | ||
parenthesis or spaces. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This rule appears to conflict with the previous rule about space around operators *
and /
.
I think the reason for an exception - if there needs to be one - should be more explicit. I remember discussing in person that linting tools might check the AST to ensure that it is not modified by an auto-correction, but this is not something that the user will necessarily be thinking abou. A user might be ordering sequences of operations in a particular way to get floating point arithmetic to do what they want. If they are tempted to break the rules to do so:
- the linter does not have to be able to make the correction automatically
- the user is welcome to use parentheses
- the user is welcome to declare an exception to the rule with
noqa
I think the rules need to be more complete before we can assess whether there a need for an exception, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A general thought, rules could conflict if we ask them to be applied one after the order in a strict order.
The principle of these rules is that users should not care at all and even learn them. The linters are here for that. If a user does something in a certain order for arithmetic reasons and there is a reordering happening, then we can ask tools to either not reorder or provide a skip for a given rule. See the last point in the notes.
spec-0012/index.md
Outdated
(a*b) * (c*d) | ||
``` | ||
|
||
- Operators within a group are ordered from the lowest to the highest priority. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure "group" here is being used in the same sense as elsewhere. I remember discussing in person the idea that a/d*b**c
is preferable to a*b**c/d
unless there are explicit parentheses, like (a*b**c)/d
, but it would not be wrong to do a/d*b**c + e/f*g**h
, yet the plus comes after higher priority operators.
spec-0012/index.md
Outdated
``` | ||
|
||
- There is no space before and after operators `*` and `/`. Only exception is if the expression consist of a single operator linking two groups with more than one | ||
element. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be another exception below.
https://github.com/scientific-python/specs/pull/326/files#r1631763770
According to "Only exception is if the expression consist of a single operator linking two groups", there is no exception for:
(a*b)*(c*d)*(e*f)
because there are three groups? Or do you mean that you need a space when any binary operator is linking two explicit "groups" enclosed by parentheses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In your example it means no spaces because we have 3 groups.
And if we have 2 groups then each group must have at least one operator in it (ie not just a variable or single number).
spec-0012/index.md
Outdated
i = i + 1 | ||
submitted += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these bad?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are good, for now o just copy pasted the whole block in Black. But yes we should remove the correct ones from there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Matt. If you are going to be an author, feel free to just go ahead and commit changes 😉
Co-authored-by: Matt Haberland <[email protected]>
Hi @charliermarsh @ambv 👋 I had pinged you both on Twitter some years back about a standard for mathematical equations. At the time you both said this could interest you for both Ruff and Black if we, the Scientific Python community, could come to an agreement. To be clear, I am of course in no position to ask for any commitment from you and I just hope that you find the topic interesting enough. This is getting into shape and we have a preliminary draft we would like to present you 😃 Any comments would greatly help and in the end this can also only work if both Ruff and Black would be able and willing to implement this specification 🙏 Thanks again both! |
Awesome, I look forward to reading + engaging here. |
Thanks @charliermarsh. I think the main question right now is whether this is close to being precise enough to be implementable. For example, I don't have much background with ASTs, so perhaps you can suggest ideas that would replace the notions like "implicit subexpressions" I attempted to define. One we have a better understanding of how to write this standard so that it's implementable, we will want to get feedback from a wider audience about adjustments to the particular rules. |
spec-0012/index.md
Outdated
surrounding whitespace. For example, prefer `-x**4` over `- (x ** 4)`. | ||
2. Always surround non-PEMDAS operators with whitespace, and always make the priority of | ||
non-PEMDAS operators explicit. For example, prefer `(x == y) or (w == t)` over | ||
`x==y or w==t`.[^1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this imply that we should add parentheses here (as in: is this an exception to Rule 0)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, this is intended to be an example of "otherwise specified".
levels increases and when multiple non-PEMDAS operators are involved. Portions of this | ||
acronym, namely MD and AS, will be used below to refer to the corresponding operators. | ||
|
||
## Implementation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should any of these rules differ based on the expression type? E.g., all of the examples below use Name
nodes (like x
, y
, etc.). What if the expression uses calls or subscripts or similar? Like f() ** 2
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to add examples, but no - to keep things simple, I didn't consider changing the rules based on that.
spec-0012/index.md
Outdated
A "subexpression" is subset of an expression that is either explicit or could | ||
be made explicit (i.e. with parentheses) without affecting the order of | ||
operations. In the example above, `j` and `range(1, i + 1)` can also be | ||
referred to as explicit subexpressions of the whole expression, and `1` and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"...of the whole expression"
Assuming I'm reading this correctly, I would be careful here as for j in range(1, i + 1)
is not an "expression" in the sense of the Python AST. for
is a kind of "statement".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we correct this as simply as:
referred to as explicit subexpressions of the whole expression, and `1` and | |
referred to as explicit subexpressions of the whole statement, and `1` and |
?
I think the point I was trying to make would still be made.
spec-0012/index.md
Outdated
explicit as in `x + (y*z)` without changing the order of operations. However, `x + y` | ||
would not be a subexpression because `(x + y)*z` would change the order of operations. | ||
Note that `x + y*z` as a whole may also be referred to as a "subexpression" rather than | ||
an "expression" even though `(x + y*z)` is not a proper subset of the whole. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "proper subset of the whole" mean? Sorry!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A proper subset of a set
This is saying that even though an expression like (x + y*z)
might be the entire expression in question, we might still refer to it as a "subexpression" when the distinction is not important.
Does this need a clarification, or do you think it would be clear to those familiar with the term "proper subset"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this makes sense! I think it will be clear to those that are familiar with the term.
I think it would depend solely on the use of the operators. In that example of file paths, the formatting would be the same with these rules as what you've shown - spaces around the division operator since it is a "simple" (as opposed to compound) expression. There may be cases in which these rules would suggest funny things for overloaded operators, but my guess is that in most cases, it would just mean a
Sure. There are costs to the simplicity, though. PEP8 explicitly recommends "If operators with different priorities are used, consider adding whitespace around the operators with the lowest priority(ies)". For the sake of simplicity (presumably), Black/E225 just ignore this recommendation. The downside is that some find this makes the resulting expression more difficult to read. The goal of this SPEC is to capture the spirit of the PEP8 recommendation in rules. They will not be as simple as "add whitespace around all operators", but some may find that the benefits outweigh this cost.
So, for the whitespace question, they all have essentially the same recommendation as Black and E225: always surround operators with whitespace. Others may have different goals, but one of my motivations here was to avoid that blanket rule. I think one of the reasons
Because of 8) "If line breaks must occur within a compound subexpression, the break should be placed before the operator with lowest priority." and "If there are multiple candidates, include the break at the first opportunity.", these (and some other variations) would satisfy the rules prescribed here:
|
Is this ready to go in as draft @stefanv ? |
@lucascolley Thanks for the ping. Yes, it's definitely in good enough shape to shop around for discussion. Note that there is one unclosed LaTeX expression, which would be good to fix first. Some personal remarks:
Anyone on here should feel free to move the PR out of draft status and merge, when they're ready. Or give me the thumbs-up and I'll do so. |
I shared this concern, but after speaking to Matt I think it's fine considering the specification in (5) that there must be just a single And I think the rules don't forbid developers to include extra parentheses in cases like this if they would like to be extra clear. |
Thanks again @MichaReiser for the points you raised above. I've had a little more time to think about them and I think you did a great job of explaining the challenges of adopting this in Ruff.
One idea that I'm not sure has been considered yet would be a completely separate mode of the formatter + linter. The start point would be implementing this specification as lint rules and being able to format code to comply with it, but in isolation (just this spec). If that could be achieved, then any rules and formatting that do not conflict with this spec could then be added into this separate mode. This separate mode would never be able to offer a superset of ruff's regular functionality. But it could still be enough for us.
Of course, the main problem is always time. At least with this suggestion, the separate mode wouldn't block new formatting changes from integrating with "normal" ruff. That work would have to happen at some point if we wanted those changes in the separate mode, but in theory it could be done at a way later point.
Makes sense, although the context for this discussion is that Black’s accomplishments have been largely unable to reach the Scientific Python community due to concerns like this and the formatting of arrays. So we would at least be extending past Black’s accomplishments in some ways :) Now I can see a lot of arguments for why a completely separate mode like this wouldn’t belong in Ruff’s API but be a separate package which uses Ruff’s machinery under the hood. Maybe that is what we’re looking for. Still, I think there would be value in translating the non-conflicting parts of this spec to additional lint rules. I’m just trying to be ambitious :) |
Done.
Can you give an example expression?
Perhaps you're happy with Black's rules for mathematical expressions, and that's fine. I got the sense that others weren't, though, so the rules here attempt to balance the conflicting desires:
without requiring the addition of lots of parentheses in expressions involving the common PEMDAS operators. When I type That's the rationale, but other viewpoints (e.g. "whitespace is only omitted around the highest-priority operator of a compound expression") are perfectly valid, too, so I don't think we'll be able to reason out which is best. I think it may be a matter of either adopting a mostly-baked proposal like this one or putting forth equally complete alternatives and voting. Or maybe we accept something like this as a draft and vote on specific changes like that.
Ah, right, Pamphile mentioned that before, and we discussed moving it to a reference section at the end. I'll do that.
Yeah, I'll take another look at that.
The first rule is "do not add extraneous parentheses", but the last is "Any of the preceding rules may be broken if there is a clear reason to do so." So if the rules lead you to an expression that is particularly difficult to parse, break them. But (Another set of rules I considered would be based on "Whitespace is only omitted around the highest-priority operator of a compound expression" and something like "Expressions with more than two operator priority levels must make subexpressions with two operator priority levels explicit.* This would mean you would have to write |
I guess I was alluding to the possibility that a formatter wouldn't add parentheses by default, but also wouldn't take them away if they are already there. |
pre-commit.ci autofix |
Sorry, that should have been (0) and (5). It's not super intuitive to me when you can / should / must use brackets; but it may well be that the rules are comprehensive. Wouldn't it be OK to always allow stylistic brackets, for aesthetic grouping, or grouping that is logical to the author? [This discussion is not a blocker to merging the PR, btw.] |
Yes, see rule 9. |
"Any of the preceding rules may be broken if there is a clear reason to do so." |
Yes? Because you asked:
It even shows examples of adding parentheses that aren't necessary. |
I just posted the quote so others don't have to look it up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stefanv Math still didn't render. Any thoughts?
Requires a theme update. Fix has already been made. /cc @jarrodmillman |
This follows the proposal on the forum https://discuss.scientific-python.org/t/spec-12-formatting-mathematical-expressions
See other linked discussions as well.