-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Build: skip build based on commands' exit codes #9649
Conversation
Define a particular exit code (439) to skip a build. If any of the commands returns this exit code, the build will be cancelled automatically and won't run any of the following commands. When this happens, the build will be marked as `cancelled` and no email/webhook notifications will be sent. Why 439 was chosen for the exit code? It's the word "skip" encoded in ASCII. Then it's taken the 256 modulo of it because the Unix implementation does this automatically for exit codes greater than 255 ``` >>> sum(list('skip'.encode('ascii'))) 439 >>> 439 % 256 183 ```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love how simple this implementation is. I agree there is some UX tweaking, but overall this seems like a really neat step to work on in the near future.
Looks like GH supports skipped and cancelled as a status. So maybe it does make sense to mirror them as different?
I think we are not using this endpoint. I understand this are "GitHub Check" (the tab that appear on each PR) and we are using "Commit Statuses", which are these ones: https://docs.github.com/en/rest/commits/statuses#create-a-commit-status (I found this by checking the source code at
The
|
Annoying 👎 |
This comment was marked as spam.
This comment was marked as spam.
This comment was marked as spam.
This comment was marked as spam.
This comment was marked as spam.
This comment was marked as spam.
I think it would be good to move forward with this implementation but behind a feature flag and enable it in our own project. That way, we will be able to try the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be a great place to add some docs too!
# 439 | ||
# >>> 439 % 256 | ||
# 183 | ||
RTD_SKIP_BUILD_EXIT_CODE = 183 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I brought up the point that perhaps we want support for more explicit codes here. If so, perhaps we should choose a less arbitrary value here and allocate a block of exit codes.
183 is in a safe range for posix standards at least though:
https://tldp.org/LDP/abs/html/exitcodes.html
But, do we want something like:
- Exit code 180: skip build immediately
- Exit code 181: fail build immediately
- Exit code 182: retry build later?
- etc
Not to say we're implementing everything now, but if we will have more than one exit code, I'd suggest starting from a value that doesn't seem so arbitrary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
183 is in a safe range for posix standards at least though
Yeah, I checked for this as well and I was happy that "skip" was in that range 😄
I brought up the point that perhaps we want support for more explicit codes here
This looks interesting, but right now I'm not seeing a clear usage for other useful exit codes. We can keep thinking about this and see if we find some other useful cases. In any case, we can reserve the range 180-185 and start with 183 for this one; there is no need to make them consecutive.
Co-authored-by: Anthony <[email protected]>
Regarding a few questions, above, I don't have much feedback on the GitHub internals, but:
I think "cancelled" both fits perfectly well, and doesn't require anything new from us. But also, the points I've made around modelling build state is that either way this build would be considered "cancelled" state. That is, "cancelled" describes the build ended before success/failure, and we don't need duplicate states to describe the different reasons for being in this state. A skipped state would be a sub-state of
This might depend on how users consider this feature. I consider it only notifies at the end of builds, and cancelling a build doesn't allow for a build to get this far. I think it would be fine to skip the notification either way, I don't think users would care to be notified when a build is cancelled.
This is the error message refactor I've been wanting for a long time. We can always do this piecemeal and add another one-off warning message type to the build logic. But I'm also fine showing the error message for now and coming back to this in a more thoughtful way later, when we have time to rework our error message UX. |
I used the word "notification" which introduced some confusion here. To clarify, when the build gets cancelled we are not sending any notification to users (e.g. email). I was referring to the "commit status" notification on PR from GitHub. That is, GH will trigger a PR build in our platform and put a "yellow" circle next to the commit. What we should tell GH about this "commit status" once the build got cancelled because of this exit code? The only way to make the PR "mergeable" would be to tell GH the build succeeded. Otherwise, GH will put a "red cross" or keep the "yellow circle" and won't let people merge the PR (if these checks are enforced). |
Ah, I see. It seems like a success status is exactly what we want to give GitHub, no? Unless I'm missing something, it seems like the user would always want us to report a success on the status check if they are instructing us to also skip the build. Is there any case where we would want to fail the status check? We can also give this control to the user with multiple exit codes:
But I do think success is the primary status users would want here. |
I just realized this PR will resolve our second most longest standing open issue: #871 🎉 |
I think sending "success" makes sense. However, I the workflow I'm a little concern is the following:
In that scenario, if the build from the PR would have been not skipped, the error would have detected before merging the PR. I think this is fine, tho, since the users are who controls whether or not skip the build --so, they can tune it as much as they want to avoid this kind of scenarios. |
Yeah I would agree, I share that reservation about this feature in general. This is actually the reason why I've always been reluctant to prioritize either feature -- doc build success can depend on code/commits outside It's definitely something to note, but I think we're in the clear here because the user will be enabling this feature for their projects. |
Instead of having some special exit codes, I'd vote for having this as explicit options in the config file, similar to github actions https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idcontinue-on-error, and maybe even let users define what exit code to use (like fail cancel build on 0 or 123) |
@stsewd I'm not sure to understand what config you would like to have in the config file. Can you expand on that? Also, what would be the extra benefits for users to be able to decide what exit code will make the build to be skipped via the config file? I'm asking to be able to compare the benefits and the complexity of its implementation + documentation + UX. We can always start with a fixed/default exit code and grow from there if we think it's going to open good flexibility for our users. |
User configurable exit codes would be a great approach overall, but I do agree with @humitos that it's probably too much initial work to be worth the effort. This might be a great upgrade if we get users using the feature though. I would be +1 on user defined codes if we had several well defined exit codes we wanted to add in this first pass, or if we had other parts of the build process we wanted to override in a similar way. But, if no one has strong feelings here, it sounds like we only have one or two exit codes to implement |
jobs:
post_checkout:
- run: ! git diff --quiet origin/main -- docs
on-error: "cancel" # "fail", "continue" This way is more explicit about what will happen if the command fails, and it also avoids any collision caused by a random command. Here, an error is any exit code that is different from 0, but we could introduce another option to define what code is considered an error, or something like Our code will handle two types of commands, a list of strings or a list of mappings, this will also allow us to add more options as needed in the future. |
I do like the suggestion of giving users control over these values. This seems better long term. I'm sure we're bound to encounter some edge case with static exit codes at some point, and giving users a way to opt into this behavior avoids all of that. But this is a more complex addition and we're not sure if users will use this feature yet. I'd vote to keep this feature minimal for now and just use a static value. If we feel a little uncomfortable about the static, magic exit code, we should give users a way to opt in to this feature in addition. I think this is worth keeping on our radar, but we benefit from keeping things simple until we know if users will use this feature. |
I like the suggestion from @stsewd but I do agree with @agjohnson that it's not a good fit for the v1 of this. I'd say let's move forward with what we currently have that's 80-20 and re-consider the extra flexibility once we know more about how our users are using it 👍🏼 |
I'll bring it up separately, but this just leaves me wanting feature flags defined in our configuration file again. Specifically, a non-strict dictionary in our config spec would allow us to add features with configuration from the start, as we can skip strict config specifications. I do think what @stsewd is suggesting is better overall, and a lower barrier to managing our configuration file would allow that. Alas, we're not there yet. Dropped notes at: #9698 |
When the command exists with the magic exit code, we send SUCCESS to GitHub/GitLab as status so the PR check passes and the PR can be merged.
I'm marking this PR ready for review.
I'd like other people QAing this locally if possible and move forward with the current implementation. We have bigger good ideas around this, but I think we won't be able to prioritize them right now. IMO, this feature is good enough as it is currently and doesn't break any pattern. I suppose we can come back to this feature in the future and improve it in different directions. Let me know what you think or if you want to request any changes before moving on. |
Thanks to Benjamin for pointing it to me ;)
Co-authored-by: Benjamin Balder Bach <[email protected]>
Use Bash's `if` to only run this code on pull requests.
Note that we are rephrasing this from "Skip" to "Cancel". The build is not really skipped. It is triggered, it's executed and then, based on a condition, it may be cancelled before start building the documentation. So, I think it's fine to call communicate this as "Cancelled builds". |
* Docs: refactor "skipping a build" section - Move the explanation about the "Cancelling builds" feature to the "Builds" page - Keep the examples (How-To) for "Cancel a build" into the "Build customization" page * Docs: use 183 instead of 439 exit code > This error code isn't a valid exit > code (tldp.org/LDP/abs/html/exitcodes.html), we shouldn't document a code above > 255 otherwise users will get confused. From Eric's review. * Apply suggestions from code review Co-authored-by: Eric Holscher <[email protected]> * Minor fix underline * Use multi-line bash examples They are easier to read * Update docs/user/builds.rst Co-authored-by: Eric Holscher <[email protected]> Co-authored-by: Eric Holscher <[email protected]>
Ooooh this is a bit difficult. I get confused by the terminology when we have "cancelled builds" as in builds that got auto-cancelled by RTD and then "cancelled" builds that the developer was deliberately skipping. When we encounter that skipped/cancelled builds auto-cancel any current builds, it does seem hard to grasp, but it will become even harder if we have to explain that I know what you mean that "skipped" build sounds like the entire build didn't execute, but so can "canceled". I don't know if there are more alternatives to explore... like "halted", "stopped", "suspended", "killed", "aborted"? I like "skip" because it's what I would look for in the documentation. I wouldn't look for "cancel" because that - to me - implies an error status. Edit: I didn't have a look at #9717 until now. The above comment would have been better placed there. |
I don't have a strong preference here. I think both approaches have their pros/cons. The backend will use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels pretty much done to me, and would be good to ship next week. I do think there's still a couple UX rough edges, but overall this is a useful feature, and anyone who's using it is advanced enough to deal with rough edges :)
I'm merging this PR so it gets deployed tomorrow 👍🏼 |
.. code-block:: python | ||
|
||
>>> sum(list('skip'.encode('ascii'))) | ||
439 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Belated comment: Do you still need this explanation now that you no longer use 439 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It explains where the number 183 comes from. Isn't it clear with this note? I'm happy to update it
Define a particular exit code (439) to skip a build.
If any of the commands returns this exit code, the build will be cancelled automatically and won't run any of the following commands. When this happens, the build will be marked as
cancelled
and no email/webhook notifications will be sent.Why 439 was chosen for the exit code?
It's the word "skip" encoded in ASCII. Then it's taken the 256 modulo of it because the Unix implementation does this automatically for exit codes greater than 255
I opened this PR in draft because there are some things to discuss and decide yet. It's just a proof of concept for now.
skipped
status?success
?The following is an example of how it looks locally:
I created a branch on test-builds that can be used for this: https://github.com/readthedocs/test-builds/compare/skip-build-command?expand=1
Closes #871
Closes #8545