Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build: skip build based on commands' exit codes #9649

Merged
merged 14 commits into from
Nov 14, 2022

Conversation

humitos
Copy link
Member

@humitos humitos commented Oct 10, 2022

Define a particular exit code (439) to skip a build.

If any of the commands returns this exit code, the build will be cancelled automatically and won't run any of the following commands. When this happens, the build will be marked as cancelled and no email/webhook notifications will be sent.

Why 439 was chosen for the exit code?

It's the word "skip" encoded in ASCII. Then it's taken the 256 modulo of it because the Unix implementation does this automatically for exit codes greater than 255

>>> sum(list('skip'.encode('ascii')))
439
>>> 439 % 256
183

I opened this PR in draft because there are some things to discuss and decide yet. It's just a proof of concept for now.

  • Do we need a new skipped status?
  • Should we send "build status" notification as success?
  • Is it possible to write the message in gray instead of red and avoid the word "Error"?

The following is an example of how it looks locally:

Screenshot_2022-10-10_17-00-05

I created a branch on test-builds that can be used for this: https://github.com/readthedocs/test-builds/compare/skip-build-command?expand=1

Closes #871
Closes #8545

Define a particular exit code (439) to skip a build.

If any of the commands returns this exit code, the build will be cancelled
automatically and won't run any of the following commands. When this happens,
the build will be marked as `cancelled` and no email/webhook notifications will
be sent.

Why 439 was chosen for the exit code?

It's the word "skip" encoded in ASCII. Then it's taken the 256 modulo of it because
the Unix implementation does this automatically for exit codes greater than 255

```
>>> sum(list('skip'.encode('ascii')))
439
>>> 439 % 256
183
```
Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love how simple this implementation is. I agree there is some UX tweaking, but overall this seems like a really neat step to work on in the near future.

Looks like GH supports skipped and cancelled as a status. So maybe it does make sense to mirror them as different?

Screen Shot 2022-10-12 at 1 44 29 PM

readthedocs/doc_builder/constants.py Show resolved Hide resolved
@humitos
Copy link
Member Author

humitos commented Oct 13, 2022

@ericholscher

Looks like GH supports skipped and cancelled as a status.
So maybe it does make sense to mirror them as different?

I think we are not using this endpoint. I understand this are "GitHub Check" (the tab that appear on each PR) and we are using "Commit Statuses", which are these ones: https://docs.github.com/en/rest/commits/statuses#create-a-commit-status

(I found this by checking the source code at

statuses_url = f'https://api.github.com/repos/{owner}/{repo}/statuses/{commit}'
)

The state field has 4 option, but it called my attention the error and failure. I'm not sure what is the difference. We are using failure currently to communicate the build has failed:

GITHUB_BUILD_STATUS_FAILURE = 'failure'

Screenshot_2022-10-13_12-26-52

@ericholscher
Copy link
Member

I think we are not using this endpoint. I understand this are "GitHub Check" (the tab that appear on each PR) and we are using "Commit Statuses", which are these ones: https://docs.github.com/en/rest/commits/statuses#create-a-commit-status

Annoying 👎

@kambyzkargrshhamt

This comment was marked as spam.

@kambyzkargrshhamt

This comment was marked as spam.

@kambyzkargrshhamt

This comment was marked as spam.

@humitos humitos self-assigned this Oct 17, 2022
@humitos
Copy link
Member Author

humitos commented Oct 18, 2022

I think it would be good to move forward with this implementation but behind a feature flag and enable it in our own project. That way, we will be able to try the !git command and start experimenting with the UX while we iterate over it until we are happy.

Copy link
Contributor

@agjohnson agjohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be a great place to add some docs too!

# 439
# >>> 439 % 256
# 183
RTD_SKIP_BUILD_EXIT_CODE = 183
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I brought up the point that perhaps we want support for more explicit codes here. If so, perhaps we should choose a less arbitrary value here and allocate a block of exit codes.

183 is in a safe range for posix standards at least though:
https://tldp.org/LDP/abs/html/exitcodes.html

But, do we want something like:

  • Exit code 180: skip build immediately
  • Exit code 181: fail build immediately
  • Exit code 182: retry build later?
  • etc

Not to say we're implementing everything now, but if we will have more than one exit code, I'd suggest starting from a value that doesn't seem so arbitrary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

183 is in a safe range for posix standards at least though

Yeah, I checked for this as well and I was happy that "skip" was in that range 😄

I brought up the point that perhaps we want support for more explicit codes here

This looks interesting, but right now I'm not seeing a clear usage for other useful exit codes. We can keep thinking about this and see if we find some other useful cases. In any case, we can reserve the range 180-185 and start with 183 for this one; there is no need to make them consecutive.

readthedocs/doc_builder/exceptions.py Outdated Show resolved Hide resolved
@agjohnson
Copy link
Contributor

Regarding a few questions, above, I don't have much feedback on the GitHub internals, but:

Do we need a new skipped status?

I think "cancelled" both fits perfectly well, and doesn't require anything new from us.

But also, the points I've made around modelling build state is that either way this build would be considered "cancelled" state. That is, "cancelled" describes the build ended before success/failure, and we don't need duplicate states to describe the different reasons for being in this state.

A skipped state would be a sub-state of Build(state="cancelled") if we ever wanted to surface this specific state in the UI in some way (though not sure we need to really).

Should we send "build status" notification as success?

This might depend on how users consider this feature. I consider it only notifies at the end of builds, and cancelling a build doesn't allow for a build to get this far.

I think it would be fine to skip the notification either way, I don't think users would care to be notified when a build is cancelled.

Is it possible to write the message in gray instead of red and avoid the word "Error"?

This is the error message refactor I've been wanting for a long time. We can always do this piecemeal and add another one-off warning message type to the build logic. But I'm also fine showing the error message for now and coming back to this in a more thoughtful way later, when we have time to rework our error message UX.

@humitos
Copy link
Member Author

humitos commented Oct 20, 2022

@agjohnson

I think it would be fine to skip the notification either way, I don't think users would care to be notified when a build is cancelled.

I used the word "notification" which introduced some confusion here. To clarify, when the build gets cancelled we are not sending any notification to users (e.g. email).

I was referring to the "commit status" notification on PR from GitHub. That is, GH will trigger a PR build in our platform and put a "yellow" circle next to the commit. What we should tell GH about this "commit status" once the build got cancelled because of this exit code?

The only way to make the PR "mergeable" would be to tell GH the build succeeded. Otherwise, GH will put a "red cross" or keep the "yellow circle" and won't let people merge the PR (if these checks are enforced).

@agjohnson
Copy link
Contributor

Ah, I see. It seems like a success status is exactly what we want to give GitHub, no? Unless I'm missing something, it seems like the user would always want us to report a success on the status check if they are instructing us to also skip the build. Is there any case where we would want to fail the status check?

We can also give this control to the user with multiple exit codes:

  • Exit code 180: Cancel build, notify success to GH
  • Exit code 181: Cancel build, notify failure to GH

But I do think success is the primary status users would want here.

@agjohnson
Copy link
Contributor

agjohnson commented Oct 22, 2022

I just realized this PR will resolve our second most longest standing open issue: #871 🎉

@humitos
Copy link
Member Author

humitos commented Oct 24, 2022

@agjohnson

Ah, I see. It seems like a success status is exactly what we want to give GitHub, no? Unless I'm missing something, it seems like the user would always want us to report a success on the status check if they are instructing us to also skip the build. Is there any case where we would want to fail the status check?

I think sending "success" makes sense. However, I the workflow I'm a little concern is the following:

  1. user opens a PR that does not touches docs/
  2. the build is skipped in Read the Docs
  3. the is marked as success in the GH commit status
  4. the PR is merged
  5. the merged code makes Read the Docs' build to fail

In that scenario, if the build from the PR would have been not skipped, the error would have detected before merging the PR. I think this is fine, tho, since the users are who controls whether or not skip the build --so, they can tune it as much as they want to avoid this kind of scenarios.

@agjohnson
Copy link
Contributor

Yeah I would agree, I share that reservation about this feature in general. This is actually the reason why I've always been reluctant to prioritize either feature -- doc build success can depend on code/commits outside /docs/.

It's definitely something to note, but I think we're in the clear here because the user will be enabling this feature for their projects.

@stsewd
Copy link
Member

stsewd commented Oct 26, 2022

Instead of having some special exit codes, I'd vote for having this as explicit options in the config file, similar to github actions https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idcontinue-on-error, and maybe even let users define what exit code to use (like fail cancel build on 0 or 123)

@humitos
Copy link
Member Author

humitos commented Oct 26, 2022

@stsewd I'm not sure to understand what config you would like to have in the config file. Can you expand on that?

Also, what would be the extra benefits for users to be able to decide what exit code will make the build to be skipped via the config file? I'm asking to be able to compare the benefits and the complexity of its implementation + documentation + UX. We can always start with a fixed/default exit code and grow from there if we think it's going to open good flexibility for our users.

@agjohnson
Copy link
Contributor

User configurable exit codes would be a great approach overall, but I do agree with @humitos that it's probably too much initial work to be worth the effort. This might be a great upgrade if we get users using the feature though.

I would be +1 on user defined codes if we had several well defined exit codes we wanted to add in this first pass, or if we had other parts of the build process we wanted to override in a similar way. But, if no one has strong feelings here, it sounds like we only have one or two exit codes to implement

@stsewd
Copy link
Member

stsewd commented Oct 26, 2022

I'm not sure to understand what config you would like to have in the config file. Can you expand on that?

jobs:
  post_checkout:
    - run: ! git diff --quiet origin/main -- docs
      on-error: "cancel"  # "fail", "continue"

This way is more explicit about what will happen if the command fails, and it also avoids any collision caused by a random command.

Here, an error is any exit code that is different from 0, but we could introduce another option to define what code is considered an error, or something like on-exit-code.code and on-exit-code.action.

Our code will handle two types of commands, a list of strings or a list of mappings, this will also allow us to add more options as needed in the future.

@agjohnson
Copy link
Contributor

I do like the suggestion of giving users control over these values. This seems better long term. I'm sure we're bound to encounter some edge case with static exit codes at some point, and giving users a way to opt into this behavior avoids all of that.

But this is a more complex addition and we're not sure if users will use this feature yet. I'd vote to keep this feature minimal for now and just use a static value. If we feel a little uncomfortable about the static, magic exit code, we should give users a way to opt in to this feature in addition.

I think this is worth keeping on our radar, but we benefit from keeping things simple until we know if users will use this feature.

@humitos
Copy link
Member Author

humitos commented Oct 26, 2022

I like the suggestion from @stsewd but I do agree with @agjohnson that it's not a good fit for the v1 of this. I'd say let's move forward with what we currently have that's 80-20 and re-consider the extra flexibility once we know more about how our users are using it 👍🏼

@agjohnson
Copy link
Contributor

agjohnson commented Oct 26, 2022

I'll bring it up separately, but this just leaves me wanting feature flags defined in our configuration file again. Specifically, a non-strict dictionary in our config spec would allow us to add features with configuration from the start, as we can skip strict config specifications.

I do think what @stsewd is suggesting is better overall, and a lower barrier to managing our configuration file would allow that. Alas, we're not there yet.

Dropped notes at: #9698

When the command exists with the magic exit code, we send SUCCESS to
GitHub/GitLab as status so the PR check passes and the PR can be merged.
@humitos
Copy link
Member Author

humitos commented Nov 8, 2022

I'm marking this PR ready for review.

  • send SUCCESS build status notification when skipping a build so the PR is in a mergeable state
  • add small documentation example to "Build customization" page explaining how to use this

I'd like other people QAing this locally if possible and move forward with the current implementation. We have bigger good ideas around this, but I think we won't be able to prioritize them right now. IMO, this feature is good enough as it is currently and doesn't break any pattern. I suppose we can come back to this feature in the future and improve it in different directions.

Let me know what you think or if you want to request any changes before moving on.

docs/user/build-customization.rst Outdated Show resolved Hide resolved
docs/user/build-customization.rst Outdated Show resolved Hide resolved
docs/user/build-customization.rst Outdated Show resolved Hide resolved
docs/user/build-customization.rst Outdated Show resolved Hide resolved
docs/user/build-customization.rst Show resolved Hide resolved
docs/user/build-customization.rst Outdated Show resolved Hide resolved
humitos and others added 3 commits November 8, 2022 16:23
Thanks to Benjamin for pointing it to me ;)
Co-authored-by: Benjamin Balder Bach <[email protected]>
Use Bash's `if` to only run this code on pull requests.
@humitos
Copy link
Member Author

humitos commented Nov 9, 2022

Note that we are rephrasing this from "Skip" to "Cancel". The build is not really skipped. It is triggered, it's executed and then, based on a condition, it may be cancelled before start building the documentation. So, I think it's fine to call communicate this as "Cancelled builds".

* Docs: refactor "skipping a build" section

- Move the explanation about the "Cancelling builds" feature to the "Builds"
page
- Keep the examples (How-To) for "Cancel a build" into the "Build customization"
page

* Docs: use 183 instead of 439 exit code

> This error code isn't a valid exit
> code (tldp.org/LDP/abs/html/exitcodes.html), we shouldn't document a code above
> 255 otherwise users will get confused.

From Eric's review.

* Apply suggestions from code review

Co-authored-by: Eric Holscher <[email protected]>

* Minor fix underline

* Use multi-line bash examples

They are easier to read

* Update docs/user/builds.rst

Co-authored-by: Eric Holscher <[email protected]>

Co-authored-by: Eric Holscher <[email protected]>
@benjaoming
Copy link
Contributor

benjaoming commented Nov 10, 2022

Note that we are rephrasing this from "Skip" to "Cancel". The build is not really skipped. It is triggered, it's executed and then, based on a condition, it may be cancelled before start building the documentation. So, I think it's fine to call communicate this as "Cancelled builds".

Ooooh this is a bit difficult. I get confused by the terminology when we have "cancelled builds" as in builds that got auto-cancelled by RTD and then "cancelled" builds that the developer was deliberately skipping.

When we encounter that skipped/cancelled builds auto-cancel any current builds, it does seem hard to grasp, but it will become even harder if we have to explain that cancelled build #100 auto-cancelled build #99.

I know what you mean that "skipped" build sounds like the entire build didn't execute, but so can "canceled". I don't know if there are more alternatives to explore... like "halted", "stopped", "suspended", "killed", "aborted"?

I like "skip" because it's what I would look for in the documentation. I wouldn't look for "cancel" because that - to me - implies an error status.


Edit: I didn't have a look at #9717 until now. The above comment would have been better placed there.

@humitos
Copy link
Member Author

humitos commented Nov 10, 2022

@benjaoming

I know what you mean that "skipped" build sounds like the entire build didn't execute, but so can "canceled". I don't know if there are more alternatives to explore... like "halted", "stopped", "suspended", "killed", "aborted"?

I don't have a strong preference here. I think both approaches have their pros/cons. The backend will use cancelled status and the UI will show "Cancelled" in red, so I'd prefer to stick with this terminology for now. At least, until we are able to change the UI in the new templates to add another status or similar. Related to #9110

Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels pretty much done to me, and would be good to ship next week. I do think there's still a couple UX rough edges, but overall this is a useful feature, and anyone who's using it is advanced enough to deal with rough edges :)

docs/user/build-customization.rst Show resolved Hide resolved
docs/user/build-customization.rst Outdated Show resolved Hide resolved
@humitos
Copy link
Member Author

humitos commented Nov 14, 2022

I'm merging this PR so it gets deployed tomorrow 👍🏼

@humitos humitos enabled auto-merge (squash) November 14, 2022 09:12
@humitos humitos merged commit e5c1d5b into main Nov 14, 2022
@humitos humitos deleted the humitos/skip-build-based-on-exit-code branch November 14, 2022 09:25
.. code-block:: python

>>> sum(list('skip'.encode('ascii')))
439
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Belated comment: Do you still need this explanation now that you no longer use 439 ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It explains where the number 183 comes from. Isn't it clear with this note? I'm happy to update it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
7 participants