Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider adding spell checking to CI #4473

Closed
jsoref opened this issue Feb 5, 2020 · 6 comments · Fixed by #4799
Closed

Consider adding spell checking to CI #4473

jsoref opened this issue Feb 5, 2020 · 6 comments · Fixed by #4799
Labels
Area-Build Issues pertaining to the build system, CI, infrastructure, meta Area-CodeHealth Issues related to code cleanliness, linting, rules, warnings, errors, static analysis, etc. Issue-Task It's a feature request, but it doesn't really need a major design. Needs-Tag-Fix Doesn't match tag requirements Product-Meta The product is the management of the products. Resolution-Fix-Committed Fix is checked in, but it might be 3-4 weeks until a release.

Comments

@jsoref
Copy link
Contributor

jsoref commented Feb 5, 2020

Description of the new feature/enhancement

#4295 (comment) asked if it was possible to have automated spell checking in the same way that clang-tidy rules are run by CI

Proposed technical implementation details (optional)

I have a live opensource integration running w/ Travis in the checkstyle project:
https://github.com/checkstyle/checkstyle/blob/master/.ci/test-spelling-unknown-words.sh

And yesterday I finally wrote a version that we're going to use internally w/ Jenkins:
https://github.com/jsoref/spelling-ci/tree/jenkins-ci-gsutil

That branch includes the dependencies. I'm not sure whether Windows Server CI is Perl friendly. If it is, then migrating the Bash bits to Perl is manageable. If it isn't, then it probably means porting the entire thing.

The code isn't particularly complicated, so I may end up just hosting it in a bunch of languages (Perl/Python/JavaScript). (This general task has been on my todo list for a number of years.)

@jsoref jsoref added the Issue-Feature Complex enough to require an in depth planning process and actual budgeted, scheduled work. label Feb 5, 2020
@ghost ghost added Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting Needs-Tag-Fix Doesn't match tag requirements labels Feb 5, 2020
@jsoref
Copy link
Contributor Author

jsoref commented Feb 5, 2020

As noted in #4295 (comment), one has to select a Dictionary. It looks like Visual Studio is using Hunspell. I've worked w/ it in the past. http://wordlist.aspell.net/dicts/ -- the format for its wordlist isn't quite flat, so it would take a bit of code to properly handle, but not a whole lot (basically each line has an optional suffix that is essentially a way to compress acceptable variations of a word).

@zadjii-msft zadjii-msft added Area-Build Issues pertaining to the build system, CI, infrastructure, meta Area-CodeHealth Issues related to code cleanliness, linting, rules, warnings, errors, static analysis, etc. Issue-Task It's a feature request, but it doesn't really need a major design. Product-Meta The product is the management of the products. and removed Issue-Feature Complex enough to require an in depth planning process and actual budgeted, scheduled work. labels Feb 5, 2020
@ghost ghost removed the Needs-Tag-Fix Doesn't match tag requirements label Feb 5, 2020
@zadjii-msft zadjii-msft added this to the Terminal Backlog milestone Feb 5, 2020
@zadjii-msft zadjii-msft added the Help Wanted We encourage anyone to jump in on these. label Feb 5, 2020
@DHowett-MSFT
Copy link
Contributor

This would definitely be good. Thanks for doing a first pass on our code, as well!
Taking Triage off, leaving on backlog.

@DHowett-MSFT DHowett-MSFT removed the Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting label Feb 6, 2020
@jsoref
Copy link
Contributor Author

jsoref commented Feb 24, 2020

So, I've finally packaged things up nicely enough that they might be usable for others.

Overview

It's available as an action.

Configuration is basically ~4 things:

  1. the workflow that uses the action
  2. references for how to get the whitelist+exclusions
  3. excluded paths
  4. whitelisted tokens

Output

Output comes in a couple of flavors:

Configuration

The whitelist can live in the same repo & branch as the code, I'm not a huge fan of this as it means that other random spelling tools have to learn to ignore it as well. If you want to do this, it's pretty easy, you'd just add the two files to the excludes.txt file and configure the bucket to be ./path/to/parent/of/project (you can use project=. to flatten things as well).

The version I've prototyped here checks out a distinct branch from the same repository and then provides that directory to the engine. -- I plan to add better support for this format (probably by supporting bucket='git:' to mean "use this repository, just get the files from branch=$project". The whitelist/excludes can also live in other repositories or be retrieved from https:/gs: urls -- in these cases, the tool will give correct instructions for updating.

Warnings/Caveats

Note: while the tool suggests how to whitelist, you're really supposed to check to see if the word is misspelled first (someone sheepishly came to me and said something like "oh ... your tool caught an actual error, but instead of paying attention, I just whitelisted it").

Caveats: forks of repositories don't run Actions out of the box, and when they come back as pull requests, their token is basically a read-only token, which is not terribly helpful for reviewing a PR. To address that, there's a third mode (beyond pull_request and push) for schedule which will try to iterate over open PRs, identify foreign ones and review them.

Disclaimers

This code is in its early stages, and I haven't yet deployed it into a large repository. "You could be the early adopter."


@miniksa, @DHowett-MSFT: If you're interested, I'm happy to make a PR for this -- at this point, I'd need input about how you'd want things structured (mostly: where would you want things to live).

I'd expect you'd want to pin the action by a hash as opposed to using a tagged release.

Since GitHub runs the code in Docker, I'd hope you'd be ok w/ that, as opposed to waiting for me to port the code.

@miniksa
Copy link
Member

miniksa commented Feb 27, 2020

I'm alright with a PR for this.

I'm OK if it runs in Docker with GitHub Actions as long as that doesn't mind that most of our CI will run in Azure DevOps while this will run in parallel in a GitHub Action. It just shows up as another check on PR, right?

I'd rather the configuration be included in this repository along with everything else. I think it could go in a subdirectory of ./build. Maybe ./build/config? The YML file could go into ./build/pipelines like the rest of 'em.

I don't mind if it is completely unrunnable on developer desktops and only shows up during PRs. By your caveat, I would want to use whatever scheduling thing was required to make sure that it hit PRs coming back in.

I also don't mind if we're an early adopter of something. You're very on top of this and very engaged. I trust that you'll be able to work out whatever issues might arise and evolve it as GitHub Actions improves. And if it causes some catastrophic problem... it's Git. We'll just revert it/pull it out until we can turn it back on again. Can't be any worse than our flaky tests that have been decimating PRs for weeks.

@miniksa
Copy link
Member

miniksa commented Feb 27, 2020

Oh, if this has to go into the .github folder, that's fine too.

@jsoref
Copy link
Contributor Author

jsoref commented Feb 27, 2020

Yes, it shows up as just another check on the PR.

To have GitHub do the running, workflow file itself needs to live in the .github/workflows folder.

The whitelist+exclusions can live anywhere. I think that having them in the repository (as opposed to on a distinct branch) is the right approach (I'm slowly becoming convinced of this) -- although I really don't like the idea of merge conflicts for the whitelist -- thankfully, in theory the script should be able to talk people through resolving it in the worst case, but...).

And thanks for being open to being an early adopter. I'll probably update this on the weekend.

jsoref added a commit to jsoref/terminal that referenced this issue Mar 2, 2020
jsoref added a commit to jsoref/terminal that referenced this issue Mar 2, 2020
jsoref added a commit to jsoref/terminal that referenced this issue Mar 4, 2020
jsoref added a commit to jsoref/terminal that referenced this issue Mar 4, 2020
@ghost ghost added the In-PR This issue has a related PR label Mar 4, 2020
jsoref added a commit to jsoref/terminal that referenced this issue Mar 17, 2020
@ghost ghost added Needs-Tag-Fix Doesn't match tag requirements Resolution-Fix-Committed Fix is checked in, but it might be 3-4 weeks until a release. and removed Help Wanted We encourage anyone to jump in on these. In-PR This issue has a related PR labels Mar 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Build Issues pertaining to the build system, CI, infrastructure, meta Area-CodeHealth Issues related to code cleanliness, linting, rules, warnings, errors, static analysis, etc. Issue-Task It's a feature request, but it doesn't really need a major design. Needs-Tag-Fix Doesn't match tag requirements Product-Meta The product is the management of the products. Resolution-Fix-Committed Fix is checked in, but it might be 3-4 weeks until a release.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants