Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallelize Tests #31

Merged
merged 14 commits into from
Jul 19, 2020
Merged

Parallelize Tests #31

merged 14 commits into from
Jul 19, 2020

Conversation

aminya
Copy link
Member

@aminya aminya commented Jul 9, 2020

Description of the Change

This:

  • allows requesting any tests on any operating system
  • parallelizes core renderer tests by using separate child processes for each file.
  • runs core renderer tests and packages tests in parallel using async.parallel. So if one of the tests goes in timeout, the others will continue to run. This speeds up running the tests by a lot! 🚀
  • allows rerunning only the failed tests rather than restarting the whole CI.
  • On Windows, runs core renderer tests using two Azure agents. This in addition to double the overall test speed, will allow rerunning the failed ones in case some fail due to timeouts.

Verification

The CI passes

Release Notes

N/A

@aminya aminya added the CI label Jul 9, 2020
@aminya aminya mentioned this pull request Jul 9, 2020
@aminya
Copy link
Member Author

aminya commented Jul 9, 2020

The tests now run half the time (2X faster) and there are fewer timeout issues! It is surprising that no one has noticed this for such a long time!

It is interesting to know that calling commands through child-process is way slower than calling them directly!

@aminya aminya changed the title Run Windows tests directly Run Windows CI directly on x64 Jul 9, 2020
@aminya aminya force-pushed the windows_tests branch 5 times, most recently from a7b6256 to 77d38f3 Compare July 9, 2020 11:36
@DeeDeeG
Copy link
Member

DeeDeeG commented Jul 9, 2020

Much faster! I suppose if we posted this to upstream they'd let us know if they still needed it the old way...

Seems good to me. Windows 64-bit runner runs 64-bit Node, Windows 32-bit runner runs 32-bit Node... What's not to like?

@DeeDeeG
Copy link
Member

DeeDeeG commented Jul 9, 2020

By the way Windows 64-bit tests have been failing (timing out) everywhere. Upstream, this fork, my personal fork.

This commit had CI re-run 11 times at upstream before the tests passed on Windows 64-bit.

I think this PR is fine, and not causing the CI failures.

@DeeDeeG
Copy link
Member

DeeDeeG commented Jul 9, 2020

We should manually run the "release branch build" pipeline on this and see if it produces working Atom installers on Windows. That would be a good way to validate this, IMO.

Edit: trying here: https://dev.azure.com/DeeDeeG/b/_build/results?buildId=83&view=logs&j=0d2f351d-5899-57e2-0cb5-b37eb91cc930&t=0d2f351d-5899-57e2-0cb5-b37eb91cc930

@aminya
Copy link
Member Author

aminya commented Jul 9, 2020

I saw that 11min is wasted in this for the timeouts caused by some of the tests. In my previous tests, it only took ~30 min to complete the tests.

Do you think we should try dividing the tests similar to MacOs?

@DeeDeeG
Copy link
Member

DeeDeeG commented Jul 9, 2020

Do you think we should try deciding dividing the tests similar to MacOs?

Sure. if there is a neat way to do it, I don;t see why not. Hopefully much of it can be "copy-pasted" (or reverse-engineered) from the macOS config.

@aminya aminya force-pushed the windows_tests branch 3 times, most recently from 2b3de81 to 88aa803 Compare July 9, 2020 22:37
@aminya
Copy link
Member Author

aminya commented Jul 9, 2020

@DeeDeeG Do the cache steps also cache the build atom step?

@DeeDeeG
Copy link
Member

DeeDeeG commented Jul 9, 2020

@aminya They cache only the Bootstrap.

Caching the build would be trickier, but upstream was interested in it way back in the day, so I shouldn't say it's impossible: atom#19437

In the future, we may want to investigate a couple of different options:
[...]

@aminya
Copy link
Member Author

aminya commented Jul 9, 2020

@aminya They cache only the Bootstrap.

Caching the build would be trickier, but upstream was interested in it way back in the day, so I shouldn't say it's impossible: atom#19437

Currently, the zip files are downloaded and reused. That is similar to caching.

@aminya aminya force-pushed the windows_tests branch 4 times, most recently from d5ca8ae to cddb2e0 Compare July 10, 2020 00:36
@DeeDeeG
Copy link
Member

DeeDeeG commented Jul 10, 2020

A lot of the build happens in [repo_root]/out/, so if we can cache some stuff there, maybe that would speed things up even more. Probably have to rewrite some of the script/build code to not run a second time. (like the script/build --no-bootstrap flag, but for some things even later in the process... like the electron snapshot???)

Edit to add:

Currently, the zip files are downloaded and reused. That is similar to caching.

Only for mac though, right? And they wouldn't need to be uploaded/downloaded if the steps weren't split up for parallelization, so the uploading/downloading in and of itself is something slowing things down, and a bit of a compromise vs running straight on in a single container. (Outweighed by the benefit of parallel jobs though.)

In my mind the goal of "more caching" is to do with saving and restoring something that persists from an entirely separate run of CI. From a different commit, or a previous run of the same commit, either way. Not as much splitting up a run for parallelization, though that is another good strategy. And yeah okay that is caching. But it's same-run caching, as opposed to caching from one run to the following fresh one.

@aminya
Copy link
Member Author

aminya commented Jul 10, 2020

A lot of the build happens in [repo_root]/out/, so if we can cache some stuff there, maybe that would speed things up even more. Probably have to rewrite some of the script/build code to not run a second time. (like the script/build --no-bootstrap flag, but for some things even later in the process... like the electron snapshot???)

Speeding up the build is not our priority. There are many things to improve there:
https://github.com/atom/atom/pulls?q=is%3Apr+is%3Aopen+sort%3Aupdated-desc+build

That needs serious changes to the build script. I don't think caching helps here much. If we can find anything cacheable I would vote for it, but I didn't see anything in the script.

I created a project to track build related improvements: https://github.com/atom-ide-community/atom/projects

@aminya
Copy link
Member Author

aminya commented Jul 17, 2020

@DeeDeeG any idea about these downloading errors in MacOS renderer tests? I am not sure if I have seen them before:

Error Downloading Update: Could not get code signature for running application
Error Downloading Update: Could not get code signature for running application
2020-07-17 02:11:16.279 Atom Dev[43327:67273] Persistent UI failed to open file file:///Users/runner/Library/Saved%20Application%20State/com.github.atom.savedState/window_1.data: No such file or directory (2)
nvm is not compatible with the npm config "prefix" option: currently set to "/usr/local"
Run `npm config delete prefix` or `nvm use --delete-prefix v6.17.1 --silent` to unset it.

They do not affect the passing of the tests, but they appear as warnings.

@DeeDeeG
Copy link
Member

DeeDeeG commented Jul 17, 2020

Those messages are three separate things.

Best explanation I can give at the moment, since I honestly don't know why we are seeing this just now, but I'll keep it in mind. Maybe it's because we changed the shell we run in, (Powershell), and perhaps that's why different warnings/errors are making their way in front of our eyeballs.

@DeeDeeG
Copy link
Member

DeeDeeG commented Jul 17, 2020

On another note: If this or any PR is "ready for review" please let me know. Apologies, I was asleep when the big templates PR was "ready to go," so I didn't see the final state of it before it got merged.

@aminya
Copy link
Member Author

aminya commented Jul 17, 2020

This is ready to go. I may change minor things if I see anything.

@DeeDeeG
Copy link
Member

DeeDeeG commented Jul 17, 2020

Thanks for letting me know!

It happens to be late where I am, so I'm going to have to go to sleep pretty soon...

I remember this being a promising concept, and hopefully not very hard after (and compared to) #46. I would need to look at it tomorrow to really comment on things thoroughly, but I can take a quick look now.

@DeeDeeG
Copy link
Member

DeeDeeG commented Jul 17, 2020

Suprised that this changes script/test, which is also useful to run outside of CI.

And src/main-process/start.js is also unexpected. Can't look closely right now at the changes, but if it's possible to do this limited to script/vsts/, then that's desirable IMO.

Edit: I see this changes script/test quite a lot. I can only say I hope upstream are interested. We test in a meaningfully different way than upstream by now, so the guarantees that our code works upstream are becoming fewer and fewer as we diverge. Keep that in mind I guess.

I'm not super great at JavaScript. If this is the direction you want to go, it'll take me a while to review it. I can do that tomorrow, and if this isn't merged by then (it would be nice to get a chance to really look this over, IMO) then I'll do my best.

@aminya aminya added the Tests label Jul 17, 2020
@aminya
Copy link
Member Author

aminya commented Jul 17, 2020

And src/main-process/start.js is also unexpected.

That is because temp package cannot remove the created temp file when we use async.parallel. That is not anything important. It just means some log files will remain when Atom crashes. With that being said I have reported that already in this issue: bruce/node-temp#86

Suprised that this changes script/test, which is also useful to run outside of CI.
Can't look closely right now at the changes, but if it's possible to do this limited to script/vsts/, then that's desirable IMO.

The main test script is script/test. Things in CI are just a simple call to that script. To parallelize the tests I had to change script/test itself. Instead of looking at the Github diff, check the source code itself.

@DeeDeeG
Copy link
Member

DeeDeeG commented Jul 17, 2020

I was expecting more along the lines of how macOS is split into three test runners. Minimally updated to work for Windows.

I will review the code if you don't merge before I can take a look.

Honestly it is not difficult to run things one-off in a Ci environment closer to upstream's, so if I ever have doubts that tests results are the same with these changes vs upstream's setup I'll do a run and see for myself, just to address my own concerns.

Sadly I cannot review the JS "today" in my time zone, it would have to be in about 8 hours-ish at the earliest. (More likely 10 or 12 hours from now, or a bit later, depending on how complicated the code is.)

@aminya
Copy link
Member Author

aminya commented Jul 17, 2020

I was expecting more along the lines of how macOS is split into three test runners. Minimally updated to work for Windows.

No, they run different tests. MacOS runs packages' test, Windows runs Core Renderer tests.

Sadly I cannot review the JS "today" in my time zone, it would have to be in about 8 hours-ish at the earliest. (More likely 10 or 12 hours from now, or a bit later, depending on how complicated the code is.)

I will keep this until you review it. There is no rush 😄.

Copy link
Member

@DeeDeeG DeeDeeG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really qualified to do a deep-dive into the JavaScript. I'm still trying to read it, but what I've seen all looks reasonable. If I understand correctly, script/test wasn't prepared to split up the renderer tests before? So that was needed in script/test in order have the CI run it in a split up way.

I fully endorse that, assuming it was needed (appears to be the case).

If CI passes, I approve.

Comment below is something I think we could suggest at upstream.

script/test Outdated Show resolved Hide resolved
script/test Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants