Respond to incoming HTTPS requests when all work is done #258

phillipj · 2020-04-10T11:51:13Z

TLDR; upgrading the package we use to talk to github.com causes integrations tests to break, and those tests are currently unreliable.

The package we're using to talk to github.com uses a deprecated authentication mechanism that I regularly get emails about: https://developer.github.com/changes/2020-02-10-deprecating-auth-through-query-param/.

While naively trying to upgrade that package, I stumbled opened up a can of worms that I want to fix properly first...

What has changed?

Make integration tests execute all external API request expectations

Previously only a few of the external API request expectations setup with nock.js has been run (details explained in the long story below).

How we run those expectations was introduced in the beginning of this project, a blooper I'm most likely to blame for, and copy-pasted later on.

Make server respond to incoming HTTPS requests after all event listeners has finished executing

Primarily to make our tests more reliable. For that to happen we need to be confident the code we're running tests on, has finished executing.

Replace internal pub/sub mechanism from raw EventEmitter to something where we know when all listeners has finished executing

Lots of the internals of the github-bot is setup as event listeners subscribing for events like pull_request.opened. Those events are in practise GitHub webhook events.

For us to be able to know when all event listeners are done with their work, using pure EventEmitter does not cut it because our listeners perform asynchronous work. To mitigate that, events-async has been installed. It allows us to know when all (webhook) event listeners has executed as long as listeners returns a Promise.

Refactoring our listeners from a callback/fire-n-forget type of code, to return a Promise, obviously involves quite a lot of changes. Those changes has been written to not change any behaviour, but rather only promisify them.

The long story

Up until now, a handful of integration tests has been flawed because we didn't actually verify all nock.js expectations (read: nock.js scope) had been fulfilled. That was due to a desire to run several methods as a one-liner instead of one per line;

filesScope.done() && existingRepoLabelsScope.done()

That could make sense at first glance, but there's a very important catch we didn't think about when writing it; it requires .done() to return a truthy value or else what comes after .done() will not be executed.

The .done() method on nock.js' scope either throws an exception if the expectation it represents has not been met, or returns undefined when all is good.

That undefined value stops the subsequent methods on the same line to be executed. In other words, we have always just checked the first expectation, and none of the subsequent ones on the same line.

The changes introduced in this commit executes these .done()s on their own line which sadly causes all of the related tests to explode at the moment.

Why most of these expectations haven't been met is probably due to timing issues, since we don't wait for all the related code to have finished executing before the tests are run. As of now, the code has been written in a fashion that allows incoming HTTPS requests to be get their response ASAP, whilst the code pushing PR statuses to github.com or trigger Jenkins builds are still running. That raises some challenges for our integration tests since they don't really know when the code is finished, meaning tests can run.

Upcoming changes will fix that by ensuring incoming requests will get their response after all relevant code has succeeded or failed. That will introduce quite a big refactor job, but the end result will be a lot more robust tests that can be trusted.

Sooo with that braindump out of the way, any immediate thoughts?

I assume there's very few comfortable reviewing and approving these changes. Still would appreciate any kinds of feedback, good and bad, high level or detailed, I'm all ears!

Most importantly is getting ack from someone active near the core collaborator community that can point my way if any awkwardness happens that I don't catch myself. I'm ready to jump and investigate when needed.

Up until now, a handful of integration tests has been flawed because we didn't actually verify all nock.js expectations (read: nock.js scope) had been fulfilled. That was due to a desire to run several methods as a one-liner instead of one per line; ``` filesScope.done() && existingRepoLabelsScope.done() ``` That could make sense at first glance, but there's a very important catch we didn't think about when writing it; it requires `.done()` to return a truthy value or else what comes after `.done()` will not be executed. The `.done()` method on nock.js' scope either throws an exception if the expectation it represents has not been met, or returns `undefined` when all is good. That `undefined` value stops the subsequent methods on the same line to be executed. In other words, we have always just checked the first expectation, and none of the subsequent ones on the same line. The changes introduced in this commit executes these `.done()`s on their own line which sadly causes all of the related tests to explode at the moment. Why most of these expectations haven't been met is probably due to timing issues, since we don't wait for all the related code to have finished executing before the tests are run. As of now, the code has been written in a fashion that allows incoming HTTPS requests to be get their response ASAP, whilst the code pushing PR statuses to github.com or trigger Jenins builds are still running. That raises some challenges for our integration tests since they don't really know when the code is finished, meaning tests can run. Upcoming changes will fix that by ensuring incoming requests will get their response *after* all relevant code has succeeded or failed. That will introduce quite a big refactor job, but the end result will be a lot more robust tests that can be trusted.

As a precursor to running test assertions after we *know* all work has been done, most our use of fake timers gets removed here as it will not be needed anymore and even causes some challenges not easily solved when requests starts hanging due to work not having finished yet and the culprit is timers not running as they normally do.

Several existing integration tests were failing because there was nock.js expectations created that wanted to see existing repository labels being fetched from github.com. Since we cache repository labels for an hour, the said retrieval of labels only happened in the first of label integration tests, whilst not in the subsequent ones since the labels had been put in a cache by then. We don't want to depend on the order of running tests, but rather start from a clean slate everytime, these changes ensure we reinitialise the `app` in every related test, resulting in clean labels cache everytime.

With the goal of making our integration tests more robust and actually reliable, these changes makes sure incoming requests related to GitHub events, gets responded to after all work is done. This makes a big difference for our test suite as we now know when it is safe to run assertions. Up until now we've responded to said requests immediately and kicked off work like setting PR labels and similar, asynchronously. Although that could make sense considering what's done in runtime, it makes life hard when wanting to verify the behaviour while testing since we don't know when all the works has completed. Conceptually these changes consists of: - Installing and using [events-async](https://www.npmjs.com/package/events-async) to know when all event listeners has completed - Promisify all the event listeners since the above npm package uses promises instead of callbacks to handle async tasks

These changes comes as a result of making tests more resilient. When running focused integration tests, we have some expectations of what kinds of requests are sent to github.com and what the end result should be after the bot has run all its magic. Those expectations are written to only have one specific script in mind, e.g. what should happen when pushing Jenkins updates or what should happen when labelling PRs. Up until now, all of our scripts has been setup by `require()`ing them inside `./app.js`. That means all has been setup and thereby reacted to incoming github.com webhook requests we've sent to the github-bot while testing. That's far from ideal because we cannot have "simple" and focused expectations about which github.com requests are expected and what they should get in response. Hence these changes; whilst running tests, we only `require()` the respective script the tests relates to. Although to keep the automagic of setting up all script when the github-bot is started in production, we move the lines of code that iterates over the available scripts into `./server.js` instead which is *only* executed in production, not by when testing.

Since several existing integration tests expected these labels to exist in the first page of labels results from github.com, but wasn't raised as an error until now that the tests has been refactored.

phillipj · 2020-05-24T20:17:54Z

Anyone in the @nodejs/github-bot team or @Trott still interested to share thoughts about the above?

phillipj · 2020-06-23T09:08:25Z

@Trott @richardlau as active core collaborators, you got any thoughts on me merging this and being on the alert of issues going forward?

I'm still getting emails from GitHub about API deprecation. Upgrading related packages would be very error prone with what's currently in master due to unreliable tests.

Refs https://developer.github.com/changes/2020-02-10-deprecating-auth-through-query-param/

Trott

Rubber-stamp LGTM

Trott · 2020-06-23T22:43:47Z

@Trott @richardlau as active core collaborators, you got any thoughts on me merging this and being on the alert of issues going forward?

I didn't review the changes very closely, but they look good to me and this sounds like the way to go.

phillipj · 2020-06-26T20:29:39Z

Appreciate the feedback @Trott!

I'll go ahead and merge this when I know I've got some time to jump if anything unexpected happens.

phillipj · 2020-07-04T19:55:47Z

Now that this is live in production, I'll be checking in on the errors logs once in a while to see if something obvious appears.

Don't hesitate to open issues and ping me directly if you spot weird behaviour from the bot going forward!

phillipj added 6 commits April 10, 2020 13:35

Add missing timers and v6.x label in integration labels fixture

0652131

Since several existing integration tests expected these labels to exist in the first page of labels results from github.com, but wasn't raised as an error until now that the tests has been refactored.

phillipj changed the title ~~WIP: Respond to incoming HTTPS requests when all work is done~~ Respond to incoming HTTPS requests when all work is done May 24, 2020

phillipj marked this pull request as ready for review May 24, 2020 20:15

Trott approved these changes Jun 23, 2020

View reviewed changes

This was referenced Jun 28, 2020

feat: update bot comment when Jenkins finishes #259

Closed

Modernizing the project #264

Open

phillipj merged commit 570b796 into nodejs:master Jul 4, 2020

phillipj deleted the resilient-tests-refactor branch July 4, 2020 19:50

phillipj mentioned this pull request Aug 5, 2020

chore(jenkins): refactor jenkins-status using events #271

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Respond to incoming HTTPS requests when all work is done #258

Respond to incoming HTTPS requests when all work is done #258

phillipj commented Apr 10, 2020 •

edited

Loading

phillipj commented May 24, 2020

phillipj commented Jun 23, 2020

Trott left a comment

Trott commented Jun 23, 2020

phillipj commented Jun 26, 2020

phillipj commented Jul 4, 2020

Respond to incoming HTTPS requests when all work is done #258

Respond to incoming HTTPS requests when all work is done #258

Conversation

phillipj commented Apr 10, 2020 • edited Loading

What has changed?

The long story

phillipj commented May 24, 2020

phillipj commented Jun 23, 2020

Trott left a comment

Choose a reason for hiding this comment

Trott commented Jun 23, 2020

phillipj commented Jun 26, 2020

phillipj commented Jul 4, 2020

phillipj commented Apr 10, 2020 •

edited

Loading