Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Respond to incoming HTTPS requests when all work is done #258

Merged
merged 6 commits into from
Jul 4, 2020

Conversation

phillipj
Copy link
Member

@phillipj phillipj commented Apr 10, 2020

TLDR; upgrading the package we use to talk to github.com causes integrations tests to break, and those tests are currently unreliable.

The package we're using to talk to github.com uses a deprecated authentication mechanism that I regularly get emails about: https://developer.github.com/changes/2020-02-10-deprecating-auth-through-query-param/.

While naively trying to upgrade that package, I stumbled opened up a can of worms that I want to fix properly first...

What has changed?

Make integration tests execute all external API request expectations

Previously only a few of the external API request expectations setup with nock.js has been run (details explained in the long story below).

How we run those expectations was introduced in the beginning of this project, a blooper I'm most likely to blame for, and copy-pasted later on.

Make server respond to incoming HTTPS requests after all event listeners has finished executing

Primarily to make our tests more reliable. For that to happen we need to be confident the code we're running tests on, has finished executing.

Replace internal pub/sub mechanism from raw EventEmitter to something where we know when all listeners has finished executing

Lots of the internals of the github-bot is setup as event listeners subscribing for events like pull_request.opened. Those events are in practise GitHub webhook events.

For us to be able to know when all event listeners are done with their work, using pure EventEmitter does not cut it because our listeners perform asynchronous work. To mitigate that, events-async has been installed. It allows us to know when all (webhook) event listeners has executed as long as listeners returns a Promise.

Refactoring our listeners from a callback/fire-n-forget type of code, to return a Promise, obviously involves quite a lot of changes. Those changes has been written to not change any behaviour, but rather only promisify them.

The long story

Up until now, a handful of integration tests has been flawed because we didn't actually verify all nock.js expectations (read: nock.js scope) had been fulfilled. That was due to a desire to run several methods as a one-liner instead of one per line;

filesScope.done() && existingRepoLabelsScope.done()

That could make sense at first glance, but there's a very important catch we didn't think about when writing it; it requires .done() to return a truthy value or else what comes after .done() will not be executed.

The .done() method on nock.js' scope either throws an exception if the expectation it represents has not been met, or returns undefined when all is good.

That undefined value stops the subsequent methods on the same line to be executed. In other words, we have always just checked the first expectation, and none of the subsequent ones on the same line.

The changes introduced in this commit executes these .done()s on their own line which sadly causes all of the related tests to explode at the moment.

Why most of these expectations haven't been met is probably due to timing issues, since we don't wait for all the related code to have finished executing before the tests are run. As of now, the code has been written in a fashion that allows incoming HTTPS requests to be get their response ASAP, whilst the code pushing PR statuses to github.com or trigger Jenkins builds are still running. That raises some challenges for our integration tests since they don't really know when the code is finished, meaning tests can run.

Upcoming changes will fix that by ensuring incoming requests will get their response after all relevant code has succeeded or failed. That will introduce quite a big refactor job, but the end result will be a lot more robust tests that can be trusted.


Sooo with that braindump out of the way, any immediate thoughts?

I assume there's very few comfortable reviewing and approving these changes. Still would appreciate any kinds of feedback, good and bad, high level or detailed, I'm all ears!

Most importantly is getting ack from someone active near the core collaborator community that can point my way if any awkwardness happens that I don't catch myself. I'm ready to jump and investigate when needed.

Up until now, a handful of integration tests has been flawed because
we didn't actually verify all nock.js expectations (read: nock.js scope)
had been fulfilled. That was due to a desire to run several methods as
a one-liner instead of one per line;

```
filesScope.done() && existingRepoLabelsScope.done()
```

That could make sense at first glance, but there's a very important
catch we didn't think about when writing it; it requires `.done()`
to return a truthy value or else what comes after `.done()` will not
be executed.

The `.done()` method on nock.js' scope either throws an exception if
the expectation it represents has not been met, or returns
`undefined` when all is good.

That `undefined` value stops the subsequent methods on the same line to
be executed. In other words, we have always just checked the first
expectation, and none of the subsequent ones on the same line.

The changes introduced in this commit executes these `.done()`s on their
own line which sadly causes all of the related tests to explode at
the moment.

Why most of these expectations haven't been met is probably due to
timing issues, since we don't wait for all the related code to have
finished executing before the tests are run. As of now, the code has
been written in a fashion that allows incoming HTTPS requests to be get
their response ASAP, whilst the code pushing PR statuses to github.com
or trigger Jenins builds are still running. That raises some challenges
for our integration tests since they don't really know when the code is
finished, meaning tests can run.

Upcoming changes will fix that by ensuring incoming requests will get
their response *after* all relevant code has succeeded or failed.
That will introduce quite a big refactor job, but the end result will
be a lot more robust tests that can be trusted.
As a precursor to running test assertions after we *know* all work has
been done, most our use of fake timers gets removed here as it will
not be needed anymore and even causes some challenges not easily solved
when requests starts hanging due to work not having finished yet and
the culprit is timers not running as they normally do.
Several existing integration tests were failing because there was
nock.js expectations created that wanted to see existing repository
labels being fetched from github.com.

Since we cache repository labels for an hour, the said retrieval of
labels only happened in the first of label integration tests, whilst
not in the subsequent ones since the labels had been put in a cache
by then.

We don't want to depend on the order of running tests, but rather start
from a clean slate everytime, these changes ensure we reinitialise the
`app` in every related test, resulting in clean labels cache everytime.
With the goal of making our integration tests more robust and actually
reliable, these changes makes sure incoming requests related to GitHub
events, gets responded to after all work is done.

This makes a big difference for our test suite as we now know when it
is safe to run assertions.

Up until now we've responded to said requests immediately and kicked
off work like setting PR labels and similar, asynchronously. Although
that could make sense considering what's done in runtime, it makes life
hard when wanting to verify the behaviour while testing since we don't
know when all the works has completed.

Conceptually these changes consists of:

- Installing and using [events-async](https://www.npmjs.com/package/events-async)
  to know when all event listeners has completed
- Promisify all the event listeners since the above npm package uses
  promises instead of callbacks to handle async tasks
These changes comes as a result of making tests more resilient.

When running focused integration tests, we have some expectations of
what kinds of requests are sent to github.com and what the end result
should be after the bot has run all its magic.

Those expectations are written to only have one specific script in mind,
e.g. what should happen when pushing Jenkins updates or what should
happen when labelling PRs.

Up until now, all of our scripts has been setup by `require()`ing them
inside `./app.js`. That means all has been setup and thereby reacted
to incoming github.com webhook requests we've sent to the github-bot
while testing. That's far from ideal because we cannot have "simple"
and focused expectations about which github.com requests are expected
and what they should get in response.

Hence these changes; whilst running tests, we only `require()` the
respective script the tests relates to.

Although to keep the automagic of setting up all script when the
github-bot is started in production, we move the lines of code that
iterates over the available scripts into `./server.js` instead which
is *only* executed in production, not by when testing.
Since several existing integration tests expected these labels
to exist in the first page of labels results from github.com, but wasn't
raised as an error until now that the tests has been refactored.
@phillipj phillipj changed the title WIP: Respond to incoming HTTPS requests when all work is done Respond to incoming HTTPS requests when all work is done May 24, 2020
@phillipj phillipj marked this pull request as ready for review May 24, 2020 20:15
@phillipj
Copy link
Member Author

Anyone in the @nodejs/github-bot team or @Trott still interested to share thoughts about the above?

@phillipj
Copy link
Member Author

@Trott @richardlau as active core collaborators, you got any thoughts on me merging this and being on the alert of issues going forward?

I'm still getting emails from GitHub about API deprecation. Upgrading related packages would be very error prone with what's currently in master due to unreliable tests.

Refs https://developer.github.com/changes/2020-02-10-deprecating-auth-through-query-param/

Copy link
Member

@Trott Trott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rubber-stamp LGTM

@Trott
Copy link
Member

Trott commented Jun 23, 2020

@Trott @richardlau as active core collaborators, you got any thoughts on me merging this and being on the alert of issues going forward?

I didn't review the changes very closely, but they look good to me and this sounds like the way to go.

@phillipj
Copy link
Member Author

Appreciate the feedback @Trott!

I'll go ahead and merge this when I know I've got some time to jump if anything unexpected happens.

@phillipj phillipj merged commit 570b796 into nodejs:master Jul 4, 2020
@phillipj phillipj deleted the resilient-tests-refactor branch July 4, 2020 19:50
@phillipj
Copy link
Member Author

phillipj commented Jul 4, 2020

Now that this is live in production, I'll be checking in on the errors logs once in a while to see if something obvious appears.

Don't hesitate to open issues and ping me directly if you spot weird behaviour from the bot going forward!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants