Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: manually check pending requests #654

Merged
merged 4 commits into from
Oct 19, 2022
Merged

Conversation

sbellone
Copy link
Collaborator

With noticed that some pages are not fully rendered.
After some investigation, it appears that it's because we render the page too soon, while all requests have not returned yet.

This appear to be caused by some flakyness in the waitForLoadState('networkidle') feature, confirmed by some Playwright contributors:

To bypass this problem, this PR does something similar than in the suggested gist: counting pending requests, and waiting for the number to get back to 0 before rendering the page.

Changes

  • Added pending requests to the metrics, increased on 'request' and decreased on requestfailed or requestfinished
  • Wait for pending requests to reach 0 in the rendering process
  • Log the metrics in the response for easier debugging

@sbellone sbellone self-assigned this Oct 18, 2022
@sbellone sbellone requested a review from millotp October 18, 2022 16:01
Comment on lines +107 to +120
while (
this.page.pendingRequests > 0 &&
Date.now() - startWaitTime < timeBudget
) {
log.debug(
{ pageUrl: this.page.ref?.url() },
`Waiting for ${
this.page.pendingRequests
} requests to complete... WaitTime:${
Date.now() - startWaitTime
}, timeBudget: ${timeBudget}`
);
await setTimeout(1000);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have a timeout for this ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe a for loop instead of while with limited iteration

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the timeBudget is always set with a max: https://github.com/algolia/renderscript/blob/master/src/lib/tasks/Task.ts#L68, and the max by default is 20s: https://github.com/algolia/renderscript/blob/master/src/lib/constants.ts#L27
On the Crawler side, we also don't allow to configure it with more than 20s

@sbellone sbellone merged commit 21b6a80 into master Oct 19, 2022
@sbellone sbellone deleted the fix/networkidle-flaky branch October 19, 2022 15:50
algolia-bot added a commit that referenced this pull request Oct 19, 2022
## [2.1.22](v2.1.21...v2.1.22) (2022-10-19)

### Bug Fixes

* manually check pending requests ([#654](#654)) ([21b6a80](21b6a80))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants