[Bug] Scanning pages that use Server-Sent Events takes a minute longer than it should #2002

dbjorge · 2022-03-21T15:54:00Z

Describe the bug

The scan engine's page navigator includes functionality to attempt to wait for a page's outstanding network activity to settle before starting to scan it (in that it passes waitUntil: 'networkidle0' to Puppeteer's page.goto). However, when attempting to scan a page which uses Server-Sent Events (indefinitely-lived requests with content-type text/event-stream), the scan engine doesn't understand to special case this type of intentionally indefinite request. It assumes that it is a normal navigation-related request and blocks scanning until its (60s) navigation timeout elapses.

The motivating use case is an internal team attempting to run a scan via the Accessibility Insights for Azure DevOps task against a website in a developer environment based on an Express server with webpack-hot-middleware. This package is one of several HMR implementations which use Server-Sent Events to implement hot module reloading.

As a workaround, we've suggested that the impacted internal team disable HMR in their CI configuration running the Azure DevOps task, but we should still fix this on our end even if that customer is unblocked - running our scan CLI tool against a dev build of a website is a pretty reasonable use case.

Area

Install script
Update script
Scan CLI/Action
[?] Service (I expect the underlying issue would impact the service, too, but haven't verified this. I don't expect a user is very likely to end up running the service against an HMR-enabled page, but it's possible that a site being scanned could be using Server-Sent Events for non-HMR purposes)

To Reproduce

Steps to reproduce the behavior using webpack-hot-middleware's standard example project:

git clone https://github.com/webpack-contrib/webpack-hot-middleware.git -b v2.25.1
cd webpack-hot-middleware && npm clean-install
cd example && npm clean-install
npm start
(separate prompt) mkdir ai-scan-tester && cd ai-scan-tester
npm install [email protected]
npx ai-scan --crawl --url http://localhost:1616 --maxUrls 1

Expected behavior

Expected behavior: the single scan that this results in should only take as long as spinning up a puppeteer instance + minimal extra overhead (~5s)

Actual behavior: It instead takes "puppeeteer launch overhead + 60 seconds". Final request statistics: ... "requestAvgFinishedDurationMillis":62240 ...

Log files

Full console output from the repro steps above

❯ npx ai-scan --crawl --url http://localhost:1616 --maxUrls 1
INFO  PuppeteerCrawler:AutoscaledPool:Snapshotter: Setting max memory of this run to 3993 MB. Use the APIFY_MEMORY_MBYTES environment variable to override it.
INFO  PuppeteerCrawler:AutoscaledPool: state {"currentConcurrency":0,"desiredConcurrency":2,"systemStatus":{"isSystemIdle":true,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":null},"eventLoopInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":null},"cpuInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":null},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":null}}}
INFO  Launching Puppeteer {"args":["--disable-dev-shm-usage","--no-sandbox","--disable-setuid-sandbox","--js-flags=--max-old-space-size=8192","--no-sandbox","--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36"],"defaultViewport":{"width":1920,"height":1080,"deviceScaleFactor":1},"timeout":15000,"headless":true}
INFO  Statistics: PuppeteerCrawler request statistics: {"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":null,"requestsFinishedPerMinute":0,"requestsFailedPerMinute":0,"requestTotalDurationMillis":0,"requestsTotal":0,"crawlerRuntimeMillis":60037,"retryHistogram":[]}
INFO  PuppeteerCrawler:AutoscaledPool: state {"currentConcurrency":1,"desiredConcurrency":3,"systemStatus":{"isSystemIdle":true,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":0},"eventLoopInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":0},"cpuInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":0},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":0}}}
waitFor is deprecated and will be removed in a future release. See https://github.com/puppeteer/puppeteer/issues/6214 for details and how to migrate your code.
waitFor is deprecated and will be removed in a future release. See https://github.com/puppeteer/puppeteer/issues/6214 for details and how to migrate your code.
waitFor is deprecated and will be removed in a future release. See https://github.com/puppeteer/puppeteer/issues/6214 for details and how to migrate your code.
Processing page http://localhost:1616/
Discovered 0 links on page http://localhost:1616/
Found 1 accessibility issues on page http://localhost:1616/
INFO  PuppeteerCrawler: Crawler reached the maxRequestsPerCrawl limit of 1 requests and will shut down soon. Requests that are in progress will be allowed to finish.
INFO  PuppeteerCrawler: Earlier, the crawler reached the maxRequestsPerCrawl limit of 1 requests and all requests that were in progress at that time have now finished. In total, the crawler processed 1 requests and will shut down.
INFO  PuppeteerCrawler: Final request statistics: {"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":62240,"requestsFinishedPerMinute":1,"requestsFailedPerMinute":0,"requestTotalDurationMillis":62240,"requestsTotal":1,"crawlerRuntimeMillis":62480,"requestsFinished":1,"requestsFailed":0,"retryHistogram":[1]}
Generating summary scan report...
Warning: Each child in a list should have a unique "key" prop.

Check the top-level render call using <ul>. See https://fb.me/react-warning-keys for more information.
    in li
    in SummaryReportDetailsSection
    in div
    in main
    in ContentSection
    in body
    in BodySection
    in ReportBody
Summary report was saved as C:/temp/ai-scan-tester/ai_scan_cli_output/index.html

Screenshots

n/a

Additional context

Context about the customer that originally hit this issue can be found in an email thread with our internal support alias CC'd

The text was updated successfully, but these errors were encountered:

ghost · 2022-03-21T17:57:19Z

This issue has been marked as ready for team triage; we will triage it in our weekly review and update the issue. Thank you for contributing to Accessibility Insights!

dbjorge · 2022-03-23T18:22:55Z

In case we want to do something ourselves that's more specific than networkidle0 or networkidle2, a good starting point is probably this gist linked from microsoft/playwright#2515, + a bit of logic to only track requests without Content-Type: text/event-stream - it's written against Playwright, but the implementation should be basically identical for Puppeteer.

lamaks · 2022-04-09T01:00:52Z

Resolved in CLI 1.0.0

dbjorge added the bug label Mar 21, 2022

ghost added the status: new label Mar 21, 2022

ghost assigned lisli1 Mar 21, 2022

dbjorge changed the title ~~[Bug] Scanning pages that use Server-Side Events takes a minute longer than it should~~ [Bug] Scanning pages that use Server-Sent Events takes a minute longer than it should Mar 21, 2022

lisli1 added the status: ready for triage label Mar 21, 2022

ghost removed the status: new label Mar 21, 2022

lamaks assigned lamaks and unassigned lisli1 Mar 22, 2022

lamaks added status: active security Pull requests that address a security vulnerability and removed status: ready for triage security Pull requests that address a security vulnerability labels Mar 22, 2022

lamaks mentioned this issue Mar 22, 2022

Allow page have open network connections on load #2007

Closed

5 tasks

lamaks removed the status: active label Apr 9, 2022

lamaks closed this as completed Apr 9, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug] Scanning pages that use Server-Sent Events takes a minute longer than it should #2002

[Bug] Scanning pages that use Server-Sent Events takes a minute longer than it should #2002

dbjorge commented Mar 21, 2022 •

edited

Loading

ghost commented Mar 21, 2022

dbjorge commented Mar 23, 2022 •

edited

Loading

lamaks commented Apr 9, 2022

[Bug] Scanning pages that use Server-Sent Events takes a minute longer than it should #2002

[Bug] Scanning pages that use Server-Sent Events takes a minute longer than it should #2002

Comments

dbjorge commented Mar 21, 2022 • edited Loading

Describe the bug

Area

To Reproduce

Expected behavior

Log files

Screenshots

Additional context

ghost commented Mar 21, 2022

dbjorge commented Mar 23, 2022 • edited Loading

lamaks commented Apr 9, 2022

dbjorge commented Mar 21, 2022 •

edited

Loading

dbjorge commented Mar 23, 2022 •

edited

Loading