Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Scanning pages that use Server-Sent Events takes a minute longer than it should #2002

Closed
1 of 3 tasks
dbjorge opened this issue Mar 21, 2022 · 3 comments
Closed
1 of 3 tasks
Assignees
Labels

Comments

@dbjorge
Copy link
Contributor

dbjorge commented Mar 21, 2022

Describe the bug

The scan engine's page navigator includes functionality to attempt to wait for a page's outstanding network activity to settle before starting to scan it (in that it passes waitUntil: 'networkidle0' to Puppeteer's page.goto). However, when attempting to scan a page which uses Server-Sent Events (indefinitely-lived requests with content-type text/event-stream), the scan engine doesn't understand to special case this type of intentionally indefinite request. It assumes that it is a normal navigation-related request and blocks scanning until its (60s) navigation timeout elapses.

The motivating use case is an internal team attempting to run a scan via the Accessibility Insights for Azure DevOps task against a website in a developer environment based on an Express server with webpack-hot-middleware. This package is one of several HMR implementations which use Server-Sent Events to implement hot module reloading.

As a workaround, we've suggested that the impacted internal team disable HMR in their CI configuration running the Azure DevOps task, but we should still fix this on our end even if that customer is unblocked - running our scan CLI tool against a dev build of a website is a pretty reasonable use case.

Area

  • Install script
  • Update script
  • Scan CLI/Action
  • [?] Service (I expect the underlying issue would impact the service, too, but haven't verified this. I don't expect a user is very likely to end up running the service against an HMR-enabled page, but it's possible that a site being scanned could be using Server-Sent Events for non-HMR purposes)

To Reproduce

Steps to reproduce the behavior using webpack-hot-middleware's standard example project:

  1. git clone https://github.com/webpack-contrib/webpack-hot-middleware.git -b v2.25.1
  2. cd webpack-hot-middleware && npm clean-install
  3. cd example && npm clean-install
  4. npm start
  5. (separate prompt) mkdir ai-scan-tester && cd ai-scan-tester
  6. npm install [email protected]
  7. npx ai-scan --crawl --url http://localhost:1616 --maxUrls 1

Expected behavior

Expected behavior: the single scan that this results in should only take as long as spinning up a puppeteer instance + minimal extra overhead (~5s)

Actual behavior: It instead takes "puppeeteer launch overhead + 60 seconds". Final request statistics: ... "requestAvgFinishedDurationMillis":62240 ...

Log files

Full console output from the repro steps above
❯ npx ai-scan --crawl --url http://localhost:1616 --maxUrls 1
INFO  PuppeteerCrawler:AutoscaledPool:Snapshotter: Setting max memory of this run to 3993 MB. Use the APIFY_MEMORY_MBYTES environment variable to override it.
INFO  PuppeteerCrawler:AutoscaledPool: state {"currentConcurrency":0,"desiredConcurrency":2,"systemStatus":{"isSystemIdle":true,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":null},"eventLoopInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":null},"cpuInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":null},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":null}}}
INFO  Launching Puppeteer {"args":["--disable-dev-shm-usage","--no-sandbox","--disable-setuid-sandbox","--js-flags=--max-old-space-size=8192","--no-sandbox","--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36"],"defaultViewport":{"width":1920,"height":1080,"deviceScaleFactor":1},"timeout":15000,"headless":true}
INFO  Statistics: PuppeteerCrawler request statistics: {"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":null,"requestsFinishedPerMinute":0,"requestsFailedPerMinute":0,"requestTotalDurationMillis":0,"requestsTotal":0,"crawlerRuntimeMillis":60037,"retryHistogram":[]}
INFO  PuppeteerCrawler:AutoscaledPool: state {"currentConcurrency":1,"desiredConcurrency":3,"systemStatus":{"isSystemIdle":true,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":0},"eventLoopInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":0},"cpuInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":0},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":0}}}
waitFor is deprecated and will be removed in a future release. See https://github.com/puppeteer/puppeteer/issues/6214 for details and how to migrate your code.
waitFor is deprecated and will be removed in a future release. See https://github.com/puppeteer/puppeteer/issues/6214 for details and how to migrate your code.
waitFor is deprecated and will be removed in a future release. See https://github.com/puppeteer/puppeteer/issues/6214 for details and how to migrate your code.
Processing page http://localhost:1616/
Discovered 0 links on page http://localhost:1616/
Found 1 accessibility issues on page http://localhost:1616/
INFO  PuppeteerCrawler: Crawler reached the maxRequestsPerCrawl limit of 1 requests and will shut down soon. Requests that are in progress will be allowed to finish.
INFO  PuppeteerCrawler: Earlier, the crawler reached the maxRequestsPerCrawl limit of 1 requests and all requests that were in progress at that time have now finished. In total, the crawler processed 1 requests and will shut down.
INFO  PuppeteerCrawler: Final request statistics: {"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":62240,"requestsFinishedPerMinute":1,"requestsFailedPerMinute":0,"requestTotalDurationMillis":62240,"requestsTotal":1,"crawlerRuntimeMillis":62480,"requestsFinished":1,"requestsFailed":0,"retryHistogram":[1]}
Generating summary scan report...
Warning: Each child in a list should have a unique "key" prop.

Check the top-level render call using <ul>. See https://fb.me/react-warning-keys for more information.
    in li
    in SummaryReportDetailsSection
    in div
    in main
    in ContentSection
    in body
    in BodySection
    in ReportBody
Summary report was saved as C:/temp/ai-scan-tester/ai_scan_cli_output/index.html

Screenshots

n/a

Additional context

Context about the customer that originally hit this issue can be found in an email thread with our internal support alias CC'd

@dbjorge dbjorge added the bug label Mar 21, 2022
@ghost ghost added the status: new label Mar 21, 2022
@ghost ghost assigned lisli1 Mar 21, 2022
@dbjorge dbjorge changed the title [Bug] Scanning pages that use Server-Side Events takes a minute longer than it should [Bug] Scanning pages that use Server-Sent Events takes a minute longer than it should Mar 21, 2022
@ghost
Copy link

ghost commented Mar 21, 2022

This issue has been marked as ready for team triage; we will triage it in our weekly review and update the issue. Thank you for contributing to Accessibility Insights!

@ghost ghost removed the status: new label Mar 21, 2022
@lamaks lamaks assigned lamaks and unassigned lisli1 Mar 22, 2022
@lamaks lamaks added status: active security Pull requests that address a security vulnerability and removed status: ready for triage security Pull requests that address a security vulnerability labels Mar 22, 2022
@dbjorge
Copy link
Contributor Author

dbjorge commented Mar 23, 2022

In case we want to do something ourselves that's more specific than networkidle0 or networkidle2, a good starting point is probably this gist linked from microsoft/playwright#2515, + a bit of logic to only track requests without Content-Type: text/event-stream - it's written against Playwright, but the implementation should be basically identical for Puppeteer.

@lamaks
Copy link
Contributor

lamaks commented Apr 9, 2022

Resolved in CLI 1.0.0

@lamaks lamaks closed this as completed Apr 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants