You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The scan engine's page navigator includes functionality to attempt to wait for a page's outstanding network activity to settle before starting to scan it (in that it passes waitUntil: 'networkidle0' to Puppeteer's page.goto). However, when attempting to scan a page which uses Server-Sent Events (indefinitely-lived requests with content-type text/event-stream), the scan engine doesn't understand to special case this type of intentionally indefinite request. It assumes that it is a normal navigation-related request and blocks scanning until its (60s) navigation timeout elapses.
The motivating use case is an internal team attempting to run a scan via the Accessibility Insights for Azure DevOps task against a website in a developer environment based on an Express server with webpack-hot-middleware. This package is one of several HMR implementations which use Server-Sent Events to implement hot module reloading.
As a workaround, we've suggested that the impacted internal team disable HMR in their CI configuration running the Azure DevOps task, but we should still fix this on our end even if that customer is unblocked - running our scan CLI tool against a dev build of a website is a pretty reasonable use case.
Area
Install script
Update script
Scan CLI/Action
[?] Service (I expect the underlying issue would impact the service, too, but haven't verified this. I don't expect a user is very likely to end up running the service against an HMR-enabled page, but it's possible that a site being scanned could be using Server-Sent Events for non-HMR purposes)
To Reproduce
Steps to reproduce the behavior using webpack-hot-middleware's standard example project:
Expected behavior: the single scan that this results in should only take as long as spinning up a puppeteer instance + minimal extra overhead (~5s)
Actual behavior: It instead takes "puppeeteer launch overhead + 60 seconds". Final request statistics: ... "requestAvgFinishedDurationMillis":62240 ...
Log files
Full console output from the repro steps above
❯ npx ai-scan --crawl --url http://localhost:1616 --maxUrls 1
INFO PuppeteerCrawler:AutoscaledPool:Snapshotter: Setting max memory of this run to 3993 MB. Use the APIFY_MEMORY_MBYTES environment variable to override it.
INFO PuppeteerCrawler:AutoscaledPool: state {"currentConcurrency":0,"desiredConcurrency":2,"systemStatus":{"isSystemIdle":true,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":null},"eventLoopInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":null},"cpuInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":null},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":null}}}
INFO Launching Puppeteer {"args":["--disable-dev-shm-usage","--no-sandbox","--disable-setuid-sandbox","--js-flags=--max-old-space-size=8192","--no-sandbox","--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36"],"defaultViewport":{"width":1920,"height":1080,"deviceScaleFactor":1},"timeout":15000,"headless":true}
INFO Statistics: PuppeteerCrawler request statistics: {"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":null,"requestsFinishedPerMinute":0,"requestsFailedPerMinute":0,"requestTotalDurationMillis":0,"requestsTotal":0,"crawlerRuntimeMillis":60037,"retryHistogram":[]}
INFO PuppeteerCrawler:AutoscaledPool: state {"currentConcurrency":1,"desiredConcurrency":3,"systemStatus":{"isSystemIdle":true,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":0},"eventLoopInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":0},"cpuInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":0},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":0}}}
waitFor is deprecated and will be removed in a future release. See https://github.com/puppeteer/puppeteer/issues/6214 for details and how to migrate your code.
waitFor is deprecated and will be removed in a future release. See https://github.com/puppeteer/puppeteer/issues/6214 for details and how to migrate your code.
waitFor is deprecated and will be removed in a future release. See https://github.com/puppeteer/puppeteer/issues/6214 for details and how to migrate your code.
Processing page http://localhost:1616/
Discovered 0 links on page http://localhost:1616/
Found 1 accessibility issues on page http://localhost:1616/
INFO PuppeteerCrawler: Crawler reached the maxRequestsPerCrawl limit of 1 requests and will shut down soon. Requests that are in progress will be allowed to finish.
INFO PuppeteerCrawler: Earlier, the crawler reached the maxRequestsPerCrawl limit of 1 requests and all requests that were in progress at that time have now finished. In total, the crawler processed 1 requests and will shut down.
INFO PuppeteerCrawler: Final request statistics: {"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":62240,"requestsFinishedPerMinute":1,"requestsFailedPerMinute":0,"requestTotalDurationMillis":62240,"requestsTotal":1,"crawlerRuntimeMillis":62480,"requestsFinished":1,"requestsFailed":0,"retryHistogram":[1]}
Generating summary scan report...
Warning: Each child in a list should have a unique "key" prop.
Check the top-level render call using <ul>. See https://fb.me/react-warning-keys for more information.
in li
in SummaryReportDetailsSection
in div
in main
in ContentSection
in body
in BodySection
in ReportBody
Summary report was saved as C:/temp/ai-scan-tester/ai_scan_cli_output/index.html
Screenshots
n/a
Additional context
Context about the customer that originally hit this issue can be found in an email thread with our internal support alias CC'd
The text was updated successfully, but these errors were encountered:
dbjorge
changed the title
[Bug] Scanning pages that use Server-Side Events takes a minute longer than it should
[Bug] Scanning pages that use Server-Sent Events takes a minute longer than it should
Mar 21, 2022
This issue has been marked as ready for team triage; we will triage it in our weekly review and update the issue. Thank you for contributing to Accessibility Insights!
In case we want to do something ourselves that's more specific than networkidle0 or networkidle2, a good starting point is probably this gist linked from microsoft/playwright#2515, + a bit of logic to only track requests without Content-Type: text/event-stream - it's written against Playwright, but the implementation should be basically identical for Puppeteer.
Describe the bug
The scan engine's page navigator includes functionality to attempt to wait for a page's outstanding network activity to settle before starting to scan it (in that it passes
waitUntil: 'networkidle0'
to Puppeteer'spage.goto
). However, when attempting to scan a page which uses Server-Sent Events (indefinitely-lived requests with content-typetext/event-stream
), the scan engine doesn't understand to special case this type of intentionally indefinite request. It assumes that it is a normal navigation-related request and blocks scanning until its (60s) navigation timeout elapses.The motivating use case is an internal team attempting to run a scan via the Accessibility Insights for Azure DevOps task against a website in a developer environment based on an Express server with webpack-hot-middleware. This package is one of several HMR implementations which use Server-Sent Events to implement hot module reloading.
As a workaround, we've suggested that the impacted internal team disable HMR in their CI configuration running the Azure DevOps task, but we should still fix this on our end even if that customer is unblocked - running our scan CLI tool against a dev build of a website is a pretty reasonable use case.
Area
To Reproduce
Steps to reproduce the behavior using
webpack-hot-middleware
's standard example project:git clone https://github.com/webpack-contrib/webpack-hot-middleware.git -b v2.25.1
cd webpack-hot-middleware && npm clean-install
cd example && npm clean-install
npm start
mkdir ai-scan-tester && cd ai-scan-tester
npm install [email protected]
npx ai-scan --crawl --url http://localhost:1616 --maxUrls 1
Expected behavior
Expected behavior: the single scan that this results in should only take as long as spinning up a puppeteer instance + minimal extra overhead (~5s)
Actual behavior: It instead takes "puppeeteer launch overhead + 60 seconds".
Final request statistics: ... "requestAvgFinishedDurationMillis":62240 ...
Log files
Full console output from the repro steps above
Screenshots
n/a
Additional context
Context about the customer that originally hit this issue can be found in an email thread with our internal support alias CC'd
The text was updated successfully, but these errors were encountered: