How to use pool of tabs #92
It seems that you're right: the tab must be exposed during the screenshot phase. As you say, activating the target on page load probably won't work, because another page load event steals the focus before the screenshot is completed, producing empty or partial images. Luckily, the page load phase is usually the time-consuming part, and it can be fully parallelized by spawning multiple tabs/targets. Try this, instead of calling … Here's what I mean:

```js
const fs = require('fs');
const CDP = require('chrome-remote-interface');

// Open a new target, navigate it to `url` and resolve once the page has
// loaded. All targets load in parallel; nothing is captured yet.
async function loadForScrot(url) {
    const tab = await CDP.New();
    const client = await CDP({tab});
    const {Page} = client;
    // Subscribe before navigating so the load event cannot be missed.
    const loaded = new Promise((fulfill) => Page.loadEventFired(fulfill));
    await Page.enable();
    await Page.navigate({url});
    await loaded;
    return {client, tab};
}

// Screenshots are taken one at a time: each target is activated (exposed)
// right before Page.captureScreenshot is called. Named screenshotAll so it
// does not shadow Node's global `process`.
async function screenshotAll(urls) {
    try {
        const handlers = await Promise.all(urls.map(loadForScrot));
        for (const {client, tab} of handlers) {
            const {Page} = client;
            await CDP.Activate({id: tab.id});
            const filename = `/tmp/scrot_${tab.id}.png`;
            const result = await Page.captureScreenshot();
            const image = Buffer.from(result.data, 'base64');
            fs.writeFileSync(filename, image);
            console.log(filename);
            await client.close();
            await CDP.Close({id: tab.id}); // don't leave the target open
        }
    } catch (err) {
        console.error(err);
    }
}

screenshotAll([
    'http://example.com',
    'http://example.com',
    'http://example.com',
    'http://example.com',
    'http://example.com',
    'http://example.com',
    'http://example.com',
    'http://example.com',
]);
```

Please let me know if this works for you.
@cyrus-and Thanks for taking the time to look into this issue. Yes, the above code should work and I get what you are saying, but I was planning something like an API that returns screenshots/PDFs for a given URL. If taking screenshots serially in batches is the only way, it would block other requests from loading their pages in the meantime. Generating PDFs was my use case, and yesterday I tried the latest headless Chrome for PDF generation using the printToPDF API: it doesn't require the tab to be activated or exposed. Thanks once again for this awesome project!
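If, as reported above, `Page.printToPDF` works without activating the target, PDF generation does not need the sequential phase at all. A minimal sketch under that assumption (it reflects what was observed with the headless Chrome of the time, not something verified for every version): each URL gets its own target and everything runs in parallel. `printToPdf` and `printAllToPdf` are hypothetical helper names, and the `CDP` module is passed in as a parameter only so the functions can be exercised with a stand-in.

```javascript
// Load `url` in a fresh target and return the PDF as a Buffer.
// Assumes printToPDF does not require the target to be activated.
async function printToPdf(CDP, url) {
    const tab = await CDP.New();
    const client = await CDP({tab});
    try {
        const {Page} = client;
        // Subscribe before navigating so the load event cannot be missed.
        const loaded = new Promise((resolve) => Page.loadEventFired(resolve));
        await Page.enable();
        await Page.navigate({url});
        await loaded;
        const {data} = await Page.printToPDF();
        return Buffer.from(data, 'base64');
    } finally {
        // Always disconnect and close the target, even on failure.
        await client.close();
        await CDP.Close({id: tab.id});
    }
}

// No serialization step: every URL is loaded and printed concurrently.
async function printAllToPdf(CDP, urls) {
    return Promise.all(urls.map((url) => printToPdf(CDP, url)));
}
```

With a running Chrome, this would be called as `printAllToPdf(require('chrome-remote-interface'), urls)`, writing each returned Buffer wherever the API needs it.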
OK great. You're welcome! :)
@anandanand84: by any chance, do you know a way to close all tabs at the beginning of the process? I'm running headless Chrome in Docker, and it seems that if you cancel your script prematurely, the tab remains open, and the next time you connect, you connect to the tab you were on before.
@pthieu I used
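Closing all tabs:

```js
const CDP = require('chrome-remote-interface');

// Close every target currently open in the connected Chrome instance,
// e.g., tabs left over from a previous run that was interrupted.
async function cleanup() {
    console.log('Closing existing targets if any');
    const targets = await CDP.List();
    return Promise.all(targets.map(({id}) => CDP.Close({id})));
}

async function run() {
    await cleanup();
}

run().catch(console.error);
```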
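A variant of `cleanup()`, sketched under two assumptions: only targets whose `type` is `'page'` (i.e., tabs) should be closed, since the list returned by `CDP.List()` can also contain other kinds of targets such as service workers; and a fresh tab is wanted afterwards, so the next connection does not attach to a stale one. `resetTabs` is a hypothetical name, and `CDP` is passed in only so it can be exercised with a stand-in.

```javascript
// Replace any leftover tabs with a single fresh one and return it.
async function resetTabs(CDP) {
    // Open the fresh tab first, so at least one page target exists
    // throughout the reset.
    const fresh = await CDP.New();
    const targets = await CDP.List();
    // Close only tabs, and never the one just created.
    const stale = targets.filter(({id, type}) => type === 'page' && id !== fresh.id);
    await Promise.all(stale.map(({id}) => CDP.Close({id})));
    return fresh; // connect to it with CDP({tab: fresh})
}
```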
@vvo do you know if … Is creating a new tab in headless mode perhaps not supported on Windows (#96)? Are you on Linux? Never mind, I was using the … Not sure what's up; both are advertised as using tip-of-tree Chromium. The only difference is that the former is using …
I don't think Chrome distinguishes between the two, and IMO you shouldn't care.
There are known issues with tab management in headless mode. If you use version 59, you should be OK.
I would like to accomplish the same goal as @anandanand84, but with images. Is there any way I can make Chrome take two screenshots at the same time? My other options are
@utkuturunc this solution should work if you can live with the fact that only the page loading phase happens in parallel, while screenshots are taken sequentially once all the URLs are loaded.
I am not sure how to manage the URL array. I am trying to write a script that takes HTML from stdin and pipes the rendered image to stdout (which I then stream to the user). The script should load the HTML immediately, then return the image when it is that client's turn. The current idea is: Also, to limit the number of open tabs, I may need a pooling implementation. This looks like an overly complicated architecture, especially compared to my current solution (starting a new PhantomJS instance every time). Edit:
@utkuturunc with the proper design it should be possible to do that fairly easily. The complication arises from the fact that you have to babysit the screenshotting phase due to a limitation of Chrome, which I think (hope!) is just temporary.
Luckily enough, you don't have to care about race conditions in Node.js, as long as everything stays in the same synchronous block. Of course, your script would actually be a server acting as a broker for client requests.
You can always start a new Chrome instance every time on a different port, but that's probably not a great idea.
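One possible reading of that broker design, as a minimal sketch: each client request loads its page in its own target, with at most `maxTabs` requests in flight, while the activate-and-capture step is serialized through a promise chain. `createScreenshotBroker`, `maxTabs`, `load`, and `capture` are hypothetical names, not part of chrome-remote-interface. With that library, `load` could open a target and wait for `Page.loadEventFired`, and `capture` could call `CDP.Activate`, `Page.captureScreenshot`, and then close the target.

```javascript
// Hypothetical helper: `load(input)` and `capture(handle)` are
// caller-supplied async functions.
function createScreenshotBroker({maxTabs, load, capture}) {
    let open = 0;                          // slots currently in use
    const waiting = [];                    // resolvers of requests queued for a slot
    let captureTail = Promise.resolve();   // end of the serialized capture queue

    async function acquireSlot() {
        if (open < maxTabs) {
            open++;
            return;
        }
        // Wait until releaseSlot() hands this request a slot directly.
        await new Promise((resolve) => waiting.push(resolve));
    }

    function releaseSlot() {
        const next = waiting.shift();
        if (next) {
            next();      // the slot passes to the next waiter; `open` is unchanged
        } else {
            open--;
        }
    }

    return async function screenshot(input) {
        await acquireSlot();
        try {
            const handle = await load(input);  // loads overlap, up to maxTabs at once
            // Read and update `captureTail` in one synchronous step, so no
            // other request can slip in between.
            const result = captureTail.then(() => capture(handle));
            captureTail = result.catch(() => {}); // a failed capture must not stall the queue
            return await result;
        } finally {
            releaseSlot();
        }
    };
}
```

The queue relies on the point made above: because chaining onto `captureTail` and reassigning it happen in the same synchronous block, concurrent requests cannot interleave there, so no explicit lock is needed.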
So, funny story... I was not using headless mode because it was producing blank images on macOS. It turns out that when I run Chrome v60 headless in Docker, there is no issue at all.
@utkuturunc well, I don't know about the OP, but I can only reproduce this issue in non-headless mode. Both versions 59 and 60 work fine in headless mode on Linux.
This is more of a how-to question than an issue. Let's assume the scenario of generating screenshots of web pages concurrently. My thought was to create a pool of tabs like below and use the workers to take screenshots.
This creates 3 tabs in Chrome, and in my request handler
The problem is that `await worker.Page.captureScreenshot();` never resolves because the tab is not active. If I just click on the tab containing the URL in Chrome, it resolves. This can be worked around by calling
After doing this, I just get a blank image unless I add a pause, after which I get the actual image. The bottom line is that, whatever I do, there is no way to process multiple images simultaneously, because only the active tab can produce an image. So how do we use multiple tabs and capture images in all of them concurrently? Is this a bug in Chrome that prevents capturing screenshots when the tab is not active, or am I missing something?
Let me know if I am not clear.