Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow creating multiple PDFs in parallel #14

Closed
chadfawcett opened this issue Jun 29, 2017 · 11 comments
Closed

Allow creating multiple PDFs in parallel #14

chadfawcett opened this issue Jun 29, 2017 · 11 comments

Comments

@chadfawcett
Copy link

I receive Error: unexpected server response (500) when attempting to create multiple PDFs in parallel.

I have two questions related to this:

  1. Is this a limitation of Chrome or the html-pdf-chrome library?
  2. Is this the desired behaviour?

Example creating PDFs in parallel:

Results with Error: unexpected server response (500)

const htmlPdf = require('html-pdf-chrome');

const google = htmlPdf.create('http://google.com', { port: 9222 })
const twitter = htmlPdf.create('http://twitter.com', { port: 9222 })

google.then(pdf => pdf.toFile(__dirname + '/google.pdf'))
twitter.then(pdf => pdf.toFile(__dirname + '/twitter.pdf'))

Example creating PDFs in series:

Behaves as expected with two pdf file outputs

const htmlPdf = require('html-pdf-chrome');

htmlPdf.create('http://google.com', { port: 9222 })
  .then(pdf => {
    pdf.toFile(__dirname + '/google.pdf')

    htmlPdf.create('http://twitter.com', { port: 9222 })
      .then(pdf => {
        pdf.toFile(__dirname + '/twitter.pdf')
      })
  })

Note: I can create the PDFs in parallel if I use multiple instances of Chrome, but this is obviously not ideal.

@chadfawcett chadfawcett changed the title Error when attempting to create multiple PDFs at once Error when attempting to create multiple PDFs in Parallel Jun 29, 2017
@westy92
Copy link
Owner

westy92 commented Jun 29, 2017

I believe this is because only one client can connect to Chrome over its exposed port at a time. We should investigate a solution similar to this one: cyrus-and/chrome-remote-interface#92 (comment).

@westy92 westy92 changed the title Error when attempting to create multiple PDFs in Parallel Allow creating multiple PDFs in parallel Jun 29, 2017
@chadfawcett
Copy link
Author

A work around for now is to not specify a port. That way we get a new instance of chrome spawned per request. Not optimal for performance, but I assume if you had several pages to load it would eventually be faster to spawn multiple than to do it in series.

const google = htmlPdf.create('http://google.com')
const twitter = htmlPdf.create('http://twitter.com')

google.then(pdf => pdf.toFile(__dirname + '/google.pdf'))
twitter.then(pdf => pdf.toFile(__dirname + '/twitter.pdf'))

@jafri
Copy link

jafri commented Jul 14, 2017

Potential solution is creating a node-pool and starting google chrome through spawn

@gnat42
Copy link

gnat42 commented Jul 22, 2017

So I recently built a node express app on top of this. I accept a URL or HTML via GET/POST requests, then pass them to this and return the printed pdf. One of the things I was wondering about is what happens when two requests happen simultaneously. I have chrome-beta running headless via a systemd service file to keep it running. Will that cause an error if I get two requests at the same time or at least overlapping enough that it'll interrupt the first one loading etc?

@gnat42
Copy link

gnat42 commented Jul 25, 2017

So in re-writing my node app to do parallel html->pdf I stopped using html-pdf-chrome but got it working. The only part I don't have working is closing tabs. I'm new to node so there may be issues I'm unaware of but here's how I accomplished it.

@jafri
Copy link

jafri commented Jul 25, 2017 via email

@westy92
Copy link
Owner

westy92 commented Jul 25, 2017

@gnat42 there's a fundamental issue with your solution, which I described in my first comment on this thread.

only one client can connect to Chrome over its exposed port at a time.

Your solution tries to connect to a single instance Chrome for each PDF generation with a new client each time. The first request will succeed, since it is the first connection. However, while the first connection is still open, any additional connections will fail.

I only see two paths forward:

I believe the second is the correct way to solve this, which is why I mentioned it. Basically we have one sustained connection to Chrome. As requests come in, we create new tabs in Chrome and add them to a queue once they finish loading. We then cycle through each loaded tab as we're able, generate our PDF, and finally close the tab.

@gnat42
Copy link

gnat42 commented Jul 25, 2017

So I used ab -n 40 -c 5 to test and none failed. Then I used parallel to do 3 different ab tests with different urls. again it all worked. I'm pretty sure my code is doing the second solution isn't it?
line 29?
https://github.com/NobletSolutions/remote-pdf-printer/blob/master/api/controllers/printPdfController.js#L29

I should mention that I kept my original non-async code and ab couldn't do more than one request at a time.

@gnat42
Copy link

gnat42 commented Jul 25, 2017

The last issue I was facing was I wasn't able to properly close the tabs. However in cyrus-and/chrome-remote-interface#214 I got some answers about how to close it. From what I can see this works. For each request I make I get an additional chrome process spawned from the main one that is listening which from what I can tell is one per tab + the first few that are started when I start chrome headlessly. It then closes the process when done. Previously each tab remained and I had a lot of chrome processes hanging around. Since I'm not auto-spawning chrome I'm pretty sure each of these are a tab. In any case I think that perhaps this is the same way this library could be modified to produce the PDFs in parallel. Any other comments/areas/potential bugs would be greatly appreciated since I'm quite new to nodejs.

@rhyek
Copy link

rhyek commented Aug 6, 2017

This should be fixed in #24 where I'm implementing the use of tabs. I haven't tested with pdfs, but I was having the same issue with screenshots and this solved it.

@rhyek
Copy link

rhyek commented Aug 6, 2017

BTW, I had previously read about the need to "bring a tab forward" before taking a screenshot, hence the need for a queue of sorts, but in my tests (albeit limited) this wasn't necessary.

@westy92 westy92 added this to the v1.0.0 milestone Aug 11, 2017
@westy92 westy92 closed this as completed in 89f63ad Nov 6, 2017
@westy92 westy92 modified the milestones: v1.0.0, v0.5.0 Nov 6, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants