Better error handling #36

Open
p-kuen opened this issue Feb 6, 2021 · 4 comments

Comments

@p-kuen
Contributor

p-kuen commented Feb 6, 2021

I have been using this package in production for several months. Unfortunately, it causes server crashes when converting various files.

For example, I get the following error messages (from the server logs):

Syntax Warning: Unexpected oc reference target: 2374
Syntax Error (13633): Unexpected end of file in flate stream
Syntax Error: Leftover args in content stream

After this line, the server crashes, although I am using a try/catch clause.
Is there any way to better handle these errors?

OS: Alpine 3.13 running in Docker container
Poppler: 20.12.1

@blackbeam
Owner

Hi!

> Syntax Warning: Unexpected oc reference target: 2374
> Syntax Error (13633): Unexpected end of file in flate stream
> Syntax Error: Leftover args in content stream

Errors like these come from the native poppler library. It seems that poppler considers your PDF malformed.

> After this line, the server crashes, although I am using a try/catch clause.

This should not happen if the call is wrapped in try/catch. What exactly does the crash look like?
If it looks like an uncaught exception from the V8 engine, then it's a bug in poppler-simple.
If it looks like Segmentation fault (core dumped), then it may be a bug in the native part of poppler-simple or in poppler itself.
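
To make the distinction above concrete, here is a minimal sketch (not something poppler-simple provides) of running the conversion in a separate Node process and checking how it ended. A segfault in native code cannot be caught by try/catch, but when it happens in a child process the parent sees it as a termination signal instead of going down itself. The names `describeExit` and `runIsolated` and the worker script path are hypothetical.

```javascript
const { spawn } = require('node:child_process');

// Classify how a child process ended:
// - a signal (e.g. SIGSEGV) points at a crash in native code,
// - a non-zero exit code is what an uncaught JS exception typically produces.
function describeExit(code, signal) {
  if (signal) return `killed by ${signal} (native crash, not catchable in JS)`;
  if (code !== 0) return `exited with code ${code} (possibly an uncaught JS exception)`;
  return 'ok';
}

// Run a worker script (hypothetical path supplied by the caller) in its own
// Node process, so a segfault in a native addon only kills the child.
function runIsolated(scriptPath, args = []) {
  return new Promise((resolve) => {
    const child = spawn(process.execPath, [scriptPath, ...args], { stdio: 'inherit' });
    // 'error' fires if the process could not be spawned at all.
    child.on('error', (err) => resolve({ ok: false, reason: err.message }));
    child.on('exit', (code, signal) => {
      const reason = describeExit(code, signal);
      resolve({ ok: reason === 'ok', reason });
    });
  });
}

module.exports = { describeExit, runIsolated };
```

The server would then await something like `runIsolated('./convert-worker.js', [pdfPath])` and log `reason` when `ok` is false, which at least leaves a trace of the signal in the application logs.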

@p-kuen
Contributor Author

p-kuen commented Feb 7, 2021

It took many hours to find out more about this problem.

First, the PDF is not malformed, as it was converted successfully with pdftocairo from poppler-utils. Sadly, I have now switched from poppler to graphicsmagick, and the conversion works fine (although it is slower).

The biggest problem was that no error messages or crash logs were shown. I am running the server inside a Docker container, and after these three lines the server just restarts. No exception, no crash log, nothing.

On the host system, journalctl showed that the node process had been hit by a SIGSEGV.

Another important but strange note: after switching to graphicsmagick, the conversion worked, but I still got a few crashes (also SIGSEGV) in other situations. After a few hours I found out that they were caused by the "sharp" library (which uses some poppler-native bindings, as far as I know). After replacing sharp with graphicsmagick, everything works fine again.

I tried all the above on two different servers with two different node versions.

In conclusion it seems like a native poppler issue, but without further logs being thrown it is hard to find out the real cause of the problem.

@blackbeam
Owner

> In conclusion it seems like a native poppler issue, but without further logs being thrown it is hard to find out the real cause of the problem.

Seems like a job for gdb. If it's possible to create a small reproducible example, then it can be reported to poppler's issue tracker.

@msageryd
Contributor

msageryd commented Apr 7, 2021

As a side note: I don't think you should compare poppler-simple with Cairo. I'm quite sure that poppler-simple does not use Cairo. Does your PDF convert successfully with pdftoppm from poppler-utils? I actually don't know, but based on the available output formats I'd guess that poppler-simple uses the same underlying machinery as pdftoppm.

I need to use Cairo myself because my PDFs are mostly vector graphics, which are best rendered by Cairo IMO.
I don't want to use GM because it's very slow, and its output is still not as good as Cairo's.

My solution:

  • Use poppler-simple for general pdf information (extract text, check page sizes etc)
  • Call pdftocairo from Node with child_process for the actual conversion
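
The second step could look roughly like the sketch below. It assumes `pdftocairo` from poppler-utils is installed and on the PATH; the helper names `pdftocairoArgs` and `convertWithPdftocairo`, the default of 150 dpi, and the choice of PNG output are illustrative assumptions, not part of the setup described above.

```javascript
const { execFile } = require('node:child_process');

// Build the argument list for pdftocairo (hypothetical helper).
// -png selects PNG output and -r sets the resolution in DPI.
function pdftocairoArgs(pdfPath, outPrefix, { dpi = 150, page } = {}) {
  const args = ['-png', '-r', String(dpi)];
  if (page !== undefined) {
    // -f/-l restrict the page range; -singlefile writes one file
    // without appending a page number to the output name.
    args.push('-f', String(page), '-l', String(page), '-singlefile');
  }
  args.push(pdfPath, outPrefix);
  return args;
}

// Run pdftocairo as a separate process. A crash inside poppler then surfaces
// as a rejected promise (err.signal / err.code) instead of killing the server.
function convertWithPdftocairo(pdfPath, outPrefix, options) {
  return new Promise((resolve, reject) => {
    execFile('pdftocairo', pdftocairoArgs(pdfPath, outPrefix, options), (err, stdout, stderr) => {
      if (err) {
        err.stderr = stderr; // keep poppler's own messages for logging
        reject(err);
      } else {
        resolve({ stderr }); // warnings such as "Syntax Warning: ..." end up here
      }
    });
  });
}

module.exports = { pdftocairoArgs, convertWithPdftocairo };
```

Using `execFile` rather than `exec` avoids passing file names through a shell, which matters if the paths come from user uploads.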

3 participants