Unknown GhostScript error after successful camelot installation #234

Open
MughilM opened this issue Apr 5, 2021 · 12 comments

@MughilM

MughilM commented Apr 5, 2021

Hello,
I am currently on an Amazon EC2 Linux machine and have installed camelot through Anaconda with conda install -c conda-forge camelot-py. The installation happened without any issues. I could see Ghostscript as part of the dependencies being installed through Anaconda.

Afterwards, I attempted to extract the table from the example foo.pdf. From the documentation, this should be a simple tables = camelot.read_pdf('foo.pdf'). However, immediately after running that command, I received the following long error.

---------------------------------------------------------------------------
GhostscriptError                          Traceback (most recent call last)
<ipython-input-8-6d588ec94ca5> in <module>
----> 1 tables = camelot.read_pdf('./PDFs/foo.pdf')

/usr/local/.../lib/python3.8/site-packages/camelot/io.py in read_pdf(filepath, pages, password, flavor, suppress_stdout, layout_kwargs, **kwargs)
    111         p = PDFHandler(filepath, pages=pages, password=password)
    112         kwargs = remove_extra(kwargs, flavor=flavor)
--> 113         tables = p.parse(
    114             flavor=flavor,
    115             suppress_stdout=suppress_stdout,

/usr/local/.../lib/python3.8/site-packages/camelot/handlers.py in parse(self, flavor, suppress_stdout, layout_kwargs, **kwargs)
    169             parser = Lattice(**kwargs) if flavor == "lattice" else Stream(**kwargs)
    170             for p in pages:
--> 171                 t = parser.extract_tables(
    172                     p, suppress_stdout=suppress_stdout, layout_kwargs=layout_kwargs
    173                 )

/usr/local/.../lib/python3.8/site-packages/camelot/parsers/lattice.py in extract_tables(self, filename, suppress_stdout, layout_kwargs)
    400             return []
    401 
--> 402         self._generate_image()
    403         self._generate_table_bbox()
    404 

/usr/local/.../lib/python3.8/site-packages/camelot/parsers/lattice.py in _generate_image(self)
    217         gs_call = gs_call.encode().split()
    218         null = open(os.devnull, "wb")
--> 219         with Ghostscript(*gs_call, stdout=null) as gs:
    220             pass
    221         null.close()

/usr/local/.../lib/python3.8/site-packages/camelot/ext/ghostscript/__init__.py in Ghostscript(*args, **kwargs)
     88     if __instance__ is None:
     89         __instance__ = gs.new_instance()
---> 90     return __Ghostscript(
     91         __instance__,
     92         args,

/usr/local/.../lib/python3.8/site-packages/camelot/ext/ghostscript/__init__.py in __init__(self, instance, args, stdin, stdout, stderr)
     37         if stdin or stdout or stderr:
     38             self.set_stdio(stdin, stdout, stderr)
---> 39         rc = gs.init_with_args(instance, args)
     40         self._initialized = True
     41         if rc == gs.e_Quit:

/usr/local/.../lib/python3.8/site-packages/camelot/ext/ghostscript/_gsprint.py in init_with_args(instance, argv)
    172     rc = libgs.gsapi_init_with_args(instance, len(argv), c_argv)
    173     if rc not in (0, e_Quit, e_Info):
--> 174         raise GhostscriptError(rc)
    175     return rc
    176 

GhostscriptError: -770376232

That number at the end appears to change every time I run the command, though it stays in the general area of -700 million. The error above occurred in a Jupyter Notebook; running the same code from the plain command line simply prints Segmentation fault. Downgrading Python from 3.8 to 3.6 did not fix the issue.

I tried to see if this was a GhostScript problem, but running

gs -sDEVICE=txtwrite -o extractedText.txt ./PDFs/foo.pdf

worked as intended and I could see the text document all nicely formatted. I am unsure as to what the problem could be at this point. Any help is appreciated.

Thanks!
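
The txtwrite check above exercises a different output device than the step that fails in the traceback (`_generate_image` in lattice.py), which renders the page to an image. A minimal sketch for testing the command-line `gs` on a closer path follows. The `png16m` device and 300 dpi resolution are assumptions about what Camelot's lattice parser requests, and `render_pdf_to_png` is a hypothetical helper, not part of Camelot.

```python
import shutil
import subprocess


def render_pdf_to_png(pdf_path, png_path, resolution=300):
    """Render a PDF to PNG with the gs CLI.

    Returns the gs exit code, or None if no gs executable is on PATH.
    """
    gs = shutil.which("gs")
    if gs is None:
        return None
    # Assumed to approximate what the lattice parser asks Ghostscript to do.
    cmd = [gs, "-q", "-sDEVICE=png16m", "-o", png_path,
           "-r%d" % resolution, pdf_path]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        # Show whatever gs reported so the failure can be compared with Camelot's.
        print(proc.stdout or proc.stderr)
    return proc.returncode


if __name__ == "__main__":
    print(render_pdf_to_png("./PDFs/foo.pdf", "foo.png"))
```

If this succeeds while Camelot still crashes, the problem more likely sits in the libgs shared library that Python loads, or in how it is called, than in the `gs` executable itself.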

@bmorton1

I've seen this issue as well. I've been able to fix the issue by using ghostscript-9.26 installed using apt-get on ubuntu. The 9.53 version seems to be causing the issue with seg faults.

@JBBalling

Having the same issue as stated above, gs-version is 9.26 installed using apt-get on ubuntu.

@jimhall
Contributor

jimhall commented May 3, 2021

Hi All,

I hit something similar to this in issue #193. The stack trace seems to point to libgs not being found.

Have you confirmed that libgs is installed per the docs? See the Camelot docs on how to confirm a working Ghostscript install.

Anaconda did not address this until Nov 2020, so perhaps the version you are running predates the fix for the partial Ghostscript build bug in Anaconda?

If using apt, consider searching for and installing the libgs9 package: apt install libgs9

@jeanmonet

jeanmonet commented May 11, 2021

libgs9

Not sure, I'm getting the ghostscript error (also installed via conda-forge), even though libgs9 is already the newest version (9.50~dfsg-5ubuntu4.2).

conda install camelot-py "ghostscript<9.52"
# Installed ghostscript 9.22

This doesn't cause the errors any longer.

Is this a problem with version 9.5+ of ghostscript, or is it something Camelot tries to do with Ghostscript that has changed?

@jimhall
Contributor

jimhall commented May 12, 2021

Just to confirm, can you run the following in your environment:

Python 3.8.5 (default, Sep  4 2020, 07:30:14)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from ctypes.util import find_library
>>> find_library("gs")
'libgs.so.9'
>>>

Do you get similar output or not? If you get a return as above, then you don't have the problem I am describing - it is something new I am not aware of.
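
Since several Ghostscript versions are being compared in this thread, it may also help to report which libgs Python actually loads and what revision it declares. A minimal sketch, assuming the standard `gsapi_revision` entry point of the Ghostscript C API; `libgs_revision` and the `Revision` class are hypothetical names, and the structure layout is copied from Ghostscript's `gsapi_revision_t`.

```python
import ctypes
from ctypes.util import find_library


class Revision(ctypes.Structure):
    # Mirrors gsapi_revision_t from Ghostscript's public API header.
    _fields_ = [
        ("product", ctypes.c_char_p),
        ("copyright", ctypes.c_char_p),
        ("revision", ctypes.c_long),
        ("revisiondate", ctypes.c_long),
    ]


def libgs_revision():
    """Return (library name, revision, revision date), or None if libgs is not found."""
    name = find_library("gs")
    if name is None:
        return None
    libgs = ctypes.CDLL(name)
    rev = Revision()
    # gsapi_revision returns 0 on success and needs no Ghostscript instance.
    rc = libgs.gsapi_revision(ctypes.byref(rev), ctypes.sizeof(rev))
    if rc != 0:
        raise RuntimeError("gsapi_revision returned %d" % rc)
    return name, rev.revision, rev.revisiondate


if __name__ == "__main__":
    print(libgs_revision())
```

Comparing this with the output of `gs --version` would show whether the command-line tool and the shared library come from the same install.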

@jeanmonet

I just tried updating to ghostscript 9.54.0 h9c3ff4c_0 conda-forge/linux-64.
Back to getting errors. Jupyter kernel literally fails and restarts when I try to load a PDF with Camelot.

Python 3.9.2 | packaged by conda-forge | (default, Feb 21 2021, 05:02:46)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.23.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from ctypes.util import find_library

In [2]: find_library("gs")
Out[2]: '/opt/miniconda3/envs/myenv/lib/libgs.so.9'

It seems that Ghostscript from version 9.2x onward causes these errors in Camelot. Might a breaking change have been introduced in Ghostscript, with Camelot still trying to use some API that no longer works?

@jimhall
Contributor

jimhall commented May 12, 2021

Yes - you may want to use the python gs module and simply open the PDF without using Camelot to isolate the issue a little more. Unfortunately I have limited expertise here.

@jeanmonet

> Yes - you may want to use the python gs module and simply open the PDF without using Camelot to isolate the issue a little more. Unfortunately I have limited expertise here.

Thanks, well hoping someone from the Camelot team will take a look here to fix the issue.

@jeanmonet

UPDATE on issue (still persisting with ghostscript 9.54.0):

Camelot breaks (the Jupyter kernel literally has to restart) when trying to read a PDF with the lattice flavor. However, it seems to work with the stream flavor.
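
Because a segfault takes the whole interpreter down, one way to compare the two flavors without losing the notebook kernel is to run each call in a child process and inspect its exit code. A minimal sketch, assuming a POSIX system (where a negative return code means the child was killed by a signal); `run_isolated` and `describe` are hypothetical helpers, and the PDF path is the one from the original report.

```python
import signal
import subprocess
import sys


def run_isolated(code, timeout=120):
    """Run a Python snippet in a child interpreter; return (exit code, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


def describe(returncode):
    """Translate an exit code into a short verdict."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative code means the child was killed by that signal.
        return "killed by " + signal.Signals(-returncode).name
    return "python error (exit code %d)" % returncode


if __name__ == "__main__":
    pdf = "./PDFs/foo.pdf"  # path from the original report
    for flavor in ("lattice", "stream"):
        snippet = "import camelot; camelot.read_pdf(%r, flavor=%r)" % (pdf, flavor)
        code, err = run_isolated(snippet)
        print(flavor, "->", describe(code))
```

A "killed by SIGSEGV" verdict for lattice alongside "ok" for stream would match the behaviour described in this comment.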

@vinayak-mehta
Member

Yep, only the lattice flavor uses Ghostscript. I'll have to figure out a way to reproduce this issue.

@vinayak-mehta
Member

Meanwhile, can you try installing the latest version with pip install "camelot-py[base]==0.10.1" and then trying out the poppler image conversion backend? Here's a snippet:

import camelot
tables = camelot.read_pdf("https://camelot-py.readthedocs.io/en/master/_static/pdf/foo.pdf", backend="poppler")
tables[0]
# <Table shape=(7, 7)>

More info in the docs here: https://camelot-py.readthedocs.io/en/master/user/advanced.html#use-alternate-image-conversion-backends

@bwhitearg

I was running into this same issue with Ghostscript 9.50: the lattice flavour would immediately crash Python or Jupyter with:

"../path/to/file/" terminated by signal SIGSEGV (Address boundary error)

changing to

camelot.read_pdf(.... backend="poppler")

stopped this and I managed to parse my PDFs with no issues thus far.
