This repository has been archived by the owner on Jan 6, 2025. It is now read-only.

Cannot use on password protected files #162

Closed
heroic opened this issue Oct 20, 2018 · 4 comments

Comments

@heroic

heroic commented Oct 20, 2018

Most of our files are password protected. PdfFileReader does support decrypting a file, but read_pdf has no option to pass the password, so decryption cannot be done through Camelot.

@rbares
Contributor

rbares commented Oct 21, 2018

There are two issues with this:

I have a potential pull request for camelot to add the mediocre level of support offered by PyPDF2, but am uncertain whether that approach is desirable.

@vinayak-mehta: given PyPDF2's lack of maintenance, have you considered moving to an alternative PDF library? PikePDF looks promising but would require backporting to Python 2.7.

@vinayak-mehta
Contributor

@rbares Thanks for the detailed comment! I've faced issues with PyPDF2 decryption in the past (I remember the same problem of it not being able to support some encryption types). I've found qpdf for those cases. I see that PikePDF is based on qpdf, so I could look into it later. But having a password kwarg in read_pdf for PyPDF2's limited encryption support sounds like a good idea, till we move on to something better. You can go ahead and open the PR!

@heroic Maybe you can add a preprocessing step that decrypts all your PDFs at once with qpdf and a wildcard (*.pdf), before extracting tables using Camelot?
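A minimal sketch of such a preprocessing step in Python, assuming qpdf is installed and on the PATH and that all files share one password. The helper names (build_qpdf_command, decrypt_all) and the output directory are hypothetical; only the qpdf flags --password and --decrypt come from qpdf itself.

```python
import glob
import os
import subprocess


def build_qpdf_command(src, dst, password):
    """Return the qpdf argument list that writes a decrypted copy of src to dst."""
    return ["qpdf", "--password=" + password, "--decrypt", src, dst]


def decrypt_all(pattern, out_dir, password):
    """Decrypt every PDF matching pattern into out_dir and return the output paths."""
    os.makedirs(out_dir, exist_ok=True)
    outputs = []
    for src in sorted(glob.glob(pattern)):
        dst = os.path.join(out_dir, os.path.basename(src))
        # check=True raises CalledProcessError on any non-zero exit status,
        # which includes qpdf finishing with warnings.
        subprocess.run(build_qpdf_command(src, dst, password), check=True)
        outputs.append(dst)
    return outputs
```

Calling decrypt_all("*.pdf", "decrypted", "userpass") would then leave unencrypted copies in decrypted/, which can be passed to Camelot as usual.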

rbares added a commit to rbares/camelot that referenced this issue Oct 27, 2018
Update API and CLI to accept ASCII passwords to decrypt PDFs
encrypted by algorithm code 1 or 2 (limited by support from PyPDF2).
Update documentation and unit tests accordingly.

Example document health_protected.pdf generated as follows:
qpdf --encrypt userpass ownerpass 128 -- health.pdf health_protected.pdf

Issue atlanhq#162
rbares added a commit to rbares/camelot that referenced this issue Oct 27, 2018
rbares added a commit to rbares/camelot that referenced this issue Oct 28, 2018
Explicitly check passwords for None rather than falsey.
Correct read_pdf documentation for Owner/User password.

Issue atlanhq#162
rbares added a commit to rbares/camelot that referenced this issue Oct 28, 2018
rbares added a commit to rbares/camelot that referenced this issue Oct 28, 2018
vinayak-mehta pushed a commit that referenced this issue Oct 28, 2018
* [MRG] Add basic support for encrypted PDF files

Update API and CLI to accept ASCII passwords to decrypt PDFs
encrypted by algorithm code 1 or 2 (limited by support from PyPDF2).
Update documentation and unit tests accordingly.

Example document health_protected.pdf generated as follows:
qpdf --encrypt userpass ownerpass 128 -- health.pdf health_protected.pdf

Issue #162

* Support encrypted PDF files in python3

Issue #162

* Address review comments

Explicitly check passwords for None rather than falsey.
Correct read_pdf documentation for Owner/User password.

Issue #162

* Correct API documentation changes for consistency

Issue #162

* Move error tests from test_common to test_errors

Issue #162

* Add qpdf example

* Remove password is not None check

* Fix merge conflict

* Fix pages example
@Fabian1337

Any update on this?
I am not able to move my system to qpdf, so I am looking for a Python-based solution :/

@vinayak-mehta
Contributor

@Fabian1337 Does the password kwarg introduced in 429640f not work for you? Which system are you on? Is it possible to upload the PDF that you're trying to parse?
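For reference, a minimal sketch of how the password kwarg might be used, assuming a Camelot version that includes the commit above. The wrapper name read_protected_tables is hypothetical; the file name is the example document from the commit message.

```python
def read_protected_tables(path, password=None, pages="1"):
    """Call camelot.read_pdf, passing the password only when one is given."""
    import camelot  # third-party; imported lazily so the helper can be defined without it

    kwargs = {"pages": pages}
    # Compare against None so that an empty-string password is still passed through.
    if password is not None:
        kwargs["password"] = password
    return camelot.read_pdf(path, **kwargs)


# Example call (requires Camelot and the encrypted file):
# tables = read_protected_tables("health_protected.pdf", password="userpass")
```

Per the commit message, this only covers PDFs encrypted with algorithm code 1 or 2; other files would still need to be decrypted beforehand, for example with qpdf.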
