aescarias/pdfnaut

pdfnaut

Warning

pdfnaut is currently in an early stage of development and has only been tested with a small set of compliant documents. Some non-compliant documents may work under strict=False. Expect bugs or issues.

pdfnaut aims to become a PDF processor for parsing PDF 2.0 files.

Currently, pdfnaut provides a low-level interface for reading and writing PDF objects as defined in the PDF 2.0 specification.

Examples

The newer high-level API

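from pdfnaut import PdfDocument

pdf = PdfDocument.from_filename("tests/docs/sample.pdf")
# flattened_pages iterates over the document's pages; take the first one
first_page = next(pdf.flattened_pages)

# A page may have no content stream, so check before reading it
if first_page.content_stream:
    print(first_page.content_stream.contents)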

The more mature low-level API

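from pdfnaut import PdfParser

with open("tests/docs/sample.pdf", "rb") as doc:
    # The parser works on the raw bytes of the file
    pdf = PdfParser(doc.read())
    pdf.parse()

    # Follow the trailer to the document catalog (Root) and its page tree
    pages = pdf.trailer["Root"]["Pages"]

    # Take the first child of the page tree root and print its decoded contents
    first_page_stream = pages["Kids"][0]["Contents"]
    print(first_page_stream.decode())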
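
The low-level example reads the first entry of Kids directly, which works when the page tree is flat. In the PDF specification, a Kids entry may itself be an intermediate Pages node with its own Kids, so a general walk has to recurse until it reaches Page leaves. The sketch below illustrates that idea with plain Python dictionaries standing in for PDF objects. It is not pdfnaut's implementation, and `iter_pages` and `page_tree` are hypothetical names used only for illustration.

```python
from typing import Any, Iterator


def iter_pages(node: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield leaf page dictionaries from a page tree in document order."""
    if node.get("Type") == "Pages":
        # Intermediate node: descend into each child in order
        for kid in node.get("Kids", []):
            yield from iter_pages(kid)
    else:
        # Leaf node: an actual page
        yield node


# Hypothetical stand-in for a parsed page tree (plain dicts, not pdfnaut objects)
page_tree = {
    "Type": "Pages",
    "Kids": [
        {
            "Type": "Pages",
            "Kids": [
                {"Type": "Page", "Contents": b"first"},
                {"Type": "Page", "Contents": b"second"},
            ],
        },
        {"Type": "Page", "Contents": b"third"},
    ],
}

first_page = next(iter_pages(page_tree))
print(first_page["Contents"])
```

With a real document, the dictionary keys and values would be the parser's own object types rather than plain strings and bytes, so treat this only as an outline of the traversal.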