Warning
pdfnaut is currently in an early stage of development and has only been tested with a small set of compliant documents. Some non-compliant documents may work under strict=False. Expect bugs or issues.
pdfnaut aims to become a PDF processor for parsing PDF 2.0 files.
Currently, pdfnaut provides a low-level interface for reading and writing PDF objects as defined in the PDF 2.0 specification.
The newer high-level API:
from pdfnaut import PdfDocument
pdf = PdfDocument.from_filename("tests/docs/sample.pdf")
first_page = next(pdf.flattened_pages)
if first_page.content_stream:
    print(first_page.content_stream.contents)
The more mature low-level API:
from pdfnaut import PdfParser
with open("tests/docs/sample.pdf", "rb") as doc:
    pdf = PdfParser(doc.read())
    pdf.parse()
pages = pdf.trailer["Root"]["Pages"]
first_page_stream = pages["Kids"][0]["Contents"]
print(first_page_stream.decode())
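Note that the low-level example indexes pages["Kids"][0] directly, which only yields a page when the first kid of the root node is itself a page. In the PDF page tree, a kid may also be an intermediate Pages node with its own Kids. The sketch below does not use pdfnaut at all: it walks a hypothetical page tree made of plain Python dicts (the iter_pages helper and the sample data are illustrative assumptions, not part of the library) to show the kind of depth-first flattening that a property like flattened_pages presumably performs. Real parsed documents may use name objects and indirect references instead of plain strings and nested dicts.

```python
from typing import Any, Iterator


def iter_pages(node: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield the leaf Page dictionaries under a page tree node, in document order."""
    if node.get("Type") == "Pages":
        # Intermediate node: recurse into each kid, depth first.
        for kid in node.get("Kids", []):
            yield from iter_pages(kid)
    else:
        # Leaf node: an actual page.
        yield node


# Hypothetical page tree (plain dicts, not pdfnaut objects). The first kid
# of the root is another Pages node, so root["Kids"][0] is not a page.
page_tree = {
    "Type": "Pages",
    "Count": 3,
    "Kids": [
        {
            "Type": "Pages",
            "Count": 2,
            "Kids": [
                {"Type": "Page", "Contents": b"BT (one) Tj ET"},
                {"Type": "Page", "Contents": b"BT (two) Tj ET"},
            ],
        },
        {"Type": "Page", "Contents": b"BT (three) Tj ET"},
    ],
}

first_page = next(iter_pages(page_tree))
print(first_page["Contents"].decode())
```

With a nested tree like this one, page_tree["Kids"][0] has no Contents entry, while the flattened walk still reaches the first real page.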