BUG: Support for Adding and Viewing Ink Annotations in Mac's Preview app #2332
Comments
To complete the analysis: @themarisolhernandez, I would expect the same results with the same software on Mac. Can you confirm?
Your code cannot be run as it is. Can you complete it? Thanks.
Comment by themarisolhernandez:
That is interesting and unexpected.
@themarisolhernandez, can you also provide the output when using PyPDF2?
@pubpub-zz @MartinThoma Here is the complete code using pypdf:

from pypdf.generic import NullObject, IndirectObject, ArrayObject, DictionaryObject
from pypdf import PdfReader, PdfWriter
from typing import Any
from io import BytesIO


def extract_annots_recursively(extracted_annots: list,
                               page_num: int,
                               annots: Any) -> None:
    if annots is None or isinstance(annots, NullObject):
        # Skip NullObjects
        return
    elif isinstance(annots, IndirectObject):
        obj = annots.get_object()
        extract_annots_recursively(extracted_annots=extracted_annots,
                                   page_num=page_num,
                                   annots=obj)
    elif isinstance(annots, list) or isinstance(annots, ArrayObject):
        for obj in annots:
            extract_annots_recursively(extracted_annots=extracted_annots,
                                       page_num=page_num,
                                       annots=obj)
    elif isinstance(annots, dict) or isinstance(annots, DictionaryObject):
        extracted_annots.append([page_num, annots])


def extract_annots(file_bytes: bytes) -> tuple[bytes, list]:
    writer = PdfWriter()
    extracted_annots = []
    with BytesIO(file_bytes) as input_stream, BytesIO() as output_stream:
        reader = PdfReader(input_stream)
        for page_num, page in enumerate(reader.pages):
            page_annots = page.get("/Annots", [])
            extract_annots_recursively(extracted_annots=extracted_annots,
                                       page_num=page_num,
                                       annots=page_annots)
            writer.add_page(page)
        # Remove annots from the PdfWriter
        writer.remove_annotations(subtypes=None)
        writer.write(output_stream)
        output_stream.seek(0)
        pdf_file = output_stream.read()
    return pdf_file, extracted_annots


def add_annots(file_bytes: bytes,
               annots: list) -> bytes:
    writer = PdfWriter()
    with BytesIO(file_bytes) as input_stream, BytesIO() as output_stream:
        reader = PdfReader(input_stream)
        writer.append_pages_from_reader(reader)
        for page_num, annot in annots:
            writer.add_annotation(page_number=page_num, annotation=annot)
        # Add original metadata
        writer.add_metadata(reader.metadata)
        writer.write(output_stream)
        output_stream.seek(0)
        pdf_file = output_stream.read()
    return pdf_file


if __name__ == "__main__":
    print("--- Extract Annots ---")
    with open("extract_annots__input_file.pdf", "rb") as f:
        input_file = f.read()
    output_file, annots = extract_annots(file_bytes=input_file)
    print("\n--- Add Annots ---")
    output_file = add_annots(file_bytes=output_file,
                             annots=annots)
    with open("add_annots__output_file.pdf", "wb") as f:
        f.write(output_file)

The input and output files are attached:

Again, the Ink annotation appears visible when opening the output file in a PDF viewer like Adobe Acrobat Reader. But the Ink annotation does not appear visible when opening the output file in Mac's Preview app. As seen in the screenshot, the Ink annotation is there but it is transparent for some reason. I will send another response with the output of PyPDF2.
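One way to narrow down a viewer-specific rendering difference like this is to look at which entries the re-added Ink annotation actually carries. The following is only a minimal sketch, assuming annotation dictionaries can be treated as plain mappings (pypdf's DictionaryObject is a dict subclass). `summarize_annot` is a hypothetical helper, not part of pypdf, and the sample dictionary is made up for illustration rather than taken from the attached files.

```python
from typing import Any, Mapping


def summarize_annot(annot: Mapping[str, Any]) -> dict:
    """Hypothetical helper: collect annotation entries that commonly affect rendering."""

    def entry(key: str) -> Any:
        # Index rather than .get(): pypdf dictionaries are expected to
        # resolve indirect references on indexing (assumption).
        return annot[key] if key in annot else None

    flags = int(entry("/F") or 0)
    return {
        "subtype": entry("/Subtype"),
        "has_appearance_stream": "/AP" in annot,
        "color": entry("/C"),
        "opacity": entry("/CA"),
        "hidden": bool(flags & 2),    # annotation flag bit 2: Hidden
        "no_view": bool(flags & 32),  # annotation flag bit 6: NoView
    }


# Made-up Ink annotation without an /AP entry, for illustration only.
sample = {
    "/Subtype": "/Ink",
    "/C": [1, 0, 0],
    "/F": 4,
    "/InkList": [[10, 10, 20, 20]],
}
print(summarize_annot(sample))
```

With the script above, this could be called on each `annot` in the `annots` list returned by `extract_annots`, and again on the annotations read back from the output file. Whether any of these entries explains what Preview shows is not established in this thread; the summary only makes the two states easier to compare.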
Here are the results of using PyPDF2 instead:

from PyPDF2.generic import NullObject, IndirectObject, ArrayObject, DictionaryObject
from PyPDF2 import PdfReader, PdfWriter
from typing import Any
from io import BytesIO


def extract_annots_recursively(extracted_annots: list,
                               page_num: int,
                               annots: Any) -> None:
    if annots is None or isinstance(annots, NullObject):
        # Skip NullObjects
        return
    elif isinstance(annots, IndirectObject):
        obj = annots.get_object()
        extract_annots_recursively(extracted_annots=extracted_annots,
                                   page_num=page_num,
                                   annots=obj)
    elif isinstance(annots, list) or isinstance(annots, ArrayObject):
        for obj in annots:
            extract_annots_recursively(extracted_annots=extracted_annots,
                                       page_num=page_num,
                                       annots=obj)
    elif isinstance(annots, dict) or isinstance(annots, DictionaryObject):
        extracted_annots.append([page_num, annots])


def extract_annots(file_bytes: bytes) -> tuple[bytes, list]:
    writer = PdfWriter()
    extracted_annots = []
    # Note: input_stream is not closed explicitly because it leads to an I/O error for IndirectObjects
    input_stream = BytesIO(file_bytes)
    reader = PdfReader(input_stream)
    for page_num, page in enumerate(reader.pages):
        page_annots = page.get("/Annots", [])
        extract_annots_recursively(extracted_annots=extracted_annots,
                                   page_num=page_num,
                                   annots=page_annots)
        writer.add_page(page)
    # Remove annots from the PdfWriter
    writer.remove_links()
    with BytesIO() as output_stream:
        writer.write(output_stream)
        output_stream.seek(0)
        cleaned_pdf = output_stream.read()
    return cleaned_pdf, extracted_annots


def add_annots(file_bytes: bytes,
               annots: list) -> bytes:
    writer = PdfWriter()
    with BytesIO(file_bytes) as input_stream, BytesIO() as output_stream:
        reader = PdfReader(input_stream)
        writer.append_pages_from_reader(reader)
        for page_num, annot in annots:
            writer.add_annotation(page_number=page_num, annotation=annot)
        # Add original metadata
        writer.add_metadata(reader.metadata)
        writer.write(output_stream)
        output_stream.seek(0)
        pdf_file = output_stream.read()
    return pdf_file


if __name__ == "__main__":
    print("--- Extract Annots ---")
    with open("extract_annots__input_file.pdf", "rb") as f:
        input_file = f.read()
    output_file, annots = extract_annots(file_bytes=input_file)
    print("\n--- Add Annots ---")
    output_file = add_annots(file_bytes=output_file,
                             annots=annots)
    with open("add_annots__output_file_pypdf2.pdf", "wb") as f:
        f.write(output_file)

The input and output files are attached:

The following is a screenshot of the output file opened in Mac's Preview app. Here you can clearly see the Ink annotation.
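Since the two libraries produce files that Preview renders differently, a direct comparison of the annotation dictionaries in both outputs may help locate the difference. Below is a rough sketch under the assumption that both dictionaries can be compared as plain mappings. `diff_annot_entries` is a hypothetical helper, not an API of pypdf or PyPDF2, and both sample dictionaries are invented for illustration; they do not reflect the actual attached files.

```python
from typing import Any, Mapping


def diff_annot_entries(first: Mapping[str, Any],
                       second: Mapping[str, Any]) -> dict:
    """Hypothetical helper: list keys that are missing or differ between two annotations."""
    shared = set(first) & set(second)
    return {
        "only_in_first": sorted(set(first) - set(second)),
        "only_in_second": sorted(set(second) - set(first)),
        "changed": sorted(key for key in shared if first[key] != second[key]),
    }


# Invented stand-ins for the same annotation read from two output files.
from_first_output = {"/Subtype": "/Ink", "/C": [1, 0, 0], "/Rect": [0, 0, 50, 50]}
from_second_output = {"/Subtype": "/Ink", "/C": [0, 0, 1], "/Border": [0, 0, 1]}

print(diff_annot_entries(from_first_output, from_second_output))
```

In practice the two inputs would be the matching entries of each page's "/Annots" array, read with PdfReader from add_annots__output_file.pdf and add_annots__output_file_pypdf2.pdf.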
@themarisolhernandez
Any update on this?
I am closing this issue as there is no news. Feel free to send an update if you want to reopen it.
I am having problems adding Ink annotations back to a PDF using PdfWriter.add_annotation(). I think the problem is related to the PDF viewer. When I open the file in Mac's Preview app after adding the Ink annotation, the annotation is transparent. The file and annotation look fine when viewed in Adobe Reader.
Any ideas why?
This isn't an issue when using PyPDF2, but I would prefer to use pypdf. Are there plans to support this?
Environment
Which environment were you using when you encountered the problem?
Code + PDF
This is a minimal, complete example that shows the issue:
The input file is attached under the filename input.pdf. The output file is attached under the filename output.pdf. An image of the output file is also attached to show that the Ink annotation is transparent.
The annot input looks like:
input.pdf
output.pdf