
[Regression] Eagerly fetch/parse the entire /Pages-tree in corrupt documents (issue 14303, PR 14311 follow-up) #14335

Merged


@Snuffleupagus (Collaborator) commented Dec 2, 2021

Please note: This is similar to the method that existed prior to PR #3848, but the new method will only be used as a fallback when parsing corrupt PDF documents.

The implementation in PR #14311 unfortunately turned out to be way too simplistic, as is evident from the recently added test-files in issue #14303, since it may cause infinite loops in PDFDocument.checkLastPage for some corrupt PDF documents.[1]
To avoid this, the easiest solution that I could come up with was to fall back to eagerly parsing the entire /Pages-tree when the /Count-entry validation fails during document initialization.
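The fallback strategy described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the pdf.js code: the object table is modeled as a plain dict of integer refs (not pdf.js's actual data structures), and `count_is_valid`, `get_all_page_refs` and `num_pages` are hypothetical names that only loosely mirror the idea behind `PDFDocument.checkLastPage` and the `Catalog-getAllPageDicts` branch. The cheap path trusts /Count after checking that the last page is reachable. Only if that check fails does the sketch walk the *entire* tree, using a visited set so that cyclic /Kids references cannot cause an infinite loop.

```python
from typing import Dict, List, Set

# Hypothetical in-memory model of a PDF object table (not the pdf.js API):
# each integer "ref" maps to a dict. Intermediate /Pages nodes carry "Kids"
# (a list of refs) and "Count"; leaf nodes have "Type": "Page".
XRef = Dict[int, dict]


def count_is_valid(xref: XRef, root_ref: int) -> bool:
    """Cheap check: can the last page (index Count - 1) be reached by
    descending through the tree using each node's /Count?"""
    root = xref.get(root_ref)
    count = root.get("Count") if root else None
    if not isinstance(count, int) or count <= 0:
        return False
    target = count - 1  # zero-based index of the last page
    ref = root_ref
    seen: Set[int] = set()
    while ref not in seen:  # a revisited node means the tree has a cycle
        seen.add(ref)
        node = xref.get(ref)
        if node is None:
            return False  # dangling reference
        if node.get("Type") == "Page":
            return target == 0
        for kid in node.get("Kids", []):
            kid_node = xref.get(kid)
            if kid_node is None:
                return False
            kid_count = 1 if kid_node.get("Type") == "Page" else kid_node.get("Count")
            if not isinstance(kid_count, int) or kid_count < 0:
                return False
            if target < kid_count:
                ref = kid  # the target page lives in this subtree
                break
            target -= kid_count  # skip this subtree entirely
        else:
            return False  # ran out of kids: /Count overstates the page total
    return False


def get_all_page_refs(xref: XRef, root_ref: int) -> List[int]:
    """Fallback: eagerly walk the entire tree, collecting page refs in order."""
    pages: List[int] = []
    visited: Set[int] = set()
    stack = [root_ref]
    while stack:
        ref = stack.pop()
        if ref in visited:
            continue  # already seen: skipping it prevents looping on cyclic /Kids
        visited.add(ref)
        node = xref.get(ref)
        if node is None:
            continue  # dangling reference: ignore it
        if node.get("Type") == "Page":
            pages.append(ref)
        else:
            # Push kids in reverse so they are popped in document order.
            stack.extend(reversed(node.get("Kids", [])))
    return pages


def num_pages(xref: XRef, root_ref: int) -> int:
    # Fast path: trust /Count once the last page is known to be reachable.
    if count_is_valid(xref, root_ref):
        return xref[root_ref]["Count"]
    # Corrupt document: fall back to eagerly parsing the whole tree.
    return len(get_all_page_refs(xref, root_ref))
```

The design trade-off is that the eager walk touches every node in the tree, which is why, in this sketch as in the PR description, it is only used when the /Count validation fails.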

Fixes at least two of the issues listed in issue #14303, namely the poppler-395-0.pdf... and GHOSTSCRIPT-698804-1.pdf... documents.


[1] The whole point of PR #14311 was obviously to get rid of infinite loops during document initialization, not to introduce any more of those.

@Snuffleupagus (Collaborator, Author)

/botio test

pdfjsbot commented Dec 2, 2021

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.241.84.105:8877/f9c3702cc3ab21e/output.txt

pdfjsbot commented Dec 2, 2021

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.193.163.58:8877/d68cdda6296eae1/output.txt

pdfjsbot commented Dec 2, 2021

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/f9c3702cc3ab21e/output.txt

Total script time: 21.54 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: FAILED
  • Regression tests: FAILED
  different ref/snapshot: 7
  different first/second rendering: 2

Image differences available at: http://54.241.84.105:8877/f9c3702cc3ab21e/reftest-analyzer.html#web=eq.log

pdfjsbot commented Dec 2, 2021

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/d68cdda6296eae1/output.txt

Total script time: 42.24 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: FAILED
  different ref/snapshot: 10
  different first/second rendering: 1

Image differences available at: http://54.193.163.58:8877/d68cdda6296eae1/reftest-analyzer.html#web=eq.log

@Snuffleupagus force-pushed the Catalog-getAllPageDicts branch from 4993502 to 1fac637 on December 2, 2021 13:31
@timvandermeij merged commit 4c145fc into mozilla:master on Dec 2, 2021
@timvandermeij (Contributor)

Nice work; thanks!
