[Regression] Eagerly fetch/parse the entire /Pages-tree in corrupt documents (issue 14303, PR 14311 follow-up) #14335
/botio test

From: Bot.io (Linux m4)
Received
Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.241.84.105:8877/f9c3702cc3ab21e/output.txt

From: Bot.io (Windows)
Received
Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.193.163.58:8877/d68cdda6296eae1/output.txt

From: Bot.io (Linux m4)
Failed
Full output at http://54.241.84.105:8877/f9c3702cc3ab21e/output.txt
Total script time: 21.54 mins
Image differences available at: http://54.241.84.105:8877/f9c3702cc3ab21e/reftest-analyzer.html#web=eq.log
From: Bot.io (Windows)
Failed
Full output at http://54.193.163.58:8877/d68cdda6296eae1/output.txt
Total script time: 42.24 mins
Image differences available at: http://54.193.163.58:8877/d68cdda6296eae1/reftest-analyzer.html#web=eq.log
4993502 to 1fac637
Nice work; thanks!
*Please note:* This is similar to the method that existed prior to PR #3848, but the new method will *only* be used as a fallback when parsing corrupt PDF documents.

The implementation in PR #14311 unfortunately turned out to be *way* too simplistic, as evidenced by the recently added test-files in issue #14303, since it may *cause* infinite loops in `PDFDocument.checkLastPage` for some corrupt PDF documents.[1]

To avoid this, the easiest solution that I could come up with was to fall back to eagerly parsing the *entire* /Pages-tree when the /Count-entry validation fails during document initialization.

Fixes *at least* two of the issues listed in issue #14303, namely the `poppler-395-0.pdf...` and `GHOSTSCRIPT-698804-1.pdf...` documents.

---

[1] The whole point of PR #14311 was obviously to *get rid of* infinite loops during document initialization, not to introduce any more of those.
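
For readers unfamiliar with the idea, here is a minimal sketch of what eagerly walking an entire /Pages-tree can look like when the /Count-entry cannot be trusted. It is *not* the code from this patch: the plain-object node shape (`type`, `kids`, `count`, `id`) and the `collectAllPages` helper are hypothetical stand-ins for illustration. The point it shows is that tracking already-visited nodes lets the walk terminate even when a corrupt document contains circular /Kids references.

```javascript
// Hypothetical, simplified model of a /Pages-tree (not the pdf.js data model):
//   intermediate node: { type: "Pages", kids: [...], count: <possibly bogus> }
//   leaf node:         { type: "Page", id: <number> }

// Walk the whole tree iteratively and return every leaf page in document order.
// The declared `count` is deliberately ignored, since it may be wrong.
function collectAllPages(root) {
  const pages = [];
  const visited = new Set(); // guards against circular /Kids references
  const stack = [root];

  while (stack.length > 0) {
    const node = stack.pop();
    if (node === null || typeof node !== "object" || visited.has(node)) {
      continue; // skip invalid entries and nodes that were already seen
    }
    visited.add(node);

    if (node.type === "Page") {
      pages.push(node);
    } else if (node.type === "Pages" && Array.isArray(node.kids)) {
      // Push in reverse so that kids are popped in their original order.
      for (let i = node.kids.length - 1; i >= 0; i--) {
        stack.push(node.kids[i]);
      }
    }
  }
  return pages;
}

// A "corrupt" tree: the /Count is wrong and one intermediate node
// points back at the root.
const intermediate = {
  type: "Pages",
  kids: [{ type: "Page", id: 2 }, { type: "Page", id: 3 }],
};
const root = {
  type: "Pages",
  count: 9999,
  kids: [{ type: "Page", id: 1 }, intermediate],
};
intermediate.kids.push(root); // circular reference

const pages = collectAllPages(root);
console.log(pages.map(page => page.id)); // [ 1, 2, 3 ]
console.log(pages.length); // 3, regardless of the declared count of 9999
```

The cost of this approach is that every node must be visited up front, which is presumably why it only makes sense as a fallback for corrupt documents rather than as the default path.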