Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request all data, rather than throwing, when encountering general errors in ObjectLoader._walk (issue 9462, PR 3289 follow-up) #12965

Merged
merged 1 commit into from
Feb 6, 2021

Conversation

Snuffleupagus
Copy link
Collaborator

@Snuffleupagus Snuffleupagus commented Feb 6, 2021

As far as I can tell, this has been broken ever since PR #3289 (back in 2013) without anyone noticing.

For any non-MissingDataException errors encountered in ObjectLoader._walk, we're simply throwing immediately which thus has the potential to completely break rendering of an entire page.
In practice this is obviously only an issue for PDF documents which are in one way or another corrupt, since that's the only way that XRef.fetch will throw non-MissingDataException errors. To make matters worse these errors are intermittent, since they can only occur if the document is still loading when the ObjectLoader-code runs (note the early return in ObjectLoader.load).

Please note that we cannot simply catch the error and let "normal" parsing continue in ObjectLoader._walk, since that could lead to errors elsewhere given that resources "below" the current one (in the graph) might not be checked as intended then.
All-in-all, the only way to make absolutely sure that we won't cause unexpected MissingDataExceptions somewhere else in the code-base is to fallback to fetching the entire document in this edge-case.


While debugging and fixing this, I went through every stage of http://plasmasturm.org/log/6debug/

…ors in `ObjectLoader._walk` (issue 9462, PR 3289 follow-up)

*As far as I can tell, this has been broken ever since PR 3289 (back in 2013) without anyone noticing.*

For any non-`MissingDataException` errors encountered in `ObjectLoader._walk`, we're simply throwing immediately which thus has the potential to *completely* break rendering of an entire page.
In practice this is obviously only an issue for PDF documents which are in one way or another corrupt, since that's the only way that `XRef.fetch` will throw non-`MissingDataException` errors. To make matters worse these errors are *intermittent*, since they can only occur if the document is still loading when the `ObjectLoader`-code runs (note the early return in `ObjectLoader.load`).

Please note that we cannot simply catch the error and let "normal" parsing continue in `ObjectLoader._walk`, since that could lead to errors elsewhere given that resources "below" the current one (in the graph) might not be checked as intended then.
All-in-all, the only way to make absolutely sure that we won't cause *unexpected* `MissingDataException`s somewhere else in the code-base is to fallback to fetching the *entire* document in this edge-case.
@Snuffleupagus
Copy link
Collaborator Author

/botio test

@pdfjsbot
Copy link

pdfjsbot commented Feb 6, 2021

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://3.101.106.178:8877/815a7e8da8e5100/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Feb 6, 2021

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/f8507ed32465d34/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Feb 6, 2021

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/f8507ed32465d34/output.txt

Total script time: 4.62 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

pdfjsbot commented Feb 6, 2021

From: Bot.io (Windows)


Success

Full output at http://3.101.106.178:8877/815a7e8da8e5100/output.txt

Total script time: 5.69 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: Passed

@Snuffleupagus
Copy link
Collaborator Author

/botio test

@pdfjsbot
Copy link

pdfjsbot commented Feb 6, 2021

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/f2b8a6dfc5e68c9/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Feb 6, 2021

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://3.101.106.178:8877/ec94063a0568570/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Feb 6, 2021

From: Bot.io (Linux m4)


Failed

Full output at http://54.67.70.0:8877/f2b8a6dfc5e68c9/output.txt

Total script time: 4.60 mins

  • Font tests: Passed
  • Unit tests: FAILED
  • Integration Tests: Passed
  • Regression tests: Passed

@Snuffleupagus
Copy link
Collaborator Author

/botio test

@pdfjsbot
Copy link

pdfjsbot commented Feb 6, 2021

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/6f4887ffe25577c/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Feb 6, 2021

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 1

Live output at: http://3.101.106.178:8877/99fd7e634af8a59/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Feb 6, 2021

From: Bot.io (Linux m4)


Failed

Full output at http://54.67.70.0:8877/6f4887ffe25577c/output.txt

Total script time: 4.57 mins

  • Font tests: Passed
  • Unit tests: FAILED
  • Integration Tests: Passed
  • Regression tests: Passed

@Snuffleupagus Snuffleupagus marked this pull request as ready for review February 6, 2021 13:54
@pdfjsbot
Copy link

pdfjsbot commented Feb 6, 2021

From: Bot.io (Windows)


Failed

Full output at http://3.101.106.178:8877/ec94063a0568570/output.txt

Total script time: 60.00 mins

@pdfjsbot
Copy link

pdfjsbot commented Feb 6, 2021

From: Bot.io (Windows)


Failed

Full output at http://3.101.106.178:8877/99fd7e634af8a59/output.txt

Total script time: 0.26 mins

@Snuffleupagus
Copy link
Collaborator Author

/botio test

@pdfjsbot
Copy link

pdfjsbot commented Feb 6, 2021

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/c5267a85d2ee2a4/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Feb 6, 2021

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://3.101.106.178:8877/fbd99a33b9a28d2/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Feb 6, 2021

From: Bot.io (Linux m4)


Failed

Full output at http://54.67.70.0:8877/c5267a85d2ee2a4/output.txt

Total script time: 22.97 mins

  • Font tests: Passed
  • Unit tests: FAILED
  • Integration Tests: Passed
  • Regression tests: FAILED

Image differences available at: http://54.67.70.0:8877/c5267a85d2ee2a4/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

pdfjsbot commented Feb 6, 2021

From: Bot.io (Windows)


Failed

Full output at http://3.101.106.178:8877/fbd99a33b9a28d2/output.txt

Total script time: 28.86 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: FAILED

Image differences available at: http://3.101.106.178:8877/fbd99a33b9a28d2/reftest-analyzer.html#web=eq.log

@timvandermeij
Copy link
Contributor

/botio-linux preview

@pdfjsbot
Copy link

pdfjsbot commented Feb 6, 2021

From: Bot.io (Linux m4)


Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/d751f001100b97f/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Feb 6, 2021

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/d751f001100b97f/output.txt

Total script time: 4.39 mins

Published

@timvandermeij timvandermeij merged commit 8ccd9ea into mozilla:master Feb 6, 2021
@timvandermeij
Copy link
Contributor

Ah, nice find!

/botio makeref

@pdfjsbot
Copy link

pdfjsbot commented Feb 6, 2021

From: Bot.io (Linux m4)


Received

Command cmd_makeref from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/058606bf89c31cf/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Feb 6, 2021

From: Bot.io (Windows)


Received

Command cmd_makeref from @timvandermeij received. Current queue size: 1

Live output at: http://3.101.106.178:8877/c08104d430f1881/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Feb 6, 2021

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/058606bf89c31cf/output.txt

Total script time: 21.35 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@pdfjsbot
Copy link

pdfjsbot commented Feb 6, 2021

From: Bot.io (Windows)


Success

Full output at http://3.101.106.178:8877/c08104d430f1881/output.txt

Total script time: 26.75 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@Snuffleupagus Snuffleupagus deleted the ObjectLoader-errors branch February 6, 2021 19:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants