Request all data, rather than throwing, when encountering general errors in `ObjectLoader._walk` (issue 9462, PR 3289 follow-up)
#12965
/botio test
From: Bot.io (Windows): Received. Command cmd_test from @Snuffleupagus received. Current queue size: 0. Live output at: http://3.101.106.178:8877/815a7e8da8e5100/output.txt
From: Bot.io (Linux m4): Received. Command cmd_test from @Snuffleupagus received. Current queue size: 0. Live output at: http://54.67.70.0:8877/f8507ed32465d34/output.txt
From: Bot.io (Linux m4): Success. Total script time: 4.62 mins. Full output at http://54.67.70.0:8877/f8507ed32465d34/output.txt
From: Bot.io (Windows): Success. Total script time: 5.69 mins. Full output at http://3.101.106.178:8877/815a7e8da8e5100/output.txt
/botio test
From: Bot.io (Linux m4): Received. Command cmd_test from @Snuffleupagus received. Current queue size: 0. Live output at: http://54.67.70.0:8877/f2b8a6dfc5e68c9/output.txt
From: Bot.io (Windows): Received. Command cmd_test from @Snuffleupagus received. Current queue size: 0. Live output at: http://3.101.106.178:8877/ec94063a0568570/output.txt
From: Bot.io (Linux m4): Failed. Total script time: 4.60 mins. Full output at http://54.67.70.0:8877/f2b8a6dfc5e68c9/output.txt
/botio test
From: Bot.io (Linux m4): Received. Command cmd_test from @Snuffleupagus received. Current queue size: 0. Live output at: http://54.67.70.0:8877/6f4887ffe25577c/output.txt
From: Bot.io (Windows): Received. Command cmd_test from @Snuffleupagus received. Current queue size: 1. Live output at: http://3.101.106.178:8877/99fd7e634af8a59/output.txt
From: Bot.io (Linux m4): Failed. Total script time: 4.57 mins. Full output at http://54.67.70.0:8877/6f4887ffe25577c/output.txt
Branch updated from 61c30fb to d3e65f2.
From: Bot.io (Windows): Failed. Total script time: 60.00 mins. Full output at http://3.101.106.178:8877/ec94063a0568570/output.txt
From: Bot.io (Windows): Failed. Total script time: 0.26 mins. Full output at http://3.101.106.178:8877/99fd7e634af8a59/output.txt
/botio test
From: Bot.io (Linux m4): Received. Command cmd_test from @Snuffleupagus received. Current queue size: 0. Live output at: http://54.67.70.0:8877/c5267a85d2ee2a4/output.txt
From: Bot.io (Windows): Received. Command cmd_test from @Snuffleupagus received. Current queue size: 0. Live output at: http://3.101.106.178:8877/fbd99a33b9a28d2/output.txt
From: Bot.io (Linux m4): Failed. Total script time: 22.97 mins. Full output at http://54.67.70.0:8877/c5267a85d2ee2a4/output.txt
Image differences available at: http://54.67.70.0:8877/c5267a85d2ee2a4/reftest-analyzer.html#web=eq.log
From: Bot.io (Windows): Failed. Total script time: 28.86 mins. Full output at http://3.101.106.178:8877/fbd99a33b9a28d2/output.txt
Image differences available at: http://3.101.106.178:8877/fbd99a33b9a28d2/reftest-analyzer.html#web=eq.log
/botio-linux preview
From: Bot.io (Linux m4): Received. Command cmd_preview from @timvandermeij received. Current queue size: 0. Live output at: http://54.67.70.0:8877/d751f001100b97f/output.txt
From: Bot.io (Linux m4): Success. Total script time: 4.39 mins. Published. Full output at http://54.67.70.0:8877/d751f001100b97f/output.txt
Ah, nice find!
/botio makeref
From: Bot.io (Linux m4): Received. Command cmd_makeref from @timvandermeij received. Current queue size: 0. Live output at: http://54.67.70.0:8877/058606bf89c31cf/output.txt
From: Bot.io (Windows): Received. Command cmd_makeref from @timvandermeij received. Current queue size: 1. Live output at: http://3.101.106.178:8877/c08104d430f1881/output.txt
From: Bot.io (Linux m4): Success. Total script time: 21.35 mins. Full output at http://54.67.70.0:8877/058606bf89c31cf/output.txt
From: Bot.io (Windows): Success. Total script time: 26.75 mins. Full output at http://3.101.106.178:8877/c08104d430f1881/output.txt
*As far as I can tell, this has been broken ever since PR #3289 (back in 2013) without anyone noticing.*

For any non-`MissingDataException` errors encountered in `ObjectLoader._walk`, we're simply throwing immediately, which has the potential to *completely* break rendering of an entire page.

In practice this is obviously only an issue for PDF documents that are corrupt in one way or another, since that's the only way that `XRef.fetch` will throw non-`MissingDataException` errors. To make matters worse, these errors are *intermittent*, since they can only occur if the document is still loading when the `ObjectLoader` code runs (note the early return in `ObjectLoader.load`).

Please note that we cannot simply catch the error and let "normal" parsing continue in `ObjectLoader._walk`, since that could lead to errors elsewhere: resources "below" the current one in the graph might then not be checked as intended.

All in all, the only way to make absolutely sure that we won't cause *unexpected* `MissingDataException`s somewhere else in the code base is to fall back to fetching the *entire* document in this edge case.

While debugging and fixing this, I went through every stage of http://plasmasturm.org/log/6debug/
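To make the control flow described above concrete, here is a minimal JavaScript sketch under loud assumptions: `ToyXRef`, `ToyChunkManager`, `ToyObjectLoader`, their `fetch`/`requestRanges`/`requestAllChunks` methods and the string return values are all hypothetical stand-ins, and this `MissingDataException` is a local class rather than the real pdf.js one. It only models the idea: missing byte ranges are collected, any *other* error aborts the walk and requests the whole file, and a fully loaded file skips the walk entirely.

```javascript
// Hypothetical, simplified model of the fallback described in the PR text.
// These toy classes are NOT pdf.js's real API; they only mimic the control flow.
class MissingDataException extends Error {
  constructor(begin, end) {
    super(`Missing data [${begin}, ${end})`);
    this.name = "MissingDataException";
    this.begin = begin;
    this.end = end;
  }
}

// Records what the loader asked the (imaginary) network layer for.
class ToyChunkManager {
  constructor() {
    this.requestedRanges = [];
    this.requestedAll = false;
  }
  requestRanges(ranges) {
    this.requestedRanges.push(...ranges);
  }
  requestAllChunks() {
    this.requestedAll = true;
  }
}

// `objects` maps an object number to { begin, end, children, corrupt }.
// Bytes [0, loadedUpTo) are available; the rest is still loading.
class ToyXRef {
  constructor(objects, loadedUpTo, totalLength, manager) {
    Object.assign(this, { objects, loadedUpTo, totalLength, manager });
  }
  get isDataLoaded() {
    return this.loadedUpTo >= this.totalLength;
  }
  fetch(num) {
    const obj = this.objects.get(num);
    if (obj.end > this.loadedUpTo) {
      throw new MissingDataException(obj.begin, obj.end);
    }
    if (obj.corrupt) {
      // Stands in for any non-MissingDataException parsing error.
      throw new Error(`Corrupt object ${num}`);
    }
    return obj;
  }
}

class ToyObjectLoader {
  constructor(xref, rootRefs) {
    this.xref = xref;
    this.rootRefs = rootRefs;
  }
  load() {
    // Early return: a fully loaded file is never walked, so the corrupt-object
    // error can only surface while data is still arriving (hence intermittent).
    if (this.xref.isDataLoaded) return "already-loaded";
    return this.walk();
  }
  walk() {
    const pendingRequests = [];
    const visited = new Set();
    const stack = [...this.rootRefs];
    while (stack.length > 0) {
      const num = stack.pop();
      if (visited.has(num)) continue;
      visited.add(num);
      let obj;
      try {
        obj = this.xref.fetch(num);
      } catch (ex) {
        if (!(ex instanceof MissingDataException)) {
          // This object's children are unknown, so skipping it could leave
          // resources "below" it unchecked: request the entire file instead.
          this.xref.manager.requestAllChunks();
          return "requested-all";
        }
        pendingRequests.push({ begin: ex.begin, end: ex.end });
        continue;
      }
      stack.push(...obj.children);
    }
    if (pendingRequests.length === 0) return "complete";
    // A real loader would await these ranges and walk again; the sketch stops here.
    this.xref.manager.requestRanges(pendingRequests);
    return "requested-ranges";
  }
}

// Usage: object 2 is corrupt and its bytes are loaded, but the file is not.
const objects = new Map([
  [1, { begin: 0, end: 10, children: [2, 3], corrupt: false }],
  [2, { begin: 10, end: 20, children: [], corrupt: true }],
  [3, { begin: 50, end: 60, children: [], corrupt: false }],
]);
const manager = new ToyChunkManager();
const loader = new ToyObjectLoader(new ToyXRef(objects, 40, 100, manager), [1]);
console.log(loader.load(), manager.requestedAll); // requested-all true
```

With `loadedUpTo` set to 100 (the whole toy file), the same input returns `"already-loaded"` and never touches the corrupt object, which mirrors the intermittency pointed out above.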