Slightly simplify the lookup of data in Dict.{get, getAsync, has} #11672

Merged 1 commit on Mar 6, 2020

Conversation

Snuffleupagus
Collaborator

Note that `Dict.set` will only be called with values returned through `Parser.getObj`, and thus indirectly via `Lexer.getObj`. Since neither of those methods will ever return `undefined`, we can simply assert that this is the case when inserting data into the `Dict`, and thus get rid of the `in` checks when doing the data lookups.
In this case, since `Dict.set` is fairly hot, the patch uses an *inline check* and, only when necessary, a direct call to `unreachable` (rather than always calling `assert`), so as not to affect the performance of `gulp server/test` too much.
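
To illustrate the idea, here is a minimal sketch and not the actual patch. The class name `SketchDict`, the `DEBUG_CHECKS` flag, and the local `unreachable` helper are all hypothetical stand-ins, and reference resolution through the xref is left out entirely. It shows how rejecting `undefined` in `set` lets `get` and `has` compare against `undefined` instead of using `in`:

```javascript
// Hypothetical stand-in for a helper that flags code paths that must never run.
function unreachable(reason) {
  throw new Error(reason);
}

// Assumption: a simple debug flag; a real build may gate this check differently.
const DEBUG_CHECKS = true;

class SketchDict {
  constructor() {
    // No prototype, so inherited names such as "toString" never look like keys.
    this._map = Object.create(null);
  }

  set(key, value) {
    // Inline comparison first; the function call only happens on failure.
    if (DEBUG_CHECKS && value === undefined) {
      unreachable('SketchDict.set: The "value" cannot be undefined.');
    }
    this._map[key] = value;
  }

  // Up to three alternative keys, tried in order (an assumption of this sketch).
  get(key1, key2, key3) {
    let value = this._map[key1];
    if (value === undefined && key2 !== undefined) {
      value = this._map[key2];
      if (value === undefined && key3 !== undefined) {
        value = this._map[key3];
      }
    }
    return value;
  }

  has(key) {
    // Valid only because set() never stores undefined.
    return this._map[key] !== undefined;
  }
}

const dict = new SketchDict();
dict.set("Type", "Page");
dict.set("Rotate", 0);
console.log(dict.get("Missing", "Type")); // "Page"
console.log(dict.has("Rotate")); // true, a falsy value still counts as present
console.log(dict.has("toString")); // false
```

The comparison against `undefined` reuses the value that the lookup has already fetched, whereas an `in` check is a separate property test before the read. That only works because `undefined` can never be a stored value.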

For very large and complex PDF files this will help performance *slightly*, since `Dict.{get, getAsync, has}` are called *a lot* during parsing in the worker.

This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, with the following manifest file:
```
[
    {  "id": "issue2618",
       "file": "../web/pdfs/issue2618.pdf",
       "md5": "",
       "rounds": 250,
       "type": "eq"
    }
]
```

which gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |   250 |         2838 |        2820 | -18 | -0.65 |        faster
Firefox | Page Request |   250 |            1 |           2 |   0 | 11.92 |        slower
Firefox | Rendering    |   250 |         2837 |        2818 | -19 | -0.65 |        faster
```
@Snuffleupagus
Collaborator Author

/botio test

@pdfjsbot

pdfjsbot commented Mar 6, 2020

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/4bfa16f962f5530/output.txt

@pdfjsbot

pdfjsbot commented Mar 6, 2020

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/e974483767d1592/output.txt

@pdfjsbot

pdfjsbot commented Mar 6, 2020

From: Bot.io (Linux m4)


Failed

Full output at http://54.67.70.0:8877/e974483767d1592/output.txt

Total script time: 19.31 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://54.67.70.0:8877/e974483767d1592/reftest-analyzer.html#web=eq.log

@pdfjsbot

pdfjsbot commented Mar 6, 2020

From: Bot.io (Windows)


Failed

Full output at http://54.215.176.217:8877/4bfa16f962f5530/output.txt

Total script time: 24.54 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://54.215.176.217:8877/4bfa16f962f5530/reftest-analyzer.html#web=eq.log

@timvandermeij timvandermeij merged commit 5d566b9 into mozilla:master Mar 6, 2020
@timvandermeij
Contributor

Looks like a good improvement to me. Thank you!

@Snuffleupagus Snuffleupagus deleted the Dict-set-value-assert branch March 6, 2020 23:40