Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace the webpack+acorn transform with a Babel plugin #17563

Merged

Conversation

nicolo-ribaudo
Copy link
Contributor

@nicolo-ribaudo nicolo-ribaudo commented Jan 22, 2024

I was playing around with the PDF.js build process, and I noticed that webpack runs both some Babel plugins, and a custom acorn-based transform (pdfjsdev-loader).

I tried to merge the Babel and the Acorn passes, to skip a AST->string->AST round-trip. With the two separate transforms this is what happens:

  1. Babel parses the code
  2. Babel transforms the AST
  3. Babel generates the code
  4. Acorn parses the code
  5. pdfjsdev-loader transforms the AST
  6. @javascript-obfuscator/escodegen generates the code
  7. Webpack parses the file
  8. Webpack concatenates the files

With this PR, this is reduced to:

  1. Babel parses the code
  2. Babel transforms the AST
  3. babel-plugin-pdfjs-preprocessor transforms the AST
  4. Babel generates the code
  5. Webpack parses the file
  6. Webpack concatenates the files

On my machine, this gives a consistent speedup in build times. gulp lib went down from 3.4s to 2.6s and gulp dist went down from 36s to 29s. Also, the Babel plugin is smaller than the acorn transforms because Babel provides two utilities that the plugin can heavily rely on:

  • t.valueToNode(val), to convert a JS value into the equivalent AST node. This is used when inlining JSON in the code.
  • path.evaluate(), to statically evaluate simple expressions. This is used when simplifying the various build-time conditions to strip away unused code.

The transform was tested, so we know that the new plugin keeps the old behavior with a couple differences:

  • Babel formats code differently from @javascript-obfuscator/escodegen, so there are some whitespaces/quotes differences. This doesn't matter in practice, as when running the transform in the full build process and not in isolation Babel was already part of the process, so it was already affecting code formatting.
  • In external/builder/fixtures_esprima/constants-expected.js there is a var m = '1' === true; that is now transformed to var m = false;. This is ok because the runtime behavior is the same, and keeping the old behavior would require more work because we cannot use path.evaluate() directly.

Also, I didn't implement the saveComments option of pdfjsdev-loader. That option's only working (and used) value was false, because acorn already discards all the comments by default so there was never any comment attached to the AST.

I opened this PR because I saw that you already use a custom Babel plugin (babelPluginReplaceNonWebpackImport), but it doesn't actually affect users of pdf.js (it only speeds up the build for devs that integrate in in their app), so I won't be offended if you tell me that from a maintenance perspective the current setup is better 😬 However, this PR also has the advantage of removing some deps withuot introducing new ones.

Copy link

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

@nicolo-ribaudo
Copy link
Contributor Author

Oh I forgot to mention that this makes the error messages when using PDFJSDev.* improperly slightly better. For example, for PDFJSDev.eval("BUNDLE_VERSION FOO"), the old error is
image
while the new error is
image

Copy link
Collaborator

@Snuffleupagus Snuffleupagus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch; I've not actually tested this locally, but based on a very quick look the following are some initial questions/observations.

  • Please make sure that the commit message contains (all) relevant context from Replace the webpack+acorn transform with a Babel plugin #17563 (comment), such that it's possible to understand what a patch does and importantly why e.g. on the git command line as well.
  • Please squash commits, since we don't use separate fixup commits in this project.
  • Since you only mention gulp lib, please keep in mind that e.g. the gulp generic and gulp mozcentral builds are much more important overall (note that gulp lib is only used to allow unit-testing in Node.js).
  • Have you checked that this passes all tests, by successfully running gulp test (or at least gulp test --noChrome) locally?
    See https://github.com/mozilla/pdf.js/wiki/Contributing#4-run-lint-and-testing and note e.g. the "Generating reference images"-section (needs to be done against the master branch before testing your patch.)
  • How does this affect the final size of the builds, in particular the output of gulp generic and gulp mozcentral, when compared to the master branch?

@nicolo-ribaudo
Copy link
Contributor Author

nicolo-ribaudo commented Jan 22, 2024

(squashing now)

  • Since you only mention gulp lib, please keep in mind that e.g. the gulp generic and gulp mozcentral builds are much more important overall (note that gulp lib is only used to allow unit-testing in Node.js).
  • gulp generic goes from 5.5s to 4.0s
  • gulp mozcentral goes from 4.7s to 3.2s
  • Have you checked that this passes all tests, by successfully running gulp test (or at least gulp test --noChrome) locally?
    See Wiki: Contributing (4 run lint and testing) () and note e.g. the "Generating reference images"-section (needs to be done against the master branch before testing your patch.)

Yes :) But only with --noChrome because Chrome tests for me are failing with this error that I haven't figured out yet how to solve: Error while starting chrome: Network.enable timed out. Increase the 'protocolTimeout' setting in launch/connect calls for a higher timeout if needed..

  • How does this affect the final size of the builds, in particular the output of gulp generic and gulp mozcentral, when compared to the master branch?
  • gulp generic
    • pdf.sandbox.mjs: 612 KiB to 614 KiB (this seems to be due @javascript-obfuscator/escodegen being smarter than Babel when deciding aboutsingle vs double quotes for strings to avoid unnecessary escapes in the inlined PDF_SCRIPTING_JS_SOURCE, I can figure out how to restore the previous output if this is a problem)
    • pdf.mjs: 597 KiB (no change)
    • pdf.worker.mjs: 1.79 MiB (no change)
    • viewer.mjs: 510 KiB (no change)
  • gulp mozcentral has no size change in any output file

@nicolo-ribaudo nicolo-ribaudo force-pushed the build-step-remove-custom-webpack branch from 05dc3f2 to 5131878 Compare January 22, 2024 17:33
@nicolo-ribaudo
Copy link
Contributor Author

nicolo-ribaudo commented Jan 22, 2024

@Snuffleupagus I'm not sure about what's going on with CI -- could you try restarting it?

EDIT: I re-force-pushed to trigger it again, since it was failing during npm install with a bunch of 404 errors. still failing 🤔

@nicolo-ribaudo nicolo-ribaudo force-pushed the build-step-remove-custom-webpack branch from 5131878 to 8fb9ba9 Compare January 22, 2024 17:40
@Snuffleupagus
Copy link
Collaborator

/botio-linux preview

@Snuffleupagus
Copy link
Collaborator

/botio test

@nicolo-ribaudo
Copy link
Contributor Author

Uhm I'm not sure why I did't catch those differences locally (not even the firefox-related ones). I'll try diffing the output bundles to see what could cause it.

@nicolo-ribaudo
Copy link
Contributor Author

Or is it expected that sometimes there are some slight variations in color of some pixels? e.g.
image

I see that some PRs are merged with this type of failures.

@Snuffleupagus
Copy link
Collaborator

Snuffleupagus commented Jan 23, 2024

Given that Firefox was just updated to version 124, most of the movement is probably related to that. I'll trigger testing and makeref in a PR without any code changes, and then we can re-run tests here.

Unfortunately, as you've already noticed, we also have a number of intermittent failures with small pixel differences in the rendered documents. While obviously annoying, this is a known issue that's nothing to worry about.


However, having just tested this locally and diffed the output of gulp mozcentral for the master branch and this patch there's now some unnecessary return;-lines included in the output; see an example diff below.

For example, this method once built with gulp mozcentral now includes a "useless" return; which wasn't previously the case.

--- C:\Users\Jonas\Desktop\mozcentral\browser\extensions\pdfjs\content\web\viewer-geckoview.mjs
+++ C:\Users\Jonas\Git\pdf.js\build\mozcentral\browser\extensions\pdfjs\content\web\viewer-geckoview.mjs
@@ -5030,7 +5030,7 @@
   #scaleTimeoutId = null;
   #textLayerMode = TextLayerMode.ENABLE;
   constructor(options) {
-    const viewerVersion = '4.1.75';
+    const viewerVersion = "4.1.76";
     if (version !== viewerVersion) {
       throw new Error(`The API version "${version}" does not match the Viewer version "${viewerVersion}".`);
     }
@@ -6103,7 +6103,9 @@
   get scrollMode() {
     return this._scrollMode;
   }
-  set scrollMode(mode) {}
+  set scrollMode(mode) {
+    return;
+  }
   _updateScrollMode(pageNumber = null) {
     const scrollMode = this._scrollMode,
       viewer = this.viewer;
@@ -6128,7 +6130,9 @@
   get spreadMode() {
     return this._spreadMode;
   }
-  set spreadMode(mode) {}
+  set spreadMode(mode) {
+    return;
+  }
   _updateSpreadMode(pageNumber = null) {
     if (!this.pdfDocument) {
       return;
@@ -6509,6 +6513,7 @@
   async _writeToStorage() {
     const databaseStr = JSON.stringify(this.database);
     sessionStorage.setItem("pdfjs.history", databaseStr);
+    return;
   }
   async _readFromStorage() {
     return sessionStorage.getItem("pdfjs.history");
@@ -7528,7 +7533,9 @@
       source: this
     });
   },
-  async _initializePageLabels(pdfDocument) {},
+  async _initializePageLabels(pdfDocument) {
+    return;
+  },
   _initializePdfHistory({
     fingerprint,
     viewOnLoad,
@@ -8610,34 +8617,34 @@
 
 class BasePreferences {
   #defaults = Object.freeze({
-    "annotationEditorMode": 0,
-    "annotationMode": 2,
-    "cursorToolOnLoad": 0,
-    "defaultZoomDelay": 400,
-    "defaultZoomValue": "",
-    "disablePageLabels": false,
-    "enableHighlightEditor": false,
-    "enablePermissions": false,
-    "enablePrintAutoRotate": true,
-    "enableScripting": true,
-    "externalLinkTarget": 0,
-    "highlightEditorColors": "yellow=#FFFF98,green=#53FFBC,blue=#80EBFF,pink=#FFCBE6,red=#FF4F5F",
-    "historyUpdateUrl": false,
-    "ignoreDestinationZoom": false,
-    "forcePageColors": false,
-    "pageColorsBackground": "Canvas",
-    "pageColorsForeground": "CanvasText",
-    "pdfBugEnabled": false,
-    "sidebarViewOnLoad": -1,
-    "scrollModeOnLoad": -1,
-    "spreadModeOnLoad": -1,
-    "textLayerMode": 1,
-    "viewOnLoad": 0,
-    "disableAutoFetch": false,
-    "disableFontFace": false,
-    "disableRange": false,
-    "disableStream": false,
-    "enableXfa": true
+    annotationEditorMode: 0,
+    annotationMode: 2,
+    cursorToolOnLoad: 0,
+    defaultZoomDelay: 400,
+    defaultZoomValue: "",
+    disablePageLabels: false,
+    enableHighlightEditor: false,
+    enablePermissions: false,
+    enablePrintAutoRotate: true,
+    enableScripting: true,
+    externalLinkTarget: 0,
+    highlightEditorColors: "yellow=#FFFF98,green=#53FFBC,blue=#80EBFF,pink=#FFCBE6,red=#FF4F5F",
+    historyUpdateUrl: false,
+    ignoreDestinationZoom: false,
+    forcePageColors: false,
+    pageColorsBackground: "Canvas",
+    pageColorsForeground: "CanvasText",
+    pdfBugEnabled: false,
+    sidebarViewOnLoad: -1,
+    scrollModeOnLoad: -1,
+    spreadModeOnLoad: -1,
+    textLayerMode: 1,
+    viewOnLoad: 0,
+    disableAutoFetch: false,
+    disableFontFace: false,
+    disableRange: false,
+    disableStream: false,
+    enableXfa: true
   });
   #prefs = Object.create(null);
   #initializedPromise = null;
@@ -8650,13 +8657,13 @@
       prefs
     }) => {
       const BROWSER_PREFS = {
-        "canvasMaxAreaInBytes": -1,
-        "isInAutomation": false,
-        "supportsDocumentFonts": true,
-        "supportsIntegratedFind": false,
-        "supportsMouseWheelZoomCtrlKey": true,
-        "supportsMouseWheelZoomMetaKey": true,
-        "supportsPinchToZoom": true
+        canvasMaxAreaInBytes: -1,
+        isInAutomation: false,
+        supportsDocumentFonts: true,
+        supportsIntegratedFind: false,
+        supportsMouseWheelZoomCtrlKey: true,
+        supportsMouseWheelZoomMetaKey: true,
+        supportsPinchToZoom: true
       };
       const options = Object.create(null);
       for (const [name, defaultVal] of Object.entries(BROWSER_PREFS)) {
@@ -9242,8 +9249,8 @@
 
 
 
-const pdfjsVersion = '4.1.75';
-const pdfjsBuild = '8b2472211';
+const pdfjsVersion = "4.1.76";
+const pdfjsBuild = "7ba5cae77";
 const AppConstants = null;
 window.PDFViewerApplication = PDFViewerApplication;
 window.PDFViewerApplicationConstants = AppConstants;

@nicolo-ribaudo nicolo-ribaudo force-pushed the build-step-remove-custom-webpack branch from 8fb9ba9 to 6db7bf6 Compare January 23, 2024 11:03
@nicolo-ribaudo
Copy link
Contributor Author

Oh, fixed (diff: https://github.com/mozilla/pdf.js/compare/8fb9ba9add01b92f592dbcbc54d5e1ef77c16db1..6db7bf62fe8b5e36af1e426cfc33f7b5cd4074c6). I had forgot that acorn uses FunctionExpression also for methods.

@nicolo-ribaudo nicolo-ribaudo force-pushed the build-step-remove-custom-webpack branch from 6db7bf6 to 610ebcb Compare January 23, 2024 11:07
@nicolo-ribaudo nicolo-ribaudo force-pushed the build-step-remove-custom-webpack branch from 610ebcb to 8d4c4f1 Compare January 23, 2024 11:30
@mozilla mozilla deleted a comment from moz-tools-bot Jan 23, 2024
@mozilla mozilla deleted a comment from moz-tools-bot Jan 23, 2024
@mozilla mozilla deleted a comment from moz-tools-bot Jan 23, 2024
@mozilla mozilla deleted a comment from moz-tools-bot Jan 23, 2024
@mozilla mozilla deleted a comment from moz-tools-bot Jan 23, 2024
@mozilla mozilla deleted a comment from moz-tools-bot Jan 23, 2024
@Snuffleupagus
Copy link
Collaborator

Snuffleupagus commented Jan 23, 2024

Another question, to help both with reviewing (since it might produce a smaller diff) and with preserving blame: Would it be possible to do git mv external/builder/preprocessor2.mjs external/builder/babel-plugin-pdfjs-preprocessor.mjs?

@nicolo-ribaudo nicolo-ribaudo force-pushed the build-step-remove-custom-webpack branch from 8d4c4f1 to 79df35a Compare January 23, 2024 12:10
This is in preparation for the next commit, which will convert
preprocessor2.mjs to a Babel plugin. The purpose of this commit
is to help git track the rename regardless of the large amount
of changes.
@nicolo-ribaudo nicolo-ribaudo force-pushed the build-step-remove-custom-webpack branch from 79df35a to 84ddcb3 Compare January 23, 2024 12:14
@nicolo-ribaudo
Copy link
Contributor Author

nicolo-ribaudo commented Jan 23, 2024

Done, but I had to keep it as a separate commit otherwise Git cannot track the rename.

When reviewing the second commit with the actual changes, the diff is better if you disable whitespace changes.

This commit converts the pdfjsdev-loader transform into a Babel plugin,
to skip a AST->string->AST round-trip.

Before this commit, the webpack build process was:
1. Babel parses the code
2. Babel transforms the AST
3. Babel generates the code
4. Acorn parses the code
5. pdfjsdev-loader transforms the AST
6. @javascript-obfuscator/escodegen generates the code
7. Webpack parses the file
8. Webpack concatenates the files

After this commit, it is reduced to:
1. Babel parses the code
2. Babel transforms the AST
3. babel-plugin-pdfjs-preprocessor transforms the AST
4. Babel generates the code
5. Webpack parses the file
6. Webpack concatenates the files

This change improves the build time by ~25% (tested on MacBook Air M2):
- `gulp lib`: 3.4s to 2.6s
- `gulp dist`: 36s to 29s
- `gulp generic`: 5.5s to 4.0s
- `gulp mozcentral`: 4.7s to 3.2s

The new Babel plugin doesn't support the `saveComments` option of
pdfjsdev-loader, and it just always discards comments. Even though
pdfjsdev-loader supported multiple values for that option, it was
effectively ignored due to `acorn` dropping comments by default.
@nicolo-ribaudo nicolo-ribaudo force-pushed the build-step-remove-custom-webpack branch from 84ddcb3 to f724ae9 Compare January 23, 2024 15:01
external/builder/fixtures_esprima/blocks-expected.js Dismissed Show dismissed Hide dismissed
external/builder/fixtures_esprima/blocks-expected.js Dismissed Show dismissed Hide dismissed
external/builder/fixtures_esprima/comments-expected.js Dismissed Show dismissed Hide dismissed
external/builder/fixtures_esprima/comments-expected.js Dismissed Show dismissed Hide dismissed
external/builder/fixtures_esprima/comments-expected.js Dismissed Show dismissed Hide dismissed
external/builder/fixtures_esprima/blocks-expected.js Dismissed Show dismissed Hide dismissed
external/builder/fixtures_esprima/comments-expected.js Dismissed Show dismissed Hide dismissed
external/builder/fixtures_esprima/comments-expected.js Dismissed Show dismissed Hide dismissed
external/builder/fixtures_esprima/comments-expected.js Dismissed Show dismissed Hide dismissed
external/builder/fixtures_esprima/comments-expected.js Dismissed Show dismissed Hide dismissed
@Snuffleupagus
Copy link
Collaborator

/botio-linux preview

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Linux m4)


Received

Command cmd_preview from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.241.84.105:8877/671d94cf610c7f8/output.txt

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Linux m4)


Success

Full output at http://54.241.84.105:8877/671d94cf610c7f8/output.txt

Total script time: 1.23 mins

Published

@Snuffleupagus
Copy link
Collaborator

/botio test

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.241.84.105:8877/a2edfe625087235/output.txt

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.193.163.58:8877/ecfb9bcc1a3078c/output.txt

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/ecfb9bcc1a3078c/output.txt

Total script time: 2.82 mins

  • Unit tests: FAILED
  • Integration Tests: FAILED
  • Regression tests: FAILED

Image differences available at: http://54.193.163.58:8877/ecfb9bcc1a3078c/reftest-analyzer.html#web=eq.log

@Snuffleupagus
Copy link
Collaborator

/botio-windows test

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.193.163.58:8877/83892ee557d5f57/output.txt

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/a2edfe625087235/output.txt

Total script time: 24.45 mins

  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: FAILED
  different ref/snapshot: 16
  different first/second rendering: 1

Image differences available at: http://54.241.84.105:8877/a2edfe625087235/reftest-analyzer.html#web=eq.log

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/83892ee557d5f57/output.txt

Total script time: 38.45 mins

  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: FAILED
  different ref/snapshot: 6

Image differences available at: http://54.193.163.58:8877/83892ee557d5f57/reftest-analyzer.html#web=eq.log

Copy link
Collaborator

@Snuffleupagus Snuffleupagus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The build-time improvements obviously cannot hurt, and reducing the number of direct dependencies needed for building is also very good.
(Especially getting rid of escodegen seems beneficial since it hasn't been very "actively" maintained the last few years, which forced us to switch to a fork.)

r=me, thank you for the patch!

@Snuffleupagus Snuffleupagus merged commit ae62787 into mozilla:master Jan 23, 2024
9 checks passed
@nicolo-ribaudo nicolo-ribaudo deleted the build-step-remove-custom-webpack branch January 23, 2024 18:59
@nicolo-ribaudo
Copy link
Contributor Author

Thanks for helping and reviewing :)

@Snuffleupagus
Copy link
Collaborator

@nicolo-ribaudo Unfortunately this seems to have caused a regression, see issue #17589; can you please help out with that?

@nicolo-ribaudo
Copy link
Contributor Author

Yes I'll take a look tomorrow or on Monday!

}
},
},
UnaryExpression(path) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix is probably to add exit here like to the other visitors (I can submit a PR tomorrow, I'm from my phone right now), so that the UnaryExpression is checked after that the inner PDFJSDev.test call is replaced.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants