deps: replace url parser with Ada #46410

Closed (@anonrig wanted to merge 2 commits)

@anonrig
Member

anonrig commented Jan 29, 2023

This work is a collaboration between me, @miguelteixeiraa, and @lemire. I would also like to thank @addaleax and @ronag for their help.

This pull request replaces the existing URL parser with Ada, a fast, spec-compliant URL parser written from scratch in modern C++ with a focus on performance.

A little bit about Ada:

  • 100% spec compliant (it even has its own WPT updater, which led us to find a couple of errors in WPT)
  • Fully tested (using both Web Platform Tests and custom tests not covered by WPT) on big-endian systems (s390x), Ubuntu 20.04, Ubuntu 22.04 (with G++ and Clang++), and Windows with VS 2022 (with and without ClangCL)
  • Well documented, with documentation publicly available at https://ada-url.github.io/ada
  • On typical URLs, we use between 50 and 60 instructions per input byte to fully parse and validate a URL string into a URL data structure.

The possibilities with this pull request:

Side note: Current benchmarks show up to 87% faster URL parsing, with execution speeds similar to, and sometimes faster than, url.parse.
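For readers less familiar with the two APIs being compared, here is a minimal sketch that parses the same string with the WHATWG `URL` class and with the legacy `url.parse`. The sample URL and the `comparison` variable are made up for illustration, and which internal parser backs `URL` depends on the Node.js version in use.

```javascript
// Parse one (made-up) URL with both APIs and compare a few fields.
const url = require('node:url');

const input = 'https://example.com:8080/a/b?x=1#frag';

// WHATWG URL API: throws a TypeError on invalid input.
const whatwg = new URL(input);

// Legacy API: lenient, returns an object with similarly named fields.
const legacy = url.parse(input);

const fields = ['protocol', 'hostname', 'port', 'pathname', 'search', 'hash'];
const comparison = fields.map((name) => ({
  field: name,
  whatwg: whatwg[name],
  legacy: legacy[name],
}));

console.table(comparison);
```

For a simple input like this one the two APIs agree on every listed field; they are known to diverge on edge cases, which is one reason `url.parse` is kept only for compatibility.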

Fixes #46332
Fixes #46063
Fixes #30334
Fixes #44476
Fixes nodejs/performance#33
Closes #41220

@nodejs-github-bot
Collaborator

Review requested:

  • @nodejs/crypto
  • @nodejs/gyp
  • @nodejs/loaders
  • @nodejs/modules
  • @nodejs/node-api
  • @nodejs/tsc
  • @nodejs/vm

@nodejs-github-bot nodejs-github-bot added the lib / src and needs-ci labels Jan 29, 2023
@anonrig anonrig force-pushed the deps/ada branch 9 times, most recently from 8f4d1a4 to 4a68c84 Compare January 30, 2023 02:37
Member

@mcollina mcollina left a comment

Something that I think should be experimented upon is to avoid copying all data back-and-forth every time a field is updated.

It might be better to fetch them from C++ every time they are accessed, or possibly to use the V8 fast API.
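To make the trade-off concrete, here is a rough sketch of the two styles in plain JavaScript. Everything in it is hypothetical: `createNativeHandle`, `EagerURL`, and `LazyURL` are illustrative names, and the "native handle" is simulated with an ordinary `URL` object rather than a real C++ binding. It is not how this pull request implements the bindings.

```javascript
// Hypothetical stand-in for a native (C++) URL handle. In this sketch it is
// just a WHATWG URL object, not a real binding.
function createNativeHandle(input) {
  return new URL(input);
}

// Eager style: every field is copied into the wrapper after each update.
class EagerURL {
  #handle;
  constructor(input) {
    this.#handle = createNativeHandle(input);
    this.#copyAll();
  }
  #copyAll() {
    this.href = this.#handle.href;
    this.hostname = this.#handle.hostname;
    this.pathname = this.#handle.pathname;
    this.search = this.#handle.search;
  }
  setPathname(value) {
    this.#handle.pathname = value;
    this.#copyAll(); // copies all four fields even though only some changed
  }
}

// Lazy style: nothing is copied; each getter reads from the handle on access.
class LazyURL {
  #handle;
  constructor(input) {
    this.#handle = createNativeHandle(input);
  }
  get href() { return this.#handle.href; }
  get hostname() { return this.#handle.hostname; }
  get pathname() { return this.#handle.pathname; }
  set pathname(value) { this.#handle.pathname = value; }
  get search() { return this.#handle.search; }
}

const lazy = new LazyURL('https://example.com/old?q=1');
lazy.pathname = '/new';
console.log(lazy.href); // https://example.com/new?q=1
```

The lazy variant pays one call into the handle per property read, while the eager variant pays for a full copy on every write; which is cheaper depends on the access pattern and on the cost of crossing the JS/C++ boundary.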

@anonrig anonrig force-pushed the deps/ada branch 3 times, most recently from 21adb01 to 6f070dc Compare January 30, 2023 17:28
@joyeecheung
Member

> Deprecating url.parse

I think that has always been possible, with or without Ada? It's kept for compatibility reasons, not performance.

> Removes the need to update and conform to URL WPT

We probably still want to run the JS tests to make sure that the JS glues work properly

@anonrig
Member Author

anonrig commented Jan 30, 2023

> We probably still want to run the JS tests to make sure that the JS glues work properly

You're right. This pull request does not remove them.

@anonrig anonrig force-pushed the deps/ada branch 2 times, most recently from 5e1e1ac to a1c7401 Compare January 30, 2023 20:35
@anonrig
Member Author

anonrig commented Jan 30, 2023

The test/sequential/test-inspector.js test is failing. I would appreciate any help pinpointing why it times out. cc @targos @Trott @addaleax (according to git blame, you are the only ones who have worked on this file in the last 3 years)

@MoLow
Member

MoLow commented Jan 30, 2023

@anonrig it might very well be related to parsing the debugger WebSocket URL
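As a point of reference, the inspector advertises a URL of the form `ws://127.0.0.1:9229/<uuid>`. The sketch below shows how such a string breaks down with the WHATWG `URL` class; the UUID and the `target` object are made up for illustration, and nothing here is taken from the failing test itself.

```javascript
// A debugger URL in the shape Node.js prints with --inspect.
// The UUID is made up for illustration.
const debuggerUrl = 'ws://127.0.0.1:9229/0f2c936f-b1cd-4ac9-aab3-f63b0f33d55e';

const parsed = new URL(debuggerUrl);

const target = {
  protocol: parsed.protocol,           // 'ws:'
  host: parsed.hostname,               // '127.0.0.1'
  port: Number(parsed.port),           // 9229
  sessionId: parsed.pathname.slice(1), // path without the leading slash
};

console.log(target);
```

If a parser change altered any of these fields (for example the port or the path), a client built from them would fail to connect, which would be consistent with a timeout.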

@anonrig anonrig force-pushed the deps/ada branch 5 times, most recently from eb45d67 to 17d3ed7 Compare February 1, 2023 19:14
@TimothyGu
Member

I see right now Ada has Windows-specific IDNA handling using IdnToAscii. I'm a little concerned about this, as it would introduce platform-dependent URL parsing results. I also doubt Windows's IdnToAscii implements UTS46 fully.

I see plans to switch to an internal implementation (ada-url/ada#89) which would alleviate my concern. Do note though, that there are (still) outstanding issues with UTS46 on the spec side (see whatwg/url#744).

(Context: I helped create the previous WHATWG URL parser, and also maintain https://github.com/jsdom/tr46.)

@lemire
Member

lemire commented Feb 1, 2023

> I see right now Ada has Windows-specific IDNA handling using IdnToAscii.

When ICU is unavailable and we are on Windows, Ada falls back on Windows functions. That is correct.

Wherever we are, if ICU is available, we rely on ICU.
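The platform-dependence concern can be checked from JavaScript: the ASCII form of an internationalized host should be the same on every platform. The following is a small sketch, assuming a Node.js build with internationalization support; the host name is the example used in the Node.js `url.domainToASCII` documentation, and `sameEverywhere` is an illustrative name.

```javascript
const url = require('node:url');

// "español.com", written with an escape so the source stays ASCII-only.
const unicodeHost = 'espa\u00f1ol.com';

// Convert the host directly, then via full URL parsing.
const viaHelper = url.domainToASCII(unicodeHost);
const viaParser = new URL(`https://${unicodeHost}/`).hostname;

// Both paths should agree on the Punycode form, regardless of platform.
const sameEverywhere =
  viaHelper === 'xn--espaol-zwa.com' && viaParser === viaHelper;

console.log({ viaHelper, viaParser, sameEverywhere });
```

Running a check like this on Windows and on a non-Windows platform is one way to spot divergent IDNA behavior, although a single host name is far from a full UTS46 conformance test.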

anonrig added a commit to anonrig/node that referenced this pull request Apr 5, 2023
PR-URL: nodejs#46410
Reviewed-By: James M Snell
Reviewed-By: Benjamin Gruenbaum
Reviewed-By: Tiancheng "Timothy" Gu
Reviewed-By: Matteo Collina
Reviewed-By: Rafael Gonzaga
Reviewed-By: Robert Nagy
anonrig added 4 more commits to anonrig/node that referenced this pull request Apr 11, 2023
danielleadams pushed a commit that referenced this pull request Apr 12, 2023
PR-URL: #46410
Backport-PR-URL: #47435
Reviewed-By: James M Snell
Reviewed-By: Benjamin Gruenbaum
Reviewed-By: Tiancheng "Timothy" Gu
Reviewed-By: Matteo Collina
Reviewed-By: Rafael Gonzaga
Reviewed-By: Robert Nagy
@danielleadams danielleadams added backported-to-v18.x PRs backported to the v18.x-staging branch. and removed backport-open-v18.x Indicate that the PR has an open backport. labels Apr 12, 2023
danielleadams added a commit that referenced this pull request Apr 12, 2023
Notable changes:

Add initial support for single executable applications

Compile a JavaScript file into a single executable application:

```console
$ echo 'console.log(`Hello, ${process.argv[2]}!`);' > hello.js

$ cp $(command -v node) hello

# On systems other than macOS:
$ npx postject hello NODE_JS_CODE hello.js \
    --sentinel-fuse NODE_JS_FUSE_fce680ab2cc467b6e072b8b5df1996b2

# On macOS (the Mach-O segment name must also be given):
$ npx postject hello NODE_JS_CODE hello.js \
    --sentinel-fuse NODE_JS_FUSE_fce680ab2cc467b6e072b8b5df1996b2 \
    --macho-segment-name NODE_JS

$ ./hello world
Hello, world!
```

Contributed by Darshan Sen in #45038

Replace url parser with Ada

Node.js gets a new URL parser called Ada that is compliant with the WHATWG
URL Specification and provides more than 100% performance improvement over
the existing implementation.

Contributed by Yagiz Nizipli in #46410

Other notable changes:

* buffer:
  * (SEMVER-MINOR) add Buffer.copyBytesFrom(...) (James M Snell) #46500
* doc:
  * add marco-ippolito to collaborators (Marco Ippolito) #46816
  * add debadree25 to collaborators (Debadree Chatterjee) #46716
  * add deokjinkim to collaborators (Deokjin Kim) #46444
* events:
  * (SEMVER-MINOR) add listener argument to listenerCount (Paolo Insogna) #46523
* lib:
  * (SEMVER-MINOR) add AsyncLocalStorage.bind() and .snapshot() (flakey5) #46387
  * (SEMVER-MINOR) add aborted() utility function (Debadree Chatterjee) #46494
* src:
  * (SEMVER-MINOR) allow optional Isolate termination in node::Stop() (Shelley Vohr) #46583
  * (SEMVER-MINOR) allow embedder control of code generation policy (Shelley Vohr) #46368
* stream:
  * (SEMVER-MINOR) add abort signal for ReadableStream and WritableStream (Debadree Chatterjee) #46273
* tls:
  * (SEMVER-MINOR) support automatic DHE (Tobias Nießen) #46978
* url:
  * (SEMVER-MINOR) implement URLSearchParams size getter (James M Snell) #46308
* worker:
  * (SEMVER-MINOR) add support for worker name in inspector and trace_events (Debadree Chatterjee) #46832

PR-URL: #47502
codebytere added a commit to electron/electron that referenced this pull request Apr 18, 2023
* chore: bump node in DEPS to v18.16.0

* build,test: add proper support for IBM i

nodejs/node#46739

* lib: enforce use of trailing commas

nodejs/node#46881

* src: add initial support for single executable applications

nodejs/node#45038

* lib: do not crash using workers with disabled shared array buffers

nodejs/node#41023

* src: remove shadowed variable in OptionsParser::Parse

nodejs/node#46672

* src: allow embedder control of code generation policy

nodejs/node#46368

* src: allow optional Isolate termination in node::Stop()

nodejs/node#46583

* lib: fix BroadcastChannel initialization location

nodejs/node#46864

* chore: fixup patch indices

* chore: sync filenames.json

* fix: add simdutf dep to src/inspector BUILD.gn

- nodejs/node#46471
- nodejs/node#46472

* deps: replace url parser with Ada

nodejs/node#46410

* tls: support automatic DHE

nodejs/node#46978

* fixup! src: add initial support for single executable applications

* http: unify header treatment

nodejs/node#46528

* fix: libc++ buffer overflow in string_view ctor

nodejs/node#46410

* test: include strace openat test

nodejs/node#46150

* fixup! fixup! src: add initial support for single executable applications

---------

Co-authored-by: electron-roller[bot] <84116207+electron-roller[bot]@users.noreply.github.com>
Co-authored-by: Shelley Vohr
@RaisinTen RaisinTen mentioned this pull request May 6, 2023
nodejs-github-bot pushed a commit that referenced this pull request Aug 20, 2023
PR-URL: #49097
Refs: #46410
Reviewed-By: Yagiz Nizipli
Reviewed-By: Deokjin Kim
UlisesGascon pushed a commit that referenced this pull request Sep 10, 2023
targos pushed a commit that referenced this pull request Nov 27, 2023
sercher added a commit to sercher/graaljs that referenced this pull request Apr 25, 2024
@richardlau richardlau removed the baking-for-lts PRs that need to wait before landing in a LTS release. label May 8, 2024
Labels
  • author ready: PRs that have at least one approval, no pending requests for changes, and a CI started.
  • backported-to-v18.x: PRs backported to the v18.x-staging branch.
  • commit-queue-rebase: Add this label to allow the Commit Queue to land a PR in several commits.
  • lib / src: Issues and PRs related to general changes in the lib or src directory.
  • needs-ci: PRs that need a full CI run.
  • notable-change: PRs with changes that should be highlighted in changelogs.
  • review wanted: PRs that need reviews.