Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

module: resolver & spec hardening /w refactoring #40510

Closed
wants to merge 1 commit into from

Conversation

guybedford
Copy link
Contributor

@guybedford guybedford commented Oct 19, 2021

This PR includes the following hardening / refactorings to the ESM resolver & specification:

  • In package subpath segment filtering, we throw for ., .. and node_modules segments in package exports or package exports pattern replacements. This error checking is now extended to be case-insensitive and also check for percent encodings of these segments in the spec and resolver. Previously import 'pkg/.%2e/.%2e/outside.js' would be a way to bypass the restriction of only being able to resolve exports within the package (note this is not a security issue, because this is a usability / encapsulation feature and not a security feature).
  • The invalid package name check in the resolver was not detecting package names starting with . as invalid per the spec, that has been corrected with a regex check instead of a for loop.
  • The main resolver spec is now extended to support all URL schemes, special casing the file: scheme. We then indicate that for non file: schemes the associated content type can be used without explicitly specifying the source. This effectively generalizes the resolver to specify the data: behaviours already implemented today and also paves the way in the resolver for other URL schemes in future. We don't currently specify the MIME types directly, although that could be done here as well.
  • A spec bug is fixed where stream/web would not have been detected as a valid builtin name since it would fail the valid package name test. Moving the builtin check higher up in the spec fixes this.
  • The resolver implementation is further refactored to line up with the same execution flow as the specification, refactoring enabled by the unified JS resolver that wasn't previously possible with the C++ separation. This reduces unnecessary extra URL parsing steps by sharing more logic while retaining identical behaviours.
  • The ESM_FORMAT spec function is extended to support the ".json" file type per the JSON assertions implementation work.

@nodejs-github-bot nodejs-github-bot added esm Issues and PRs related to the ECMAScript Modules implementation. needs-ci PRs that need a full CI run. labels Oct 19, 2021
@guybedford guybedford requested review from bmeck and aduh95 October 19, 2021 05:12
lib/internal/modules/esm/resolve.js Outdated Show resolved Hide resolved
@DerekNonGeneric
Copy link
Contributor

/cc @nodejs/modules @nodejs/loaders (hope to get around to reviewing this myself, but it could be a while).

@nodejs-github-bot
Copy link
Collaborator

@guybedford guybedford force-pushed the esm-resolve-hardening branch from 54276f3 to 687c6ad Compare October 24, 2021 19:10
@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

guybedford added a commit that referenced this pull request Oct 24, 2021
PR-URL: #40510
Reviewed-By: Antoine du Hamel <[email protected]>
Reviewed-By: James M Snell <[email protected]>
@guybedford
Copy link
Contributor Author

Landed in f0dcac8.

@guybedford guybedford closed this Oct 24, 2021
@guybedford guybedford deleted the esm-resolve-hardening branch October 24, 2021 21:46

> 1. Assert: _url_ corresponds to an existing file.
> 2. Let _pjson_ be the result of **READ\_PACKAGE\_SCOPE**(_url_).
> 3. If _url_ ends in _".mjs"_, then
> 1. Return _"module"_.
> 4. If _url_ ends in _".cjs"_, then
> 1. Return _"commonjs"_.
> 5. If _pjson?.type_ exists and is _"module"_, then
> 5. If _url_ ends in _".json"_, then
> 1. Return _"json"_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn’t the case unless --experimental-json-modules is enabled; and even then, assert { type: 'json' } will be required, which I’d think should be mentioned here. Perhaps this would be better added when #40250 lands?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that this spec doesn't mandate the assertion doesn't mean Node.js core can't mandate the assertion. It's more a question of layering I think. I'm definitely open to explicitly calling out that the assertion is mandated if we have full consensus on that, just wanted to do the minimal change initially.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is more that this spec seems to imply that Node can’t mandate the assertion. As in, if the URL ends in .json, that’s the end of the story: the spec says so. I would maybe revise this to:

> 5. If _url_ ends in _".json"_, and the import assertion _type_ is _"json"_, then
     1. Return _"json"_.

It already went to the TSC that the assertion will be mandated, at least at first (when the PR that adds support lands). If/when it ever becomes optional, we can update this spec accordingly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would need to also include the initial _assertions_ parameter as an explicit argument into the resolution function to do this, perhaps a follow-up PR then?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that’s part of #40250. We could maybe include the spec update in that PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I interpreted that was file name ending in ".json" is the browser equivalent of having a response header Content-Type: application/json. To me, the fact a .json file is considered as JSON by Node.js spec does not mean it can be imported without an assertion (the same way the HTML spec do not accept to load a module if the Content-Type is application/json). I don't think we should make the _assertions_ parameter as an explicit argument into the resolution function, a .json file should always be considered as JSON, the assertion check should come after that.

> 8. Otherwise,
> 1. Set _format_ the module format of the content type associated with the
> URL _resolved_.
> 9. Load _resolved_ as module format, _format_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is describing Node’s internal behavior, but does this section correspond only with the user loader resolve hook, or both the resolve and load hooks? Because for user loaders, resolve returning format is optional; it’s load that does the definitive determination of format, to handle cases like an HTTPS loader where we don’t know the format until we get a network response with a Content-Type header.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I interpreted this the same as Geoffrey—that resolve's format is authoritative rather than suggestive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this isn't really specifyin a resolve hook at this point so much as just specifying how to load overall. I wouldn't interpret resolve as being equivalent to any concept of a resolve hook here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t have a suggestion for how to do so, but if you can think of a way to make it clearer that this doesn’t correspond directly with the resolve hook, I think that would make this better. The fact that the section heading is **ESM\_RESOLVE**(_specifier_, _parentURL_), when those are two of the arguments passed into the resolve hook, seems to strongly imply a similarity.

Or maybe the spec could correlate with the resolve and load hooks, describing the hooks’ “default” behavior? Then when people write their own versions, they have a reference of how those hooks “should” be if they were to follow Node’s example. I guess that would mean splitting this section into two, one for resolve and one for load.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way to think of this is more like HOST_RESOLVE_IMPORTED_MODULE in being a single resolver that covers the requirements like it does in ECMA262, as opposed to any fine-grained hooks.

@JakobJingleheimer
Copy link
Member

@guybedford cool, thanks for this 😀

@DerekNonGeneric thanks for the heads up!

FYI @arcanis @jonaskello (I know you have raw versions of these utils in your stuff)

Comment on lines +403 to +407
resolved =
pathToFileURL(real + (StringPrototypeEndsWith(path, sep) ? '/' : ''));
resolved.search = search;
resolved.hash = hash;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I missed the ship (I was away this weekend): I think a code comment would be useful here, something like

Suggested change
resolved =
pathToFileURL(real + (StringPrototypeEndsWith(path, sep) ? '/' : ''));
resolved.search = search;
resolved.hash = hash;
}
// preserve the original extras onto the true fileUrl
resolved =
pathToFileURL(real + (StringPrototypeEndsWith(path, sep) ? '/' : ''));
resolved.search = search;
resolved.hash = hash;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed more comments are always good - let's follow up on this.

@guybedford
Copy link
Contributor Author

Also my apologies for not including the nodejs/modules cc as I usually do, that was an oversight on my part and thanks @DerekNonGeneric for spotting this as well.

Yes boat has sailed on this one but comments are all good points, I've responded above.

@DerekNonGeneric DerekNonGeneric added the loaders Issues and PRs related to ES module loaders label Oct 25, 2021
targos pushed a commit that referenced this pull request Nov 6, 2021
PR-URL: #40510
Reviewed-By: Antoine du Hamel <[email protected]>
Reviewed-By: James M Snell <[email protected]>
@targos targos mentioned this pull request Nov 8, 2021
danielleadams pushed a commit that referenced this pull request Jan 29, 2022
PR-URL: #40510
Reviewed-By: Antoine du Hamel <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
esm Issues and PRs related to the ECMAScript Modules implementation. loaders Issues and PRs related to ES module loaders needs-ci PRs that need a full CI run.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants