Module cache keying semantics #30

guybedford · 2019-11-26T20:00:13Z

Currently the module cache is typically keyed uniquely by URL.

With this proposal the module identity now seems to include both the URL and the attributes.

Would the expectation be that the module cache keying would be extended to support the attribute identity?

Or alternatively will the URL remain the primary keying with some reconciliation approaches in place? How would those approaches avoid causing indeterminism based on load ordering?

devsnek · 2019-11-26T20:08:19Z

To be more specific, modules in the spec are keyed as (referrer, specifier) pairs, so the exact problem is a single file with two imports of the same specifier with different attributes.

littledan · 2019-11-26T22:59:51Z

I was imagining that the module attributes would not be part of the key. That is, the same module specifier and referrer would always point to the same module record, even when imported twice with different parameters. However, a subsequent load with different module attributes may lead to an exception being thrown rather than the module returned. Here's some examples of possible semantics (each of which could be debated further) which are all consistent with the idea that the module attributes are not part of the key, but the host may keep them around:

With type on the web, if a module is imported once, the type must either be provided or missing (JS), and the subsequent loads must pass the same type option (otherwise the check would've failed).
If we have fetch options passed through module attributes, subsequent loads could either repeat the fetch options of the initial load, or omit them, but not pass additional or different options.
If we have signature-based SRI passed through module attributes, any site for importing the module may pass in a signature, and that would be checked against the module contents for that import site (that is, SRI errors would not be cached, in this idea).

This sort of interpretation rules out some use cases that @xtuc mentioned about module attributes determining the processing mode, when used in a way that a module could be imported multiple times in different processing modes.
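As a rough illustration of the "attributes are checks, not part of the key" idea described above, here is a minimal sketch of a host-side module map. Everything in it is an assumption made for illustration: the `ModuleMap` class, the in-memory `resources` map standing in for the network, and the choice to default a missing `type` to `'js'`. It is not taken from any spec text.

```javascript
// Hypothetical host module map: one record per URL; attributes only check.
class ModuleMap {
  constructor(resources) {
    // `resources` stands in for the network: url -> { type, value }.
    this.resources = resources;
    this.records = new Map(); // url -> { type, value }
  }

  // A missing `type` attribute is treated as "expects JS" (an assumption).
  load(url, { type = 'js' } = {}) {
    let record = this.records.get(url);
    if (record === undefined) {
      record = this.resources.get(url);
      if (record === undefined) throw new Error(`not found: ${url}`);
      this.records.set(url, record);
    }
    // The attribute never selects a record; it can only reject one.
    if (record.type !== type) {
      throw new TypeError(`${url} is ${record.type}, but the import site expected ${type}`);
    }
    return record.value;
  }
}

const map = new ModuleMap(new Map([['/data', { type: 'json', value: { a: 1 } }]]));

let mismatch = null;
try {
  map.load('/data'); // no attribute, so this import site expects JS
} catch (e) {
  mismatch = e;
}
const data = map.load('/data', { type: 'json' }); // same record, check passes
```

In this sketch a mismatch throws for that import site only and does not poison the entry, so a later load with the matching `type` still succeeds against the single cached record. Whether failures should be remembered is one of the debatable choices.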

bmeck · 2019-12-12T15:28:37Z

It seems like races could occur if first to import wins:

try {
  // try to load as JS, this could make all attempts to load JSON fail
  await import('foo');
} catch (e) {
  // try to load as JSON? maybe that's how other people are using this thing?
  const {default: foo} = await import('foo', {type: 'json'});
  // clobber stuff
  Object.defineProperty(foo, 'bar', {
    get() {
       // fun stuff like arguments.caller / Error.prepareStackTrace / etc. go here
    },
    set(v) {
       // more clobbering/calling here
    }
  });
}

I have serious reservations about the viability of keeping everything in sync across all call sites and if throwing on mismatch is desirable. It seems like throw on mismatch could lead to just iterating all the possible values in order and doing things as if it were some kind of odd overloading technique.

littledan · 2020-01-22T19:25:43Z

@bmeck This is an interesting example. Sounds like we probably shouldn't cache failure due to module type check mismatch.

hax · 2020-01-30T13:46:30Z

So allow a retry when the type mismatches? But currently import('foo') will not allow a retry if the type is not a JS MIME type.

jkrems · 2020-03-05T21:13:29Z

In terms of use cases, this points to two things being fundamentally at odds (based on some not-so-theoretical host that acts like a browser):

1. Module attributes are optional, allowing for incremental adoption of an attribute (assertion-style attributes). This implies that module identity doesn't depend on attributes.
2. Module attributes change the interpretation of the URL, e.g. by adding or modifying headers or by changing how the fetched resource is evaluated. This implies that module identity does depend on attributes.

It would feel pretty unfortunate if those two semantics were mixed. Not only would tools and users need perfect knowledge of the cache semantics of each individual attribute (as opposed to module attributes in general), it would also affect the ability to support unknown attributes. A host couldn't just ignore an unknown attribute because it would have ambiguous cache key semantics.

And unless new module attributes can only ever apply to module types that have never been supported before, (1) seems pretty important. Otherwise it would require global coordination across files to make sure that the same target is never requested with different attributes.

P.S.: Global coordination because this is speculating about likely behavior in a browser-like host where there's a global/realm-wide module map that has some caching semantics. From a JS spec perspective it would only be per-importing-file coordination. But I don't think browsers would pick global caching semantics that are incompatible with the per-file ones. So - effectively it's global cache-by-URL that most users would observe.

Jack-Works · 2020-05-11T15:44:16Z

We should split module attributes into those that will and those that won't change the identity of the module; it's better to do this at the syntax level.

For example:

These two modules are different

import mod from '/mod' with { type: 'css' }
import mod1 from '/mod' with { }

These two modules are the same

import mod from '/mod' with {} and { integrity: 'hash-of-this-file' }
//                          ~~     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
import mod1 from '/mod' with {} and { cache: '1year' }

In the first attribute object {}, any difference causes them to be two different modules, but in the second attribute object, modules with different attribute objects are treated as the same module.

But this is not enough, we need a syntax for the developer to specify the following orthogonal behavior:

Identity: If this attribute changes, should it be the same module? (For type, no. For integrity, yes)
Ignorable: Is the attribute ignorable if the runtime doesn't know its semantics?
...: any more?

littledan · 2020-05-11T15:49:05Z

@Jack-Works I see type as being in the same category as integrity: they are both checks. type always checks a type which is present in some other way, e.g., the MIME type or file extension. (That said, there are some significant arguments against having integrity in particular.) These would not be part of the cache key (failure vs non-failure would differ based on what's passed, but they couldn't have multiple distinct success values.)

There's another category which would change the interpretation of a module. For example, parameters to JSON.parse, or interpreting the path as a string that's default-exported rather than what it would be otherwise. These sorts of module attributes would need to be part of the cache key.
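A minimal sketch of how the two categories could interact with a cache key, assuming a host keeps a fixed list of check-style attribute names. The `CHECK_ATTRIBUTES` set, the `cacheKey` function, and the `as` attribute used in the usage note are hypothetical names for illustration, not part of any proposal text.

```javascript
// Hypothetical split: these attribute names only check the module and are
// left out of the key. Every other attribute is assumed to change how the
// module is interpreted, so it becomes part of the key.
const CHECK_ATTRIBUTES = new Set(['type', 'integrity']);

function cacheKey(url, attributes = {}) {
  const keyed = Object.entries(attributes)
    .filter(([name]) => !CHECK_ATTRIBUTES.has(name))
    // Sort by attribute name so the written order does not affect the key.
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return JSON.stringify([url, keyed]);
}
```

Under these assumptions, `cacheKey('/mod', { type: 'json' })` and `cacheKey('/mod')` produce the same key, while `cacheKey('/mod', { as: 'text' })` produces a different one. Note that this sketch puts unrecognized attributes into the key by default, which is itself a policy choice that a host would have to make.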

Jack-Works · 2020-05-11T15:55:16Z

@littledan I don't think type is a simple check. The browser can send a different sec-fetch-dest header based on the value of type, so the server can return different formats based on the sec-fetch-dest header.
Therefore, files with different type must be treated as different modules. But if I import a module with integrity and import it again without integrity, I expect it to be the same module.

littledan · 2020-05-11T15:58:38Z

@Jack-Works Let's discuss this further in #24 . We'll need a specification for what browsers do exactly before Stage 3, and web specs like fetch tend to go into that level of detail.

Jack-Works · 2020-05-11T16:02:54Z

@littledan oh, type is just my example. What I really want to emphasize is that we should have a signal to tell the runtime whether an attribute is ignorable or will affect the identity of the module, even if the runtime doesn't know what the attribute is.

jkrems · 2020-05-11T16:02:59Z

I like that separation! But the and feels awkward because it favors cache-busting attributes over ones that maintain a single module per URL. Maybe something like this would express the same:

// Attributes that only assert properties on the resource but don't affect how it's fetched:
import mod from '/mod' assert { integrity: 'hash-of-this-file' }
import mod from '/mod' expect { integrity: 'hash-of-this-file' }
import mod from '/mod' assert { type: 'json' }

// Attributes that may affect how the module is being fetched:
import mod from '/mod' with { credentials: 'include' }

// Combined
import mod from '/mod' with { credentials: 'include' } assert { integrity: 'hash-of-this-file' }

> Therefore, files with different type must be treated as different modules.

I don't think that's necessarily true. One big downside is that type could never be optional which is a bit unfortunate. Having a non-optional type attribute means there's a big pressure to only ever load modules from JavaScript modules (to make things easier to use / less noisy). So even endpoints that could've been simple data files would be returning executable scripts instead.

littledan · 2020-05-11T17:07:09Z

I continue to have the understanding that type would be required in the web, and non-web environments could decide whether to require it or make it optional.

hax · 2020-05-11T17:55:22Z

> I continue to have the understanding that type would be required in the web, and non-web environments could decide whether to require it or make it optional.

So it would make the modules env-specific just because of json/wasm modules?

littledan · 2020-05-12T09:41:54Z

@hax This proposal does not prohibit environments from making various kinds of environment-specific modules, including JSON modules which don't require with type: "json". Environments may choose to either stick to the cross-environment set or support more forms.

hax · 2020-05-12T11:26:11Z

@littledan I think you mean each env could choose whether with type is enforced, but I really hope we could keep them uniform to avoid fragmentation of the ecosystem. 😥

Jack-Works · 2020-05-12T13:24:23Z

Let the developer decide if their attribute must be enforced. This way, we can avoid fragmentation of the ecosystem.

littledan · 2020-05-12T13:50:42Z

@hax I hope so too. That's why this proposal requires that with type: "json" is supported. But, as the semantics of modules in general is unspecified, I don't see how we could prohibit them from making other choices for that general case.

hax · 2020-05-13T07:48:43Z

> Let the developer decide if their attribute must be enforced.

How? The reason we need with type is that browsers think JSON modules would be insecure without it. (But I'm still not convinced there is no other way to solve that security problem :-)

hax · 2020-05-13T07:59:10Z

> I don't see how we could prohibit them from making other choices for that general case.

@littledan I think that is the problem. We face these problems, and so far every possible solution I have seen just adds another layer of complexity for developers. The original requirement, as I understand it, is just JSON modules. Why should developers pay for all this other complexity for the simple problem of loading JSON? I think most will just choose to keep using let data = await fetchJSON('./x.json').

(I understand that wasm/css/html modules may also need it, but first, wasm modules do not suffer from a similar security issue, and css/html modules are still a long way off, so it's too early to say whether they will eventually become available.)

littledan · 2020-05-13T10:43:39Z

@Jack-Works I don't see how we could let the developer decide, except in the context of a Compartment API which defines module semantics. Outside of Compartments, the environment provides module semantics.

@hax Developers who do that won't be able to use JSON modules on the Web. So I'm not convinced that most developers will choose this, given that many JS developers write code to run on the web.

littledan · 2020-05-28T08:20:29Z

Based on the feedback in openjs-foundation/standards#91 , I think there's some more to discuss with respect to the details of caching and treatment of unrecognized and host-defined attributes. The current spec draft provides a concrete starting point which various people have raised concerns about being too permissive for hosts; before Stage 3, it seems like we may want to put more restrictions on hosts to ensure expected behavior.

littledan · 2020-06-05T16:32:22Z

In the end, we decided to land #66 for Stage 2, which restricts module attributes to not be part of the cache key. Does that resolve the concerns raised in this issue?

xtuc · 2020-06-13T20:35:15Z

I think we can close this issue now. We can change the caching semantics based on the follow-up proposals.

littledan · 2020-06-22T18:00:31Z

@domenic has raised questions about the caching model in whatwg/html#5658 (comment) and whatwg/html#5658 (comment) . Although we have TC39 consensus recorded on going to Stage 2 with a restriction to "check"-style attributes, the plan was to investigate host interactions between Stage 2 and 3, and revise host hook invariants based on that investigation, so this conversation seems in scope.

As I wrote about the current status,

> To clarify, the import attributes proposal isn't about what you're importing but whether the import meets certain conditions. For this reason, we switched the token introducing this form from with to if and are considering renaming the proposal to import conditions. Modifying what is being imported (which would logically be part of the cache key) would be a separate proposal, if sufficient use cases are identified. We're working on clarifying this distinction in our documentation and hope to present on it in the upcoming TC39 meeting.

I'm not sure whether this meets technical requirements from HTML with respect to module cache keying. So, I'm reopening this thread to discuss further.

nicolo-ribaudo · 2023-03-14T17:27:44Z

The proposal has been updated so that import assertions/attributes can affect how a module is loaded, in response to feedback from HTML.

As such, the cache is now keyed by (referrer, specifier, attributes) and the recommendation that it should only be keyed by (referrer, specifier) has been removed.
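One way to picture a (referrer, specifier, attributes) key is the sketch below. It is only an illustration: the `moduleRequestKey` helper and the JSON string encoding are assumptions, and the order in which attributes are written is assumed not to matter, so entries are sorted by name before being serialized.

```javascript
// Hypothetical helper: builds a string key from all three components.
function moduleRequestKey(referrer, specifier, attributes = {}) {
  const sorted = Object.entries(attributes)
    // Sort by attribute name so { a, b } and { b, a } give the same key.
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return JSON.stringify([referrer, specifier, sorted]);
}

// Hypothetical cache using that key: the same specifier imported with
// different attributes can now map to distinct entries.
const cache = new Map();
cache.set(moduleRequestKey('/app.js', './foo'), 'record for JS');
cache.set(moduleRequestKey('/app.js', './foo', { type: 'json' }), 'record for JSON');
```

With this keying, the single-file case of two imports of the same specifier with different attributes no longer collides, at the cost of allowing more than one module record per specifier.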

Jack-Works mentioned this issue May 12, 2020

Bikeshed possible syntaxes for import statements #6

Closed

littledan added the proposed resolution: hosts to decide label May 20, 2020

littledan added this to the stage 3 milestone May 28, 2020

littledan removed the proposed resolution: hosts to decide label Jun 5, 2020

littledan modified the milestones: stage 3, stage 2 Jun 5, 2020

xtuc closed this as completed Jun 13, 2020

littledan mentioned this issue Jun 22, 2020

Reland JSON module scripts whatwg/html#5658

Merged

3 tasks

littledan reopened this Jun 22, 2020

xtuc modified the milestones: stage 2, stage 3 Jun 23, 2020

nicolo-ribaudo closed this as completed Mar 14, 2023
