Order dependence for equality #15
I got some other negative feedback on the order. Basically it's there to be "mindful" of the engines, so we have something that remotely looks like the hidden class implementation optimization. But I think that last consideration could be brought up later... I'm going to make another pass on the proposal text at some point soon; I'll ping that issue when I do! |
Another interesting consideration is that I expect |
If order matters then yes, and I agree it'll generate a lot of confusion. I was trying to be mindful of the engine before the user; it'll get updated! |
I think it would be better if instead for |
Sets, Maps, and objects (roughly) all follow insertion order for enumeration - that seems the precedent to follow. Insertion order isn’t likely to matter when comparing two Sets in the Set method proposal - so I’d assume that’s the precedent to follow for comparison too. |
If we did have There's a related (but inverse) example in the language already where For this case it'd be the opposite way, two objects could be |
To clarify, we currently have these rules that hold true:
But making |
Order dependence can be removed if you're willing to allow some extra complexity that ranges from |
@ljharb, you can't both have insertion order for iteration and make insertion order not matter for equality comparison. I'm with @Jamesernator - alphabetic order, possibly with nonnegative integer keys first, is the way to go. (The set methods proposal isn't adding an "equals" method, so I'm not sure what you're referring to there, but in any case that would be a very different thing from two things being |
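For illustration, a minimal sketch of the ordering described above -- an assumption about the shape of the rule, not text from the proposal: nonnegative integer keys first in numeric order, then the remaining string keys in code-unit (lexicographic) order.

function sortRecordKeys(keys) {
    const isIndex = (key) => {
        const n = Number(key);
        return Number.isInteger(n) && n >= 0 && String(n) === key;
    };
    // array-index-like keys first, in ascending numeric order
    const indices = keys.filter(isIndex).sort((a, b) => Number(a) - Number(b));
    // then everything else; the default sort compares UTF-16 code units
    const rest = keys.filter((key) => !isIndex(key)).sort();
    return [...indices, ...rest];
}

sortRecordKeys(["b", "10", "a", "2"]); // ["2", "10", "a", "b"]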
@bakkot You can't avoid being able to distinguish them as long as you can iterate them - even if the iteration order is undefined, you'll still be able to tell many
As for the stated semantics, it's intended to express non-coercing equality, not necessarily complete indistinguishability. Checking that is the goal for
Edit: Filed #20 to ask about this. |
@isiahmeadows
Only if the iteration order is undefined, which it should not be. If it is defined purely in terms of which keys are present, then iterating would not allow you to distinguish them.
I'm familiar with equality semantics in JS, but "
Technically this is not forbidden by the spec, but the spec should not be built on the assumption that engines are going to do so.
The fact that |
I think the way this is going to end up is the following:
This might change or be reviewed later on as we get closer to an actual implementation (if this proposal ever goes that far) |
I feel that many of the listed goals for value types will not actually be upheld if that is the case. A lot of the stated goals I feel fundamentally rely on the idea that:
However, if A and B can |
That's a good point: I removed all ordering considerations for now from the proposal and we can keep discussing in that issue |
We decided to have a stable key ordering: @rickbutton is working on it now! |
speaking of! #25 |
What is the sort order used for property keys? Cc @syg |
I believe that should be outlined here |
@ljharb The explainer says that record fields are sorted, and it sounds like this is not done by the order they are present in the literal. |
I'm still worried about implementation with a sorted order. I can envision how it could be made fast with different object representations, but the current hidden class / shape "linear history of property insertion" representation is pretty pervasive. Speculative optimizations are often checked for validity with a single shape pointer, for example. It is attractive to have these const value things be like extensible record types from other languages, but that's a big departure from objects. |
Without sort order we'll have a worse problem that prevents adoption: the comparison may be fast but it isn't useful (or at least, isn't any more useful than current objects). Per my example above, if you can't rely on f(a) === f(b) if a === b, then the utility of these objects decreases significantly. You can no longer rely on early return optimizations when a === b, so it becomes unclear what the benefit of these objects really is outside of "engine optimization" (as opposed to algorithm optimization that the users can employ). That is to say, maybe using immutable objects gets you "under the hood" speedups, but they can't really be safely reasoned about from the outside as being any safer than normal objects. In fact, a frozen mutable object is arguably "safer" in this regard than these immutable objects, because if frozen(a) === frozen(b), then (as far as I know) you can safely early exit, while you wouldn't with these immutable objects. |
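To make the early-return / memoization argument above concrete, a hedged sketch, assuming Record & Tuple semantics where #{ ... } values are primitives compared by content and Map keys use SameValueZero:

const memoize = (f) => {
    const cache = new Map();
    return (input) => {
        // records that are === hit the same Map entry
        if (!cache.has(input)) cache.set(input, f(input));
        return cache.get(input);
    };
};

const area = memoize((rect) => rect.width * rect.height);

// With sorted keys these two literals are ===, so the second call is a cache hit:
area(#{ width: 2, height: 3 });
area(#{ height: 3, width: 2 });
// With insertion-order-sensitive equality they would be !==, and f would run twice.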
I hear you. I also certainly don't advocate for a world where |
I'm hopeful that engines would be able to implement this with just a tweak to how hidden class transitions work: the new hidden class can remain a function of the current hidden class and the new key; it would just be a different function.
But this is definitely a question for implementors. |
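A toy model of that tweak, purely as an assumption about how an engine could work (no engine is known to do this): the next hidden class stays a memoized function of (current class, new key), but the key list is kept sorted rather than append-ordered, and shapes are canonicalized so both insertion orders converge on the same class.

const shapeRegistry = new Map(); // one canonical Shape per sorted key list

class Shape {
    constructor(keys) {
        this.keys = keys;             // always kept sorted
        this.transitions = new Map(); // newKey -> next Shape, memoized per class
    }
    addKey(key) {
        if (!this.transitions.has(key)) {
            const next = [...this.keys, key].sort(); // sorted insert, not append
            this.transitions.set(key, getShape(next));
        }
        return this.transitions.get(key);
    }
}

function getShape(keys) {
    const id = keys.join("\u0000");
    if (!shapeRegistry.has(id)) shapeRegistry.set(id, new Shape(keys));
    return shapeRegistry.get(id);
}

const empty = getShape([]);
empty.addKey("b").addKey("a") === empty.addKey("a").addKey("b"); // true: both orders converge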
@bakkot Agreed that the value checking isn't likely to matter here. It might also just be a tweak that works out cleanly; that'd be great. To be concrete, it'd be nice if an initial implementation in current engines didn't incur a bunch of perf cliffs. For example, I'm worried about JIT code being invalidated due to "insertions" when creating new immutable objects, because their hidden class has a different transition than just "append". |
My apologies, and I do not mean to be argumentative, but I think our position vis-à-vis implementors should be that this is a deal breaker -- in other words, I think this feature isn't really worth it without sorted keys. My reasoning is that without sorted keys, the "best practices" around using immutable objects become incredibly sophisticated and subtle. In fact, I believe "immutable objects" becomes one big expert-level gotcha. As a framework writer, I know that I have to essentially ignore their existence with user-supplied objects for the reasons listed above, but as a library user, even for something as simple as a memoizer, I have to hope that the library author is very aware of these issues.

It's already the case that many memoizers fail with existing gotchas (for example, many erroneously believe they already have memoized results for "hasOwnProperty" or other keys that exist on Object's prototype); this would add yet another layer of submarine bugs. The simplest proof-of-concept case of course is just memoize(Object.keys). Although this memoized function does not have a tremendous amount of utility, it does highlight the difficulty of implementing such a memoizer and the contradictory nature of the situation: if insertion order matters, then the memoizer should return the insertion-order listing of the keys -- but it will very likely fail to do so for indistinguishable "different" objects.

As I mentioned earlier, without sorted keys I'd actually recommend frozen objects to users instead, where the expectations are more straightforward -- a frozen object is not a "content addressable" object, it is merely a non-changeable object, where === works exactly the same as in all other contexts (that is to say, two "structurally identical", but different, objects are always !==). I don't mean this as a cop-out, but rather to say that we have an existing feature that is easier to understand (and more consistent in my opinion) for what this proposal would provide without sorted keys. |
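The existing memoizer gotcha mentioned above can be reproduced today; a small sketch (badMemoize is a deliberately naive illustration, not a real library):

const badMemoize = (f) => {
    const cache = {};
    // `in` also sees inherited keys, so "hasOwnProperty" looks pre-memoized
    return (key) => (key in cache ? cache[key] : (cache[key] = f(key)));
};

const upper = badMemoize((s) => s.toUpperCase());
upper("hasOwnProperty"); // returns Object.prototype.hasOwnProperty, not "HASOWNPROPERTY"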
Why? Why would Object.is returning |
It's because |
Is that intuition or a spec requirement? My intuition is that getOwnPropertySymbols should return all the symbols the object/record has in some order; I would not expect to rely on the order of said symbols. If the spec requires getOwnPropertySymbols to return them in some particular order for objects, I understand the objection. |
Also, if it's a requirement - can't the spec well.. require it? Can't we say "Symbol keys on records are required to always be iterated in the same implementation defined order such that if Object.is(recordA, recordB) is true then Object.getOwnPropertySymbols(recordA) contains the same elements as Object.getOwnPropertySymbols(recordB) and in the same order" (except in spec language)? |
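That proposed invariant can be phrased as a runnable check -- a sketch assuming Object.is compares records by content and Object.getOwnPropertySymbols works on records via the usual ToObject boxing:

function symbolOrderInvariantHolds(recordA, recordB) {
    if (!Object.is(recordA, recordB)) return true; // only constrains equal records
    const a = Object.getOwnPropertySymbols(recordA);
    const b = Object.getOwnPropertySymbols(recordB);
    // equal records must expose the same symbols in the same order
    return a.length === b.length && a.every((sym, i) => sym === b[i]);
}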
If some functions treat two objects differently, those objects are not the same, and so should not be
The spec requires getOwnPropertySymbols to return symbols in the order in which they were added to the object.
In general JS has been moving away from leaving things implementation-defined, since in practice people tend to end up relying on a particular implementation and then all engines are forced to go with that strategy forever. We'd want to specify a particular order. But which order? The only obvious one is creation order, which, as mentioned above, is a side channel. |
Thanks for bearing with me :] To be clear - I am not objecting one way or another, I am just trying to understand. Intuitively it sounds like an unexpected limitation.
I'm not sure that I understand why that is a side channel or why it is any more of a side-channel than with objects?
For objects, records don't have to define the same order as objects I think?
It sounds reasonable when presented this way. |
My understanding is that it means that passing the two symbols to another party reveals which of them was created first. But I'll let @erights speak to that further; I don't want to misrepresent his position.
Yeah, didn't mean to imply otherwise. In fact strings in records will probably be in lexicographical rather than (as for objects) chronological order. But we do need to pick some order. |
It's a bit of information, indeed. Personally, I'm not quite convinced that it's significant/important enough to constitute a meaningful information leak/security issue. I'm happy to ban Symbols as Record keys now, or even commit to that in the future, but I don't understand the scenario where this "leak" would lead to a bad outcome when composing programs. |
Maybe the following. But I doubt it's really harmful.

// code in realm A
Symbol.for(secret_password)

// code in realm B, which is running bad code
function guess_password() {
    // sentinel symbol, registered in the global registry right now; under
    // creation-order sorting, any registry symbol made earlier sorts before it
    const symb = Symbol.for(Math.random().toString())
    for (const guess of possibilities) {
        for (const key in #{ [symb]: 0, [Symbol.for(guess)]: 0 }) {
            if (key !== symb) {
                // the symbol for `guess` sorts before `symb`, which means it
                // was created before `symb` -- i.e. realm A had already
                // registered it, so `guess` is the secret password
                return guess
            }
            // otherwise the symbol for `guess` was only just created (after
            // `symb`), so this round of guessing is wrong; since the order
            // comes from the shared global registry, move on to the next guess
            break
        }
    }
} |
I am not sure that's really a side-channel, since the global symbol registry is, well... global (its whole goal is to leak this sort of info across realms, isn't it?). |
IIRC you currently cannot know the order of symbol creation. If symbols can be sorted in general, this can leak some information |
Even revealing the relative order in which two symbols were created is a concern, because it would help an attacker fingerprint the code they’re attacking. Any unintentionally revealed information of any kind is a risk. |
It sounds like:
I honestly feel that language cohesiveness (objects and records allowing for different types of keys) was given up a bit too quickly. It's just a feeling and not criticism, as I am pretty ignorant of the process that happened here and thankful for the discussion. I think a reasonable path forward could be to write up the potential side-channel attack, clearly documenting what extra information is exposed across realms and why that is not desirable. Then maybe figure out if solutions #2 (spec and require object creation order or some other consistent order) or #3 (require the invariant without specifying the order) are viable. |
The leak isn't just in cross-realm symbols, to be clear; it's just as much a problem within a single realm.
This isn't a solution to the side channel; it's the cause of it. The root problem is that, assuming keys for a given record are iterated in the same order each time, there must be an order across all symbols in order to preserve the fundamental property that if
It would actually be straightforward to specify this order; we'd just say something along the lines of "in the order in which they were created", or have a global counter which the symbol constructor uses to add an internal slot holding an ID to each symbol, or something. We already say "in ascending order of property creation time" for properties in OrdinaryOwnPropertyKeys.
It's not just that engines could vary: I think that we should not require engines do a thing unless we have a good idea how they could do it in a way which would satisfy the constraints we care about. If we know how to specify an order over all symbols which wouldn't have this leak, we should just require that order explicitly; if we don't, I don't think we can reasonably require engines to come up with one on their own. (Though I am also in favor of fully specifying things, as a rule.) |
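A userland sketch of that "global counter" idea -- assuming a real spec would stamp an internal slot at symbol creation; here a wrapper factory and a Map stand in for that. Note this is exactly the creation order the thread identifies as a side channel if records expose it through iteration:

let nextSymbolId = 0;
const symbolIds = new Map();

function makeSymbol(description) {
    const sym = Symbol(description);
    symbolIds.set(sym, nextSymbolId++); // creation-order ID
    return sym;
}

// Total order over all symbols created through the factory:
const byCreationOrder = (a, b) => symbolIds.get(a) - symbolIds.get(b);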
For the sake of completeness, since we are rehashing this, I believe there is another option, which is to only allow Symbols that have unique names. That is to say, similarly to how you can't have { "x": 1, "x": 2 } (the second overwrites the first), in records, if ToString(symbol1) === ToString(symbol2), then either symbol2 overwrites symbol1 or it throws an error. This would allow all the built-in symbols to always work (since IIRC they are the only ones where ToString resolves to something without quotes). |
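A sketch of that "unique names" restriction -- validateSymbolKeys is hypothetical; a real implementation would enforce this (or the last-one-wins variant) at record construction:

function validateSymbolKeys(symbols) {
    const seen = new Set();
    for (const sym of symbols) {
        const name = String(sym); // e.g. "Symbol(foo)", "Symbol(Symbol.iterator)"
        if (seen.has(name)) {
            throw new TypeError(`duplicate symbol key name: ${name}`);
        }
        seen.add(name);
    }
}

validateSymbolKeys([Symbol.iterator, Symbol("x")]); // ok
// validateSymbolKeys([Symbol("x"), Symbol("x")]);  // would throw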
@tolmasky it's come up multiple times; anything that's not "allow no Symbols" or "allow all Symbols" will not be able to reach consensus. Also, |
What if we say insertion order matters towards equality iff two symbols have the same ToString()? That is to say, the ordering of the symbols is lexicographical first, insertion order second (with the added convenience that "mutation" methods keep the original record's insertion order). I'm not sure if we're aligned on goals, but I see the main goals as being: |
I feel like lexicographical ordering, falling back to insertion order, and maintaining the "originating" object's insertion order seems to satisfy the best balance of the above goals. It satisfies goals 1, 3, and 4(?) completely, at the expense of not having goal 2 be true as often as if we either disallowed symbols completely, or allowed them with a global counter to provide "identical ordering". |
What would the third thing be? A sorting algorithm always has to have a tiebreaker, in this case when the Symbol descriptions are identical. In particular, it's important that any records or tuples for which If you maintain "original" insertion order, then you can potentially detect if a Symbol has been used in a Record or Tuple before, which is an information leak (violates goal 4, which is a nonstarter) |
Perhaps I didn't describe the algorithm clearly enough -- the insertion order is the tie breaker. You don't need a third tie breaker since two keys can't be inserted at the same time (unless I am missing something). So, the comparison is:

// ToString and InsertionOrder are spec-style pseudo-operations
function compare(recordInQuestion, symbol1, symbol2)
{
    const comparison = ToString(symbol1).localeCompare(ToString(symbol2));

    if (comparison !== 0)
        return comparison;

    const insertionOrder1 = InsertionOrder(symbol1, recordInQuestion);
    const insertionOrder2 = InsertionOrder(symbol2, recordInQuestion);

    // Can't be 0.
    return insertionOrder1 < insertionOrder2 ? -1 : 1;
}

The change I am saying to make is that insertion order matters when determining whether two records are equal. A few examples:

const SymbolA = Symbol();
const SymbolB = Symbol();

#{ [SymbolA]: 1, [SymbolB]: 2 } === #{ [SymbolA]: 1, [SymbolB]: 2 } // trivially equal (although the names clash, the insertion order is the same)
#{ [SymbolA]: 1, [Symbol.iterator]: f } === #{ [Symbol.iterator]: f, [SymbolA]: 1 } // equal, since we never have to rely on order for non-name-clashing symbols
#{ [SymbolA]: 1, [SymbolB]: 2, [Symbol.iterator]: f } === #{ [Symbol.iterator]: f, [SymbolA]: 1, [SymbolB]: 2 } // same as above
#{ [SymbolA]: 1, [SymbolB]: 2, [Symbol.iterator]: f } === #{ [SymbolA]: 1, [Symbol.iterator]: f, [SymbolB]: 2 } // again, SymbolA and SymbolB are ordered the same in both instances
#{ [SymbolB]: 1, [SymbolA]: 2 } !== #{ [SymbolA]: 1, [SymbolB]: 2 } // not equal, because we have to fall back to insertion order when sorting name-clashing symbols
#{ [SymbolA]: 1, [SymbolB]: 2, [Symbol.iterator]: f } !== #{ [Symbol.iterator]: f, [SymbolB]: 2, [SymbolA]: 1 } // same as above
(#{ [SymbolA]: 1, [SymbolB]: 2 } with .[SymbolA] = 1) === #{ [SymbolA]: 1, [SymbolB]: 2 } // still equal because `with` maintains the sort order of the original record

So, if two records have the same symbols, each with a unique name, then the ordering of insertion doesn't matter and the two records are always equal. If, however, the records contain multiple symbols with the same name, then the insertion order of those symbols matters for equality. |
Hmm, so:
It would seem to me (if I understand this correctly) that @tolmasky's suggestion addresses the symbol key limitation concerns, no? |
Just a slight correction here (although either could work): the tie breaker isn't creation order but rather insertion order (similar to how in objects the order is entirely based on insertion order). The reasoning for this is that it avoids needing a global counter at all, and in theory provides no leak whatsoever with regard to symbol creation times. |
If the tiebreaker is insertion order, then we risk allowing tuples to be unequal when we'd hope that they would be equal. If the tiebreaker is creation order, then we risk leaking information through a membrane since membranes can't intercept primitives, and symbols are primitives. (I don't see ToString as really functioning as a helpful sort order, given that it leaves us with ties.) |
@littledan - yes this was the conceit I put upfront: if we are willing to allow the rare cases of records that:
to not be equal, then that resolves this issue. There is no "fundamental" issue or inconsistency here, just a performance degradation for certain edge cases, since it's up to us to define what contributes to equality. This also solves the issue of iterating the symbols in a consistent way, even without the comparison issue. The reasoning here is that it's very likely that records of the same "shape" are coming from the same factory function, so it would be a rare occurrence in an immutable record feature to be comparing records from two completely different creation sources that have almost identical shapes except for the fact that the two functions that created them chose to order their identically named symbols differently. And again, this failure state is not even necessarily that bad in my opinion. For example, in the case of memoization, a !== b can of course still result in f(a) === f(b), just without caching. We're basically falling into the new String("a") !== new String("a") case with this, but again, triggered by a very unlikely set of events (which arguably people are already guarding against due to the old v8 advice of always putting your keys in the same order to trigger "performance"). Furthermore, I think this is to a large degree "essentially" correct, in that it demonstrates the bounds of equality: Symbol() !== Symbol() due to nominal equality in JavaScript. In essence, we are trying to compute the |
Sorry if this is already refuted, but an idea I have is that the spec mandates the engine create a random number (in the same way as Math.random) for each new symbol. The symbol keys are then sorted according to this random number. Also, I really don't think it is wise to suggest to developers that two pieces of code running in the same VM have any sort of "secure isolation" between them. After all, there are already tons of ways that "bad code" can damage / leak information from the shared environment. (For example, by changing |
Not TC39, but the chances of you getting that through are extremely slim. Just FYI. |
Is there anything fundamentally flawed with this approach? I think it is a terrible idea to disallow different features of a language from working together nicely and consistently, especially only for nitpicking reasons like preventing leaking information to other code running in the same VM by observing the ordering of keys in a generally unordered data structure. |
@SCLeoX objects and arrays, and records and tuples, are all ordered data structures. It’s important that the same code produce the same ordering in different environments. |
The fundamental thing is non-determinism - they want to keep close tabs on that and limit it to what they can. (In certain contexts like SES, it's essential to be able to block out or at least constrain non-determinism.)
@ljharb @isiahmeadows Sigh, I really don't think that is a strong reason. Truth be told, yes, determinism is nice to have, but it really isn't that important. After all, JavaScript currently is by no means deterministic (Math.random, new Date, WeakRef, etc.). Other than that, none of the non-academic production languages I know are deterministic or want to be deterministic. Deterministic execution, in my opinion, if really needed, should be a flag to the engine: when enabled, the engine would use the same seed for Math.random and for the symbol ordering generation (what I suggested) and disable Date-related APIs. (Sigh again) If deterministic execution is really what TC39 is after, I really hope they realize that what they are trading away is literally the consistency and the symmetry of the language. |
From the readme:
This is surprising to me. Contrast Python dicts or PureScript records.
It's also just a weird path-dependence in values: I would not expect it to matter whether I construct something by doing

(const { a: 0 }) with .b = 1

or

(const { b: 1 }) with .a = 0