-
Notifications
You must be signed in to change notification settings - Fork 17
Conversation
- Pass options to ApplyOptionsToTag - Move ToString into constructor to ensure ToString(tag) appears before ToObject(options) - Add definition for opt record - Add missing underscores, fix typos, etc.
Change Unicode extension handling, so it works even when Unicode extension attributes or privateuse subtags are present or the language tag is an irregular grandfathered tag. The removed algorithms also contained several invalid calls (e.g. FindExtension/UnicodeExtensionValue was called with a string which didn't form a proper Unicode extension sequence) and had other issues.
`new Intl.Locale().toString() === "undefined"` or `new Intl.Locale(NaN).toString() === "nan"` should not be allowed.
The patch also removes duplicate unused Unicode extension keys. While I didn't intend this change, maybe it's actually the right choice (see #14). |
One concern I have here, is that it seems to me like you're trying to validate the extension keys. I believe that this is not a role of this API. I'd prefer it to be agnostic when it comes to keys and values for extension keys. As long as they're well-formed according to BCP47 ( The only change comes from the fact that some of those extension keys will take internal slots, and can be overridden by options. I'm ok with sorting. |
Hmm, I don't follow. The new |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the validation concern. It seems like this patch does exactly what the old spec was trying to do, except correctly instead of incorrectly (modulo what looks like a couple small mistakes for indentation level). Excellent work as always, @anba !
1. Let _value_ be the string-concatenation of _value_ and _subtag_. | ||
1. Let _k_ be _k_ + _len_ + 1. | ||
1. If _keywords_ does not contain an element whose [[Key]] is the same as _key_, then | ||
1. Append the Record{[[Key]]: _key_, [[Value]]: _value_} to _keywords_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the previous two lines be indented one level further?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's at the correct level, the last element is added to keywords
after the loop. (Elements are added to keywords
when a new Unicode extension key was found or the end of the Unicode extension sequence has been reached.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see.
1. If _value_ is not the empty String, then | ||
1. Let _newExtension_ be the string-concatenation of _newExtension_, `"-"`, and _value_. | ||
1. Let _locale_ be the String value that is _tag_ with all Unicode locale extension sequences removed. | ||
1. If the number of elements in _newExtension_ is greater than 2, then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any cases where there is a Unicode extension sequence but nothing gets appended to newExtension? If not, I think the preceding section would be a bit clearer if this conditional were hoisted up to line 64 (namely, maybe the else case should skip straight down to line 126).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(namely, maybe the else case should skip straight down to line 126).
Do you mean the Else
at line 66? If we go from 66 directly 126 (well, actually 128), we'd skip the loop at line 95, which we need to run even if no Unicode extension sequences are present, because it appends Unicode extension keywords from the options
into the result locale.
Lines 68-88 could be moved into a separate abstract operation, so it can also be used in ResolveLocale
and replace the existing UnicodeExtensionValue
operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, never mind.
1. Let _postExtension_ be the substring of _locale_ from position _privateIndex_ to the end of the string. | ||
1. Let _locale_ be the string-concatenation of _preExtension_, _newExtension_, and _postExtension_. | ||
1. Assert: ! IsStructurallyValidLanguageTag(_locale_) is *true*. | ||
1. Let _locale_ be ! CanonicalizeLanguageTag(_locale_). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the previous two lines intended to be indented, or should they also run for locales without a Unicode extension key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this part is directly copied from ResolveLocale, step 2. For the final inclusion into the spec we probably want to create a separate abstract operation which then can be shared by ApplyUnicodeExtensionToTag
and ResolveLocale
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean Step 9?
OK, I see that it's the same as what's there. I don't quite understand what it does--do we canonicalize things without a Unicode extension?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The canonicalization step is just there to ensure the Unicode extension sequence is at the correct canonical position.
If we have "en-y-not-registrd-ext-but-possible" and then simply append, let's say "-u-ca-gregory", we'd get "en-y-not-registrd-ext-but-possible-u-ca-gregory". But that's not a canonical language tag, because the "y" singleton appears before the "u" singleton. So we need to canonicalize it again and then we get the expected result "en-u-ca-gregory-y-not-registrd-ext-but-possible".
And yes, this case is a bit ridiculous. 😄
tbh I'd rather just remove the optionality of supporting those parameters entirely. See discussion in tc39/ecma402#113 |
I've tried to implement the
Intl.Locale
proposal at https://bugzilla.mozilla.org/show_bug.cgi?id=1433303, but some parts of the current spec seem to incomplete/buggy, so here's a stab at fixing some of the issues I've found.637d9c9
General fixes (missing "then", typos, etc.)
aedbbbc
Adds calls to
GetOption
, so we don't need to perform manual type conversions in some places. (Basically to follow the in-house style of ECMA-402 😄 )dd9f44b
The current approach doesn't quite work some edge-cases (attributes in Unicode extension sequences, private-use subtags, irregular grandfathered language tags, etc.). And
ResolveLocale
doesn't really fit forIntl.Locale
, does it? For example I assumenew Intl.Locale("en", {nu: "latn"})
should return"en-u-nu-latn"
, right? (UsingResolveLocale
it'd return"en"
, because extension keys supplied as options remove extension keys from the Unicode extension sequence.)a222fb3
Restricts the
tag
parameter forIntl.Locale
to strings and objects, and also directly accesses [[Locale]] forIntl.Locale
objects. Using normal ToString conversion doesn't really make sense for language tags (see this test case) and I think it makes sense to alignIntl.Locale
withCanonicalizeLocaleList
.48e47b7
Restores (was partially removed in dd9f44b) and fixes some bugs (undefined record fields were accessed) in the optional support for kf and kn. (TBH I'm not really sure we want to copy the
Intl.Collator
restriction over toIntl.Locale
, but this should be decided in a different issue.)