Various spec fixes #13

anba · 2018-02-06T16:00:19Z

I've tried to implement the Intl.Locale proposal at https://bugzilla.mozilla.org/show_bug.cgi?id=1433303, but some parts of the current spec seem to incomplete/buggy, so here's a stab at fixing some of the issues I've found.

637d9c9
General fixes (missing "then", typos, etc.)

aedbbbc
Adds calls to GetOption, so we don't need to perform manual type conversions in some places. (Basically to follow the in-house style of ECMA-402 😄 )

dd9f44b
The current approach doesn't quite work some edge-cases (attributes in Unicode extension sequences, private-use subtags, irregular grandfathered language tags, etc.). And ResolveLocale doesn't really fit for Intl.Locale, does it? For example I assume new Intl.Locale("en", {nu: "latn"}) should return "en-u-nu-latn", right? (Using ResolveLocale it'd return "en", because extension keys supplied as options remove extension keys from the Unicode extension sequence.)

a222fb3
Restricts the tag parameter for Intl.Locale to strings and objects, and also directly accesses [[Locale]] for Intl.Locale objects. Using normal ToString conversion doesn't really make sense for language tags (see this test case) and I think it makes sense to align Intl.Locale with CanonicalizeLocaleList .

48e47b7
Restores (was partially removed in dd9f44b) and fixes some bugs (undefined record fields were accessed) in the optional support for kf and kn. (TBH I'm not really sure we want to copy the Intl.Collator restriction over to Intl.Locale, but this should be decided in a different issue.)

- Pass options to ApplyOptionsToTag - Move ToString into constructor to ensure ToString(tag) appears before ToObject(options) - Add definition for opt record - Add missing underscores, fix typos, etc.

Change Unicode extension handling, so it works even when Unicode extension attributes or privateuse subtags are present or the language tag is an irregular grandfathered tag. The removed algorithms also contained several invalid calls (e.g. FindExtension/UnicodeExtensionValue was called with a string which didn't form a proper Unicode extension sequence) and had other issues.

`new Intl.Locale().toString() === "undefined"` or `new Intl.Locale(NaN).toString() === "nan"` should not be allowed.

anba · 2018-02-06T18:02:55Z

The patch also removes duplicate unused Unicode extension keys. While I didn't intend this change, maybe it's actually the right choice (see #14).

zbraniecki · 2018-02-06T23:16:06Z

One concern I have here, is that it seems to me like you're trying to validate the extension keys. I believe that this is not a role of this API. I'd prefer it to be agnostic when it comes to keys and values for extension keys.

As long as they're well-formed according to BCP47 (extension = singleton 1*("-" (2*8alphanum))), I don't believe we should strip or remove anything from the language tag.

The only change comes from the fact that some of those extension keys will take internal slots, and can be overridden by options.
In such case, we should produce the result value with those new keys, and if they're overridden, with the same key with a new value.

I'm ok with sorting.

anba · 2018-02-07T14:00:43Z

One concern I have here, is that it seems to me like you're trying to validate the extension keys. I believe that this is not a role of this API. I'd prefer it to be agnostic when it comes to keys and values for extension keys.

Hmm, I don't follow. The new ApplyUnicodeExtensionToTag doesn't perform any validation on the Unicode extension keywords and other extension sequences are left unchanged. Can you point out which steps in which algorithms you mean?
It's totally possible that there are still a few bugs in the proposed patch and some algorithms don't actually work as I intended them to work. But updating https://bugzilla.mozilla.org/show_bug.cgi?id=1433303 should help to uncover any remaining issues, at least that's my hope! 😄

littledan

I don't understand the validation concern. It seems like this patch does exactly what the old spec was trying to do, except correctly instead of incorrectly (modulo what looks like a couple small mistakes for indentation level). Excellent work as always, @anba !

littledan · 2018-02-07T22:01:26Z

spec.html

+            1. Let _value_ be the string-concatenation of _value_ and _subtag_.
+          1. Let _k_ be _k_ + _len_ + 1.
+        1. If _keywords_ does not contain an element whose [[Key]] is the same as _key_, then
+          1. Append the Record{[[Key]]: _key_, [[Value]]: _value_} to _keywords_.


Should the previous two lines be indented one level further?

No, it's at the correct level, the last element is added to keywords after the loop. (Elements are added to keywords when a new Unicode extension key was found or the end of the Unicode extension sequence has been reached.)

littledan · 2018-02-07T22:11:59Z

spec.html

+          1. If _value_ is not the empty String, then
+            1. Let _newExtension_ be the string-concatenation of _newExtension_, `"-"`, and _value_.
+        1. Let _locale_ be the String value that is _tag_ with all Unicode locale extension sequences removed.
+        1. If the number of elements in _newExtension_ is greater than 2, then


Are there any cases where there is a Unicode extension sequence but nothing gets appended to newExtension? If not, I think the preceding section would be a bit clearer if this conditional were hoisted up to line 64 (namely, maybe the else case should skip straight down to line 126).

(namely, maybe the else case should skip straight down to line 126).

Do you mean the Else at line 66? If we go from 66 directly 126 (well, actually 128), we'd skip the loop at line 95, which we need to run even if no Unicode extension sequences are present, because it appends Unicode extension keywords from the options into the result locale.

Lines 68-88 could be moved into a separate abstract operation, so it can also be used in ResolveLocale and replace the existing UnicodeExtensionValue operation.

I see, never mind.

littledan · 2018-02-07T22:13:03Z

spec.html

+            1. Let _postExtension_ be the substring of _locale_ from position _privateIndex_ to the end of the string.
+            1. Let _locale_ be the string-concatenation of _preExtension_, _newExtension_, and _postExtension_.
+          1. Assert: ! IsStructurallyValidLanguageTag(_locale_) is *true*.
+          1. Let _locale_ be ! CanonicalizeLanguageTag(_locale_).


Are the previous two lines intended to be indented, or should they also run for locales without a Unicode extension key?

No, this part is directly copied from ResolveLocale, step 2. For the final inclusion into the spec we probably want to create a separate abstract operation which then can be shared by ApplyUnicodeExtensionToTag and ResolveLocale.

Do you mean Step 9?

OK, I see that it's the same as what's there. I don't quite understand what it does--do we canonicalize things without a Unicode extension?

The canonicalization step is just there to ensure the Unicode extension sequence is at the correct canonical position.

If we have "en-y-not-registrd-ext-but-possible" and then simply append, let's say "-u-ca-gregory", we'd get "en-y-not-registrd-ext-but-possible-u-ca-gregory". But that's not a canonical language tag, because the "y" singleton appears before the "u" singleton. So we need to canonicalize it again and then we get the expected result "en-u-ca-gregory-y-not-registrd-ext-but-possible".

And yes, this case is a bit ridiculous. 😄

littledan · 2018-02-07T22:26:51Z

(TBH I'm not really sure we want to copy the Intl.Collator restriction over to Intl.Locale, but this should be decided in a different issue.)

tbh I'd rather just remove the optionality of supporting those parameters entirely. See discussion in tc39/ecma402#113

anba added 5 commits February 6, 2018 07:35

General spec fixes

637d9c9

- Pass options to ApplyOptionsToTag - Move ToString into constructor to ensure ToString(tag) appears before ToObject(options) - Add definition for opt record - Add missing underscores, fix typos, etc.

Use GetOption abstract operation to retrieve option values

aedbbbc

Align Intl.Locale tag argument handling with CanonicalizeLocaleList

a222fb3

`new Intl.Locale().toString() === "undefined"` or `new Intl.Locale(NaN).toString() === "nan"` should not be allowed.

Restore optional support for kf and kn

48e47b7

anba mentioned this pull request Feb 6, 2018

Spec questions/errors #12

Closed

littledan approved these changes Feb 7, 2018

View reviewed changes

littledan mentioned this pull request Feb 7, 2018

Consider performing complete Unicode extension canonicalization per RFC6067 #14

Closed

littledan merged commit 457367d into tc39:master Feb 7, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Various spec fixes #13

Various spec fixes #13

anba commented Feb 6, 2018

anba commented Feb 6, 2018

zbraniecki commented Feb 6, 2018

anba commented Feb 7, 2018

littledan left a comment

littledan Feb 7, 2018

anba Feb 7, 2018

littledan Feb 7, 2018

littledan Feb 7, 2018

anba Feb 7, 2018

littledan Feb 7, 2018

littledan Feb 7, 2018

anba Feb 7, 2018

littledan Feb 7, 2018 •

edited

Loading

anba Feb 7, 2018

littledan commented Feb 7, 2018 •

edited

Loading

Various spec fixes #13

Various spec fixes #13

Conversation

anba commented Feb 6, 2018

anba commented Feb 6, 2018

zbraniecki commented Feb 6, 2018

anba commented Feb 7, 2018

littledan left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

littledan Feb 7, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

littledan commented Feb 7, 2018 • edited Loading

littledan Feb 7, 2018 •

edited

Loading

littledan commented Feb 7, 2018 •

edited

Loading