Skip to content
This repository has been archived by the owner on Jan 25, 2022. It is now read-only.

Various spec fixes #13

Merged
merged 5 commits into from
Feb 7, 2018
Merged

Various spec fixes #13

merged 5 commits into from
Feb 7, 2018

Conversation

anba
Copy link
Contributor

@anba anba commented Feb 6, 2018

I've tried to implement the Intl.Locale proposal at https://bugzilla.mozilla.org/show_bug.cgi?id=1433303, but some parts of the current spec seem to incomplete/buggy, so here's a stab at fixing some of the issues I've found.

637d9c9
General fixes (missing "then", typos, etc.)

aedbbbc
Adds calls to GetOption, so we don't need to perform manual type conversions in some places. (Basically to follow the in-house style of ECMA-402 😄 )

dd9f44b
The current approach doesn't quite work some edge-cases (attributes in Unicode extension sequences, private-use subtags, irregular grandfathered language tags, etc.). And ResolveLocale doesn't really fit for Intl.Locale, does it? For example I assume new Intl.Locale("en", {nu: "latn"}) should return "en-u-nu-latn", right? (Using ResolveLocale it'd return "en", because extension keys supplied as options remove extension keys from the Unicode extension sequence.)

a222fb3
Restricts the tag parameter for Intl.Locale to strings and objects, and also directly accesses [[Locale]] for Intl.Locale objects. Using normal ToString conversion doesn't really make sense for language tags (see this test case) and I think it makes sense to align Intl.Locale with CanonicalizeLocaleList .

48e47b7
Restores (was partially removed in dd9f44b) and fixes some bugs (undefined record fields were accessed) in the optional support for kf and kn. (TBH I'm not really sure we want to copy the Intl.Collator restriction over to Intl.Locale, but this should be decided in a different issue.)

anba added 5 commits February 6, 2018 07:35
- Pass options to ApplyOptionsToTag
- Move ToString into constructor to ensure ToString(tag) appears before ToObject(options)
- Add definition for opt record
- Add missing underscores, fix typos, etc.
Change Unicode extension handling, so it works even when Unicode extension attributes
or privateuse subtags are present or the language tag is an irregular grandfathered tag.
The removed algorithms also contained several invalid calls (e.g.
FindExtension/UnicodeExtensionValue was called with a string which didn't form a proper
Unicode extension sequence) and had other issues.
`new Intl.Locale().toString() === "undefined"` or `new Intl.Locale(NaN).toString() === "nan"` should not be allowed.
@anba anba mentioned this pull request Feb 6, 2018
@anba
Copy link
Contributor Author

anba commented Feb 6, 2018

The patch also removes duplicate unused Unicode extension keys. While I didn't intend this change, maybe it's actually the right choice (see #14).

@zbraniecki
Copy link
Member

One concern I have here, is that it seems to me like you're trying to validate the extension keys. I believe that this is not a role of this API. I'd prefer it to be agnostic when it comes to keys and values for extension keys.

As long as they're well-formed according to BCP47 (extension = singleton 1*("-" (2*8alphanum))), I don't believe we should strip or remove anything from the language tag.

The only change comes from the fact that some of those extension keys will take internal slots, and can be overridden by options.
In such case, we should produce the result value with those new keys, and if they're overridden, with the same key with a new value.

I'm ok with sorting.

@anba
Copy link
Contributor Author

anba commented Feb 7, 2018

One concern I have here, is that it seems to me like you're trying to validate the extension keys. I believe that this is not a role of this API. I'd prefer it to be agnostic when it comes to keys and values for extension keys.

Hmm, I don't follow. The new ApplyUnicodeExtensionToTag doesn't perform any validation on the Unicode extension keywords and other extension sequences are left unchanged. Can you point out which steps in which algorithms you mean?
It's totally possible that there are still a few bugs in the proposed patch and some algorithms don't actually work as I intended them to work. But updating https://bugzilla.mozilla.org/show_bug.cgi?id=1433303 should help to uncover any remaining issues, at least that's my hope! 😄

Copy link
Member

@littledan littledan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the validation concern. It seems like this patch does exactly what the old spec was trying to do, except correctly instead of incorrectly (modulo what looks like a couple small mistakes for indentation level). Excellent work as always, @anba !

1. Let _value_ be the string-concatenation of _value_ and _subtag_.
1. Let _k_ be _k_ + _len_ + 1.
1. If _keywords_ does not contain an element whose [[Key]] is the same as _key_, then
1. Append the Record{[[Key]]: _key_, [[Value]]: _value_} to _keywords_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the previous two lines be indented one level further?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's at the correct level, the last element is added to keywords after the loop. (Elements are added to keywords when a new Unicode extension key was found or the end of the Unicode extension sequence has been reached.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see.

1. If _value_ is not the empty String, then
1. Let _newExtension_ be the string-concatenation of _newExtension_, `"-"`, and _value_.
1. Let _locale_ be the String value that is _tag_ with all Unicode locale extension sequences removed.
1. If the number of elements in _newExtension_ is greater than 2, then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any cases where there is a Unicode extension sequence but nothing gets appended to newExtension? If not, I think the preceding section would be a bit clearer if this conditional were hoisted up to line 64 (namely, maybe the else case should skip straight down to line 126).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(namely, maybe the else case should skip straight down to line 126).

Do you mean the Else at line 66? If we go from 66 directly 126 (well, actually 128), we'd skip the loop at line 95, which we need to run even if no Unicode extension sequences are present, because it appends Unicode extension keywords from the options into the result locale.

Lines 68-88 could be moved into a separate abstract operation, so it can also be used in ResolveLocale and replace the existing UnicodeExtensionValue operation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, never mind.

1. Let _postExtension_ be the substring of _locale_ from position _privateIndex_ to the end of the string.
1. Let _locale_ be the string-concatenation of _preExtension_, _newExtension_, and _postExtension_.
1. Assert: ! IsStructurallyValidLanguageTag(_locale_) is *true*.
1. Let _locale_ be ! CanonicalizeLanguageTag(_locale_).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the previous two lines intended to be indented, or should they also run for locales without a Unicode extension key?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this part is directly copied from ResolveLocale, step 2. For the final inclusion into the spec we probably want to create a separate abstract operation which then can be shared by ApplyUnicodeExtensionToTag and ResolveLocale.

Copy link
Member

@littledan littledan Feb 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean Step 9?

OK, I see that it's the same as what's there. I don't quite understand what it does--do we canonicalize things without a Unicode extension?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The canonicalization step is just there to ensure the Unicode extension sequence is at the correct canonical position.

If we have "en-y-not-registrd-ext-but-possible" and then simply append, let's say "-u-ca-gregory", we'd get "en-y-not-registrd-ext-but-possible-u-ca-gregory". But that's not a canonical language tag, because the "y" singleton appears before the "u" singleton. So we need to canonicalize it again and then we get the expected result "en-u-ca-gregory-y-not-registrd-ext-but-possible".

And yes, this case is a bit ridiculous. 😄

@littledan
Copy link
Member

littledan commented Feb 7, 2018

(TBH I'm not really sure we want to copy the Intl.Collator restriction over to Intl.Locale, but this should be decided in a different issue.)

tbh I'd rather just remove the optionality of supporting those parameters entirely. See discussion in tc39/ecma402#113

@littledan littledan merged commit 457367d into tc39:master Feb 7, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants