This repository has been archived by the owner on Jan 25, 2022. It is now read-only.

Validation of 'calendar', 'collation', and 'numberingSystem' options #19

Closed
anba opened this issue Feb 9, 2018 · 9 comments

@anba
Contributor

anba commented Feb 9, 2018

The 'calendar', 'collation', and 'numberingSystem' options need to be validated; otherwise we could end up with invalid language tags. For example, new Intl.Locale("en", {numberingSystem: "!@-#asdf"}).toString() should not return en-u-nu-!@-#asdf.

Three possible choices:

  • Perform complete validation, similar to the existing validation for 'hourCycle' and 'caseFirst':
  1. Let calendar be ? GetOption(options, "calendar", "string", undefined, undefined).
  2. If calendar is not undefined, then
    1. If calendar is not the name of a calendar type in Unicode Technical Standard 35, throw a RangeError.
  3. Set opt.[[ca]] to calendar.
  • Only validate that the input matches (3*8alphanum) *("-" (3*8alphanum)) (in RFC 5646's ABNF), i.e. the type production in UTS 35:
  1. Let calendar be ? GetOption(options, "calendar", "string", undefined, undefined).
  2. If calendar is not undefined, then
    1. If calendar does not match the [(3*8alphanum) *("-" (3*8alphanum))] sequence, throw a RangeError exception.
  3. Set opt.[[ca]] to calendar.
  • Or change the (currently incorrect) assertion in ApplyUnicodeExtensionToTag into a runtime check, replacing
  1. Assert: ! IsStructurallyValidLanguageTag(locale) is true.
  with
  1. If ! IsStructurallyValidLanguageTag(locale) is false, throw a RangeError exception.

(See Edit 1 and Edit 2 below.)
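
A minimal sketch of the grammar-only check from the second choice, written as a plain regular expression. The helper name isWellFormedType is hypothetical and not part of the proposal; it only illustrates which inputs the (3*8alphanum) *("-" (3*8alphanum)) pattern would accept.

```javascript
// Hypothetical helper, not spec text: mirrors the ABNF
//   (3*8alphanum) *("-" (3*8alphanum))
// i.e. one or more subtags of 3 to 8 ASCII letters or digits, joined by "-".
const UNICODE_TYPE = /^[a-z0-9]{3,8}(?:-[a-z0-9]{3,8})*$/i;

function isWellFormedType(value) {
  return typeof value === "string" && UNICODE_TYPE.test(value);
}

console.log(isWellFormedType("latn"));            // true
console.log(isWellFormedType("foo"));             // true (well-formed, even if unknown)
console.log(isWellFormedType("!@-#asdf"));        // false
console.log(isWellFormedType("latn-ca-gregory")); // false ("ca" has only 2 characters)
```

Because every subtag must be at least three characters long, a two-letter key such as "ca" cannot be smuggled into the value under this check.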

Edit 1:
The third proposal (calling IsStructurallyValidLanguageTag in ApplyUnicodeExtensionToTag for validation) is probably not the right choice, because it may let new Intl.Locale("en", {numberingSystem: "latn-ca-gregory"}) slip through.

Edit 2:
Yup, just tested that this'll be a non-starter:

andre@VBdev:~/hg/mozilla-inbound/js/src/build-debug-opt-obj$ dist/bin/js                             
js> addIntlExtras(Intl)
js> new Intl.Locale("de",{numberingSystem:"latn-ca-gregory"}).toString()
"de-u-nu-latn-ca-gregory"
js> new Intl.Locale("de",{numberingSystem:"latn-ca-gregory"}).numberingSystem
"latn-ca-gregory"
@zbraniecki
Member

BCP 47 distinguishes "valid" from "well-formed". I'd like us to support "well-formed" here.

That means that your example of "en-u-nu-!@-#asdf" will not work because it doesn't match:

extension = singleton 1*("-" (2*8alphanum))

But en-u-nu-foo should parse, and I believe that, as a result, new Intl.Locale('en', {'numberingSystem': 'foo'}) should as well.
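
To illustrate the well-formedness rule quoted above, here is a rough sketch that checks a single extension sequence against the extension production. The function name isWellFormedExtension is an assumption made for this example; the singleton character class follows RFC 5646, where any alphanumeric except "x" is allowed.

```javascript
// Hypothetical helper: checks one extension sequence against
//   extension = singleton 1*("-" (2*8alphanum))
// A singleton is any ASCII letter or digit except "x" (reserved for private use).
const EXTENSION = /^[0-9a-wy-z](?:-[a-z0-9]{2,8})+$/i;

function isWellFormedExtension(ext) {
  return EXTENSION.test(ext);
}

console.log(isWellFormedExtension("u-nu-foo"));      // true
console.log(isWellFormedExtension("u-nu-!@-#asdf")); // false
console.log(isWellFormedExtension("x-foo"));         // false ("x" is not a singleton)
```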

@anba
Contributor Author

anba commented Feb 9, 2018

new Intl.Locale("en", {hourCycle: "h99"}) currently throws a RangeError, should this also be changed?

@zbraniecki
Member

I believe so. @littledan @srl295 ?

@zbraniecki
Member

My intention here is that we should not cap the values that are well-formed and potentially supported by some language tag parsing code just because our APIs don't support a certain value.

For example, maybe at some point UTS 35 will add hc-h28 and maybe ECMA-402 will not support it for a while. That doesn't mean that Intl.Locale must reject such a value.
A user may use Intl.Locale to compare or retrieve the language or script portion of the tag, but keep carrying such a language tag around because some part of their code actually recognizes h28 as a valid hourCycle.
So, yeah, I'd suggest we stick to well-formed for this API.

@anba
Contributor Author

anba commented Feb 9, 2018

(Note: Updated the issue description to note a possible problem with the third choice.)

@littledan
Member

I'm not sure if ICU contains an API for this validation, cc @jungshik. The question here is closely related to that in tc39/ecma402#175 (comment), though we may decide to answer it differently if Intl formatters are validating and Intl.Locale is not.

With the patch currently in Intl.Locale, as well as tc39/ecma402#175, the logic for validating options is something like, "if it's from a small, fixed set, check that it is in that set; otherwise, treat it as uninterpreted data and permit any value". I can see how this is inconsistent, especially if hourCycle may get more values in the future.
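
The rule described above could be sketched roughly as follows. This is an illustration only: the names FIXED_SETS and validateOptionCurrent are hypothetical, and the value lists are the hourCycle and caseFirst values from ECMA-402, not a copy of the patch.

```javascript
// Hypothetical illustration of the "fixed set, else anything" rule.
const FIXED_SETS = new Map([
  ["hourCycle", ["h11", "h12", "h23", "h24"]],
  ["caseFirst", ["upper", "lower", "false"]],
]);

function validateOptionCurrent(name, value) {
  if (value === undefined) return undefined;
  const allowed = FIXED_SETS.get(name);
  // Options with a fixed set are checked against it ...
  if (allowed !== undefined && !allowed.includes(value)) {
    throw new RangeError(`Invalid value "${value}" for option ${name}`);
  }
  // ... while every other option is passed through unchecked.
  return value;
}

console.log(validateOptionCurrent("numberingSystem", "!@-#asdf")); // accepted as-is
try {
  validateOptionCurrent("hourCycle", "h99");
} catch (e) {
  console.log(e instanceof RangeError); // true
}
```

The asymmetry is visible here: a malformed numberingSystem passes, while a well-formed but unknown hourCycle is rejected.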

For that reason, option 2 seems good to me. But, if we go with that, maybe we want to perform similar validation in existing Intl constructors. Or should we continue to omit the validation for compatibility?

Seems like we should discuss this issue in the Intl VC call.

@anba
Contributor Author

anba commented Feb 10, 2018

> I'm not sure if ICU contains an API for this validation

The following functions should probably fit our needs:

  • unumsys_openAvailableNames to retrieve all known numbering systems.
  • ucol_getKeywordValues to retrieve all known collation types.
  • And for the calendar types we probably need to use ucal_getKeywordValuesForLocale with commonlyUsed = false.

Plus uloc_toUnicodeLocaleType to convert the values to BCP 47 Unicode extension types.

@littledan
Member

We discussed this issue in the Intl meeting, and the group consensus was to validate just the grammar, as @anba's patch ended up doing.

@littledan
Member

Fixed by #23.
